This page shows you how to chat with a Pinecone Assistant.

To learn about the concepts related to Pinecone Assistant, see Understanding Pinecone Assistant.

This feature is in public preview.

Chat using the Assistant API

To chat with an assistant, use the chat_completion_assistant endpoint. This operation returns either a JSON object or a text stream.

You can also chat with an assistant using the Pinecone console: select the assistant you want to chat with, and use the Assistant playground.

Request a JSON response

The following example requests a JSON response to the message, “What is the maximum height of a red pine?”:
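Below is a minimal sketch using the Python SDK with the assistant plugin installed (pip install pinecone-plugin-assistant). The API key placeholder, the assistant name example-assistant, and the exact method names are assumptions for illustration and may vary by SDK version:

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

# Placeholder credentials and assistant name; replace with your own.
pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

# The content parameter cannot be empty.
chat_context = [Message(content="What is the maximum height of a red pine?")]

# Without stream=True, the operation returns a single JSON response.
response = assistant.chat_completions(messages=chat_context)
print(response)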

The content parameter in the request cannot be empty.

The example above returns a result like the following:

{"chat_completion":
  {
    "id":"chatcmpl-9OtJCcR0SJQdgbCDc9JfRZy8g7VJR",
    "choices":[
      {
        "finish_reason":"stop",
        "index":0,
        "message":{
          "role":"assistant",
          "content":"The maximum height of a red pine (Pinus resinosa) is up to 25 meters."
        }
      }
    ],
    "model":"my_assistant"
  }
}

The JSON response object from the Assistant API contains the assistant’s chat response along with other information. The message string is contained in the following field:

choices[0].message.content

To extract the response message from the assistant’s JSON response and print it to the console, add the following to your request:
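For example, assuming the response object returned by chat_completions() in the sketch above exposes the same fields as attributes:

# Print only the assistant's reply instead of the full JSON response.
print(response.choices[0].message.content)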

This creates output like the following:

A red pine, scientifically known as *Pinus resinosa*, is a medium-sized tree that can grow up to 25 meters high and 75 centimeters in diameter. [1, pp. 1]

Request a streaming response

The following example requests a text streaming response to the message, “What is the maximum height of a red pine?”:
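Below is a sketch under the same assumptions as the JSON example above; here stream=True switches the operation to a streaming response:

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

chat_context = [Message(content="What is the maximum height of a red pine?")]

# stream=True returns an iterator of chunks rather than one JSON object.
chunks = assistant.chat_completions(messages=chat_context, stream=True)
for chunk in chunks:
    if chunk:
        print(chunk)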

The content parameter in the request cannot be empty.

The example above returns a result like the following:

{
  'id': '000000000000000009de65aa87adbcf0',
  'choices': [
    {
      'index': 0,
      'delta': {
        'role': 'assistant',
        'content': 'The'
      },
      'finish_reason': None
    }
  ],
  'model': 'gpt-4o-2024-05-13'
}

...

{
  'id': '00000000000000007a927260910f5839',
  'choices': [
    {
      'index': 0,
      'delta': {
        'role': '',
        'content': 'The'
      },
      'finish_reason': None
    }
  ],
  'model': 'gpt-4o-2024-05-13'
}

...

{
  'id': '00000000000000007a927260910f5839',
  'choices': [
    {
      'index': 0,
      'delta': {
        'role': None,
        'content': None
      },
      'finish_reason': 'stop'
    }
  ],
  'model': 'gpt-4o-2024-05-13'
}

There are three types of chunks in a chat completion response:

  • Starting chunk: Includes "role":"assistant", which indicates that the assistant is responding to the user’s message.
  • Content chunk: Includes a value in the content field (e.g., "content":"The"), which is part of the assistant’s streamed response to the user’s message.
  • Ending chunk: Includes "finish_reason":"stop", which indicates that the assistant has finished responding to the user’s message.

The streaming response object from the Assistant API contains the assistant’s chat response along with other information. For each content chunk, the message text is contained in the following field:

choices[0].delta.content

To extract the response message from the assistant’s streaming response and print it to the console, add the following to your request:
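For example, assuming the chunks iterator from the streaming sketch above, and that each chunk exposes its fields as attributes:

for chunk in chunks:
    # Only content chunks carry text; the starting and ending chunks
    # have an empty or None content field and are skipped here.
    if chunk and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content)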

This creates output like the following:

The
 maximum
 height
 of
 a
 red
 pine
 (
Pin
us
 resin
osa
)
 is
 up
 to
 twenty
-five
 meters

 [1, pp. 1]
.

Provide conversation history in a chat request

Models lack memory of previous requests, so any relevant messages from earlier in the conversation must be present in the messages object.

In the following example, the messages object includes prior messages that are necessary for interpreting the newest message.
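This sketch uses the same assumed SDK setup as the examples above; the earlier question and answer are replayed so the model can resolve "its" in the newest message:

from pinecone_plugins.assistant.models.chat import Message

# The model sees only what is in messages, so prior turns are included in full.
chat_context = [
    Message(role="user", content="What is the maximum height of a red pine?"),
    Message(role="assistant",
            content="The maximum height of a red pine (Pinus resinosa) is up to 25 meters."),
    Message(role="user", content="What is its maximum diameter?"),
]
response = assistant.chat_completions(messages=chat_context)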

The above example request returns a response like the following:

{"chat_completion":
  {
    "id":"chatcmpl-9OtJCcR0SJQdgbCDc9JfRZy8g7VJR",
    "choices":[
      {
        "finish_reason":"stop",
        "index":0,
        "message":{
          "role":"assistant",
          "content":"The maximum diameter of a red pine (Pinus resinosa) is 75 centimeters [1, pp. 1]"
        }
      }
    ],
    "model":"my_assistant"
  }
}

Filter chat with metadata

You can filter which documents the assistant uses for chat completions. The following example restricts responses to documents that include the metadata "resource": "encyclopedia":
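This sketch assumes chat_completions() accepts a filter parameter using the same syntax as Pinecone metadata filters:

# Only documents whose metadata matches the filter are used for this answer.
response = assistant.chat_completions(
    messages=chat_context,
    filter={"resource": "encyclopedia"},
)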

For more information about filtering with metadata, see Filter with metadata.

Choose a model for your assistant

Pinecone Assistant uses the gpt-4o model by default. Alternatively, you can use the claude-3-5-sonnet model. Select the LLM to use by setting the model parameter in the request:
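A sketch under the same assumptions as the examples above, passing the model parameter through chat_completions():

# Defaults to gpt-4o when the model parameter is omitted.
response = assistant.chat_completions(
    messages=chat_context,
    model="claude-3-5-sonnet",
)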