Chat with an assistant
After uploading files to an assistant, there are two interfaces you can use to chat with the assistant:
- Standard chat interface: This is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references than the OpenAI-compatible chat interface. For more information, see Chat with an assistant.
- OpenAI-compatible chat interface: This interface is based on the OpenAI Chat Completion API, a widely adopted API. It is useful if you need inline citations or OpenAI-compatible responses, but has limited functionality compared to the standard chat interface. For more information, see Chat through an OpenAI-compatible interface.
You can also chat with an assistant in the Pinecone console: select the assistant to chat with, and use the Assistant playground.
Chat with an assistant
You can chat with a Pinecone assistant through the standard chat interface. It returns either a JSON object or a text stream.
This is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references. However, if you need your assistant to be OpenAI-compatible or need inline citations, use the OpenAI-compatible chat interface.
Default response
The following example sends a message and requests a response in the default format (JSON object):
The content parameter in the request cannot be empty.
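As a rough illustration of the non-empty rule, the sketch below builds a single-turn request body locally and rejects empty text before anything is sent. `build_chat_body` is a hypothetical helper, not part of any Pinecone SDK. The `messages`, `content`, and `stream` names come from this page, while the `role: "user"` value and the exact body layout are assumptions. The sketch does not call the API.

```python
import json


def build_chat_body(user_text: str, stream: bool = False) -> str:
    """Serialize a single-turn chat request body (hypothetical helper)."""
    # The docs only say that content cannot be empty; also rejecting
    # whitespace-only text is an extra client-side assumption.
    if not user_text.strip():
        raise ValueError("content cannot be empty")
    body = {
        # Assumed layout: one dict per turn with a role and a content field.
        "messages": [{"role": "user", "content": user_text}],
        "stream": stream,
    }
    return json.dumps(body)


print(build_chat_body("What is the capital of France?"))
```

Validating on the client gives a clear error without spending a network round trip on a request that would be rejected.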
The example above returns a result like the following:
Streaming response
The following example sends a message and requests a streaming response:
The content parameter in the request cannot be empty.
The example above returns a result like the following:
There are four types of chunks in a streaming chat response:
- Starting chunk: Includes "role":"assistant", which indicates that the assistant is responding to the user’s message.
- Content chunk: Includes a value in the content field (e.g., "content":"The"), which is part of the assistant’s streamed response to the user’s message.
- Citation chunk: Includes a citation to the document that the assistant used to generate the response.
- Ending chunk: Includes "finish_reason":"stop", which indicates that the assistant has finished responding to the user’s message.
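The four chunk types above can be told apart by the markers they carry. The following is a minimal sketch, assuming each chunk has already been parsed into a Python dict. Because this page does not show the exact nesting of the fields, the helper searches the whole chunk for each marker. The `citation` key name and the sample chunk layouts are assumptions made for illustration.

```python
from typing import Any, Iterator, Tuple


def walk(obj: Any) -> Iterator[Tuple[str, Any]]:
    """Yield every (key, value) pair found anywhere inside a parsed JSON value."""
    if isinstance(obj, dict):
        for key, value in obj.items():
            yield key, value
            yield from walk(value)
    elif isinstance(obj, list):
        for item in obj:
            yield from walk(item)


def classify_chunk(chunk: dict) -> str:
    """Label a chunk as 'ending', 'citation', 'content', 'starting', or 'unknown'."""
    pairs = list(walk(chunk))
    if ("finish_reason", "stop") in pairs:
        return "ending"
    if any(key == "citation" for key, _ in pairs):  # assumed key name
        return "citation"
    if any(k == "content" and isinstance(v, str) and v for k, v in pairs):
        return "content"
    if ("role", "assistant") in pairs:
        return "starting"
    return "unknown"


# Hand-written sample chunks; the real payloads may be shaped differently.
sample_chunks = [
    {"role": "assistant"},
    {"delta": {"content": "The"}},
    {"citation": {"references": []}},
    {"finish_reason": "stop"},
]
print([classify_chunk(c) for c in sample_chunks])
```

The ending check runs first so that a final chunk is never mistaken for a content chunk, even if it also carries an empty content field.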
JSON response
The following example uses the json_response parameter to instruct the assistant to return a JSON response:
JSON response cannot be used with the stream parameter.
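One way to respect this restriction is to check the two options together before building a request. The sketch below is a hypothetical client-side guard, not SDK behavior. Only the `json_response` and `stream` parameter names are taken from this page.

```python
def chat_options(stream: bool = False, json_response: bool = False) -> dict:
    """Return request options, refusing the unsupported combination."""
    if stream and json_response:
        raise ValueError("json_response cannot be combined with stream")
    return {"stream": stream, "json_response": json_response}


print(chat_options(json_response=True))
```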
Chat through an OpenAI-compatible interface
The OpenAI-compatible chat interface is based on the OpenAI Chat Completion API, a widely adopted API. It is useful if you need inline citations or OpenAI-compatible responses, but has limited functionality compared to the standard chat interface. It returns either a JSON object or a text stream.
If you need neither OpenAI compatibility nor inline citations, use the standard chat interface.
Default response
The following example sends a message and requests a response in the default format (JSON object):
The content parameter in the request cannot be empty.
The example above returns a result like the following:
Streaming response
The following example sends a message and requests a streaming response:
The content parameter in the request cannot be empty.
The example above returns a result like the following:
There are three types of chunks in a chat completion response:
- Starting chunk: Includes "role":"assistant", which indicates that the assistant is responding to the user’s message.
- Content chunk: Includes a value in the content field (e.g., "content":"The"), which is part of the assistant’s streamed response to the user’s message.
- Ending chunk: Includes "finish_reason":"stop", which indicates that the assistant has finished responding to the user’s message.
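To show how these chunks combine into a full reply, here is a small sketch that concatenates the content pieces until the ending chunk arrives. It assumes the chunks are already parsed into dicts, that content sits at choices[0].delta.content, and that `finish_reason` sits on the same choice object as in the OpenAI Chat Completion layout. The sample chunks are hand-written, not captured output.

```python
from typing import Iterable


def collect_stream(chunks: Iterable[dict]) -> str:
    """Join the streamed content pieces into one string."""
    parts = []
    for chunk in chunks:
        choice = chunk["choices"][0]
        # The starting and ending chunks carry no usable text.
        text = (choice.get("delta") or {}).get("content")
        if text:
            parts.append(text)
        if choice.get("finish_reason") == "stop":
            break  # ending chunk: the assistant has finished
    return "".join(parts)


# Illustrative chunks only; real payloads carry additional fields.
demo_chunks = [
    {"choices": [{"delta": {"role": "assistant", "content": ""}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": "The"}, "finish_reason": None}]},
    {"choices": [{"delta": {"content": " answer"}, "finish_reason": None}]},
    {"choices": [{"delta": {}, "finish_reason": "stop"}]},
]
print(collect_stream(demo_chunks))
```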
Provide conversation history in a chat request
Models lack memory of previous requests, so any relevant messages from earlier in the conversation must be present in the messages object.
In the following example, the messages object includes prior messages that are necessary for interpreting the newest message.
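A minimal sketch of carrying history forward, assuming each message is a dict with `role` and `content` keys and that `user` and `assistant` are the role values. `with_new_message` is a hypothetical helper, and the conversation text is invented for illustration.

```python
from typing import Dict, List

Message = Dict[str, str]


def with_new_message(history: List[Message], user_text: str) -> List[Message]:
    """Return a new list holding all prior turns plus the newest user message."""
    return [*history, {"role": "user", "content": user_text}]


# Earlier turns that the follow-up question depends on.
history = [
    {"role": "user", "content": "What is the tallest mountain on Earth?"},
    {"role": "assistant", "content": "Mount Everest."},
]

# "How tall is it?" only makes sense alongside the turns above,
# so the full list is what goes into the messages object.
messages = with_new_message(history, "How tall is it?")
print(len(messages))
```

Returning a new list rather than mutating the old one keeps the stored history intact if the request fails and has to be retried.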
The above example request returns a response like the following:
Filter chat with metadata
You can filter which documents the assistant uses for chat completions. The following example restricts the assistant to documents whose metadata includes "resource": "encyclopedia".
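The sketch below illustrates what such a filter means, using a simple equality match run locally over made-up document records. The actual filtering is performed by the service, not by client code. The `filter` field name in the request body is an assumption, and `matches_filter` is a hypothetical helper.

```python
def matches_filter(metadata: dict, flt: dict) -> bool:
    """True when every key in flt is present in metadata with an equal value."""
    return all(key in metadata and metadata[key] == value for key, value in flt.items())


metadata_filter = {"resource": "encyclopedia"}

# Made-up document records, used only to illustrate the matching rule.
documents = [
    {"name": "a.pdf", "metadata": {"resource": "encyclopedia"}},
    {"name": "b.pdf", "metadata": {"resource": "blog"}},
    {"name": "c.pdf", "metadata": {}},
]
eligible = [d["name"] for d in documents if matches_filter(d["metadata"], metadata_filter)]
print(eligible)

# Assumed request shape: the filter travels alongside the messages.
body = {
    "messages": [{"role": "user", "content": "Summarize the entry."}],
    "filter": metadata_filter,  # assumed field name
}
```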
Choose a model for your assistant
Pinecone Assistant uses the gpt-4o model by default. Alternatively, you can use the claude-3-5-sonnet model. Select the LLM to use by setting the model parameter in the request:
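As a sketch of setting that parameter, the helper below adds a `model` field to a request body and rejects names other than the two listed on this page. `with_model` is hypothetical, and placing `model` at the top level of the body is an assumption.

```python
DEFAULT_MODEL = "gpt-4o"
SUPPORTED_MODELS = {"gpt-4o", "claude-3-5-sonnet"}  # the two models named on this page


def with_model(body: dict, model: str = DEFAULT_MODEL) -> dict:
    """Return a copy of the request body with the model parameter set."""
    if model not in SUPPORTED_MODELS:
        raise ValueError(f"unsupported model: {model!r}")
    return {**body, "model": model}


base = {"messages": [{"role": "user", "content": "Hello"}]}
print(with_model(base, "claude-3-5-sonnet")["model"])
```

The set of supported names is hard-coded from this page and would need updating if the list of available models changes.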
Extract the response content
Both the standard and OpenAI-compatible chat interfaces return a JSON response object containing the assistant’s chat response along with other information. The message string is found at the following path in that object:
- choices[0].message.content for a JSON chat response
- choices[0].delta.content for a streaming chat response
You can extract the message content and print it to the console:
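A minimal sketch of that extraction, working on raw JSON text and following the two paths listed above. `extract_content` is a hypothetical helper, and the sample payloads are hand-written stand-ins that contain only the fields needed here.

```python
import json


def extract_content(raw: str, streaming: bool = False) -> str:
    """Pull the message text out of one JSON response or one streamed chunk."""
    payload = json.loads(raw)
    choice = payload["choices"][0]
    # Streaming chunks keep the text under delta; full responses use message.
    part = choice["delta"] if streaming else choice["message"]
    return part.get("content") or ""


# Hand-written stand-ins for a full response and a single streamed chunk.
full_raw = '{"choices": [{"message": {"role": "assistant", "content": "Hello there."}}]}'
chunk_raw = '{"choices": [{"delta": {"content": "Hel"}}]}'

print(extract_content(full_raw))
print(extract_content(chunk_raw, streaming=True))
```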
This creates output like the following: