After uploading files to an assistant, you can chat with it.
To chat in the Pinecone console, select the assistant and use the Assistant playground.
Chat through the standard interface
The standard chat interface can return responses in three different formats:
- Default response: The assistant returns a structured response and separate citation information.
- Streaming response: The assistant returns the response as a text stream.
- JSON response: The assistant returns the response as JSON key-value pairs.
The standard interface is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references. However, if you need your assistant to be OpenAI-compatible or need inline citations, use the OpenAI-compatible chat interface.
Default response
The following example sends a message and requests a default response:
The `content` parameter in the request cannot be empty.
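A minimal Python sketch using the Pinecone SDK with the assistant plugin (the assistant name and question are placeholders, and exact method names can vary by SDK version):

```python
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Target an existing assistant (placeholder name).
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

# A single user message; the content field must not be empty.
msg = Message(role="user", content="What is the maximum height of a red pine?")

# Default (non-streaming) request.
resp = assistant.chat(messages=[msg])
print(resp)
```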
The example above returns a result like the following:
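The exact payload depends on the assistant and its files; an abbreviated, illustrative sketch of the shape (placeholder values, with identifiers, usage counts, and full citation details omitted) looks roughly like this:

```json
{
  "finish_reason": "stop",
  "choices": [
    {
      "message": {
        "role": "assistant",
        "content": "A red pine can reach roughly 25 metres."
      }
    }
  ],
  "citations": [
    {
      "...": "reference to the source document the assistant used"
    }
  ]
}
```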
Streaming response
The following example sends a message and requests a streaming response:
The `content` parameter in the request cannot be empty.
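A minimal sketch of a streaming request, under the same assumptions as the earlier example (placeholder assistant name and question):

```python
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

msg = Message(role="user", content="What is the maximum height of a red pine?")

# stream=True returns an iterable of chunks instead of a single response object.
chunks = assistant.chat(messages=[msg], stream=True)

for chunk in chunks:
    if chunk:
        print(chunk)
```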
The example above returns a result like the following:
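Illustratively (shapes only, based on the message types described below; values are placeholders and the citation chunk is omitted), the stream is a sequence of JSON chunks along these lines:

```json
{"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}]}
{"choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}]}
{"choices":[{"index":0,"delta":{"content":" maximum"},"finish_reason":null}]}
{"choices":[{"index":0,"delta":{"content":" height"},"finish_reason":null}]}
{"choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
```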
There are four types of messages in a streaming chat response:
- Message start: Includes `"role":"assistant"`, which indicates that the assistant is responding to the user’s message.
- Content: Includes a value in the `content` field (e.g., `"content":"The"`), which is part of the assistant’s streamed response to the user’s message.
- Citation: Includes a citation to the document that the assistant used to generate the response.
- Message end: Includes `"finish_reason":"stop"`, which indicates that the assistant has finished responding to the user’s message.
JSON response
The following example uses the `json_response` parameter to instruct the assistant to return the response as JSON key-value pairs. This is useful if you need to parse the response programmatically.
JSON response cannot be used with the `stream` parameter.
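A sketch of a request with JSON output enabled; here the option is assumed to be exposed as a `json_response` keyword argument in the Python SDK, mirroring the request parameter, and the assistant name and question are placeholders:

```python
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

msg = Message(
    role="user",
    content="What is the maximum height of a red pine? Reply with JSON keys 'species' and 'max_height_m'.",
)

# Ask for JSON key-value pairs; this cannot be combined with stream=True.
resp = assistant.chat(messages=[msg], json_response=True)
print(resp)
```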
The example above returns a result like the following:
Provide conversation history in a chat request
Models lack memory of previous requests, so any relevant messages from earlier in the conversation must be present in the `messages` object.
In the following example, the `messages` object includes prior messages that are necessary for interpreting the newest message.
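A sketch with placeholder conversation history (the earlier assistant reply shown here is illustrative):

```python
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

# Earlier turns are included so the assistant can resolve "it" in the newest message.
messages = [
    Message(role="user", content="What is the maximum height of a red pine?"),
    Message(role="assistant", content="A red pine can reach roughly 25 metres."),
    Message(role="user", content="How wide does it get?"),
]

resp = assistant.chat(messages=messages)
print(resp)
```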
The example returns a response like the following:
Filter chat with metadata
You can filter which documents to use for chat completions. The following example filters the responses to use only documents that include the metadata `"resource": "encyclopedia"`.
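A sketch of a filtered request, assuming the metadata filter is passed as a `filter` argument mirroring the request parameter (placeholder assistant name and question):

```python
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

msg = Message(role="user", content="What is the maximum height of a red pine?")

# Only documents whose metadata includes "resource": "encyclopedia" are used.
resp = assistant.chat(messages=[msg], filter={"resource": "encyclopedia"})
print(resp)
```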
Choose a model for your assistant
Pinecone Assistant uses the `gpt-4o` model by default. Alternatively, you can use the `claude-3-5-sonnet` model. Select the LLM to use by setting the `model` parameter in the request:
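For example (a sketch under the same assumptions as the earlier examples, with a placeholder assistant name and question):

```python
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

msg = Message(role="user", content="What is the maximum height of a red pine?")

# Use Claude 3.5 Sonnet instead of the default gpt-4o.
resp = assistant.chat(messages=[msg], model="claude-3-5-sonnet")
print(resp)
```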
Include citation highlights in the response
Citation highlights are available in the Pinecone console or API versions `2025-04` and later.
When using the standard chat interface, every response includes a `citation` object. The object includes a reference to the document that the assistant used to generate the response. Additionally, you can include highlights, which are the specific parts of the document that the assistant used to generate the response, by setting the `include_highlights` parameter to `true` in the request:
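A sketch of a request with highlights enabled, assuming the option is exposed as an `include_highlights` keyword argument mirroring the request parameter (placeholder assistant name and question):

```python
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

msg = Message(role="user", content="What is the maximum height of a red pine?")

# Include the specific document passages the assistant used to generate the answer.
resp = assistant.chat(messages=[msg], include_highlights=True)
print(resp)
```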
The example returns a response like the following:
Enabling highlights will increase token usage.
Extract the response content
The assistant’s response is returned in a JSON response object along with other information. The message string is contained in the following JSON object:
- `choices[0].message.content` for the default chat response
- `choices[0].delta.content` for the streaming chat response
You can extract the message content and print it to the console:
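For example, assuming the response (or stream chunk) has already been parsed into a Python dictionary; the objects below are placeholders that mirror the paths above:

```python
# Placeholder objects standing in for parsed chat responses (illustrative only).
response = {"choices": [{"message": {"content": "A red pine can reach roughly 25 metres."}}]}
chunk = {"choices": [{"delta": {"content": "A"}}]}

# Default chat response: the full message text.
print(response["choices"][0]["message"]["content"])

# Streaming chat response: each chunk carries a fragment of the text.
fragment = chunk["choices"][0]["delta"].get("content")
if fragment:
    print(fragment, end="")
```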
This creates output like the following: