After uploading files to an assistant, you can chat with the assistant.
This page shows you how to chat with an assistant using the OpenAI-compatible chat interface. This interface is based on the widely adopted OpenAI Chat Completions API. It is useful if you need inline citations or OpenAI-compatible responses, but it has limited functionality compared to the standard chat interface.
The standard chat interface is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references.
The OpenAI-compatible chat interface can return responses in two different formats:

- Default: The assistant returns its response in a single JSON object.
- Streaming: The assistant returns its response as a stream of JSON chunks.
The following example sends a message and requests a response in the default format:
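As a minimal sketch using Python's `requests` library (the host URL, API key, and assistant name below are placeholders; confirm the OpenAI-compatible endpoint for your assistant in the Pinecone console or docs):

```python
import requests

# Placeholder values: replace with your own API key and assistant name.
API_KEY = "YOUR_PINECONE_API_KEY"
ASSISTANT_NAME = "example-assistant"

# Assumed endpoint shape for the OpenAI-compatible interface.
url = f"https://prod-1-data.ke.pinecone.io/assistant/chat/{ASSISTANT_NAME}/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "What is the maximum height of a red pine?"}
    ]
}

response = requests.post(url, json=payload, headers={"Api-Key": API_KEY})
print(response.json())
```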
The `content` parameter in the request cannot be empty.
The example above returns a result like the following:
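The response follows the standard chat completion shape; the IDs, model string, and message content below are illustrative placeholders:

```json
{
  "id": "chatcmpl-...",
  "object": "chat.completion",
  "model": "gpt-4o-2024-05-13",
  "choices": [
    {
      "index": 0,
      "finish_reason": "stop",
      "message": {
        "role": "assistant",
        "content": "The maximum height of a red pine is ..."
      }
    }
  ]
}
```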
The following example sends a message and requests a streaming response:
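A sketch of the same request with streaming enabled, again assuming the placeholder endpoint from the earlier example:

```python
import requests

API_KEY = "YOUR_PINECONE_API_KEY"
ASSISTANT_NAME = "example-assistant"
url = f"https://prod-1-data.ke.pinecone.io/assistant/chat/{ASSISTANT_NAME}/chat/completions"

payload = {
    "messages": [
        {"role": "user", "content": "What is the maximum height of a red pine?"}
    ],
    "stream": True,  # request a streamed response
}

# stream=True tells requests not to buffer the whole response body.
with requests.post(url, json=payload, headers={"Api-Key": API_KEY}, stream=True) as response:
    for line in response.iter_lines():
        if line:
            print(line.decode("utf-8"))
```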
The `content` parameter in the request cannot be empty.
The example above returns a result like the following:
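Streamed responses arrive as a series of server-sent events; the field values below are illustrative:

```
data: {"choices":[{"index":0,"delta":{"role":"assistant"},"finish_reason":null}],"model":"gpt-4o-2024-05-13"}

data: {"choices":[{"index":0,"delta":{"content":"The"},"finish_reason":null}],"model":"gpt-4o-2024-05-13"}

data: {"choices":[{"index":0,"delta":{"content":" maximum"},"finish_reason":null}],"model":"gpt-4o-2024-05-13"}

data: {"choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"model":"gpt-4o-2024-05-13"}
```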
There are three types of messages in a chat completion response:

- `"role":"assistant"`, which indicates that the assistant is responding to the user's message.
- A `content` field (e.g., `"content":"The"`), which is part of the assistant's streamed response to the user's message.
- `"finish_reason":"stop"`, which indicates that the assistant has finished responding to the user's message.

In the assistant's response, the message string is contained in the following JSON object:

- `choices[0].message.content` for the default chat response
- `choices[0].delta.content` for the streaming chat response

You can extract the message content and print it to the console:
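As a minimal sketch, assuming the `response` objects from the request examples above:

```python
# Default (non-streaming) response: the full reply is at
# choices[0].message.content.
data = response.json()
print(data["choices"][0]["message"]["content"])
```

For a streamed response, accumulate the `choices[0].delta.content` fragments instead:

```python
import json

# Parse each server-sent event line and print the text fragment it carries.
for line in response.iter_lines():
    if not line:
        continue
    text = line.decode("utf-8")
    if not text.startswith("data:"):
        continue
    body = text[len("data:"):].strip()
    if body == "[DONE]":  # some OpenAI-compatible streams end with a sentinel
        break
    chunk = json.loads(body)
    delta = chunk["choices"][0]["delta"]
    if delta.get("content"):
        print(delta["content"], end="", flush=True)
print()
```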
This creates output like the following:
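The exact text depends on the documents uploaded to your assistant, for example:

```
The maximum height of a red pine is ...
```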
Pinecone Assistant supports the following models:

- `gpt-4o` (default)
- `gpt-4.1`
- `o4-mini`
- `claude-3-5-sonnet`
- `claude-3-7-sonnet`
- `gemini-2.5-pro`
To choose a non-default model for your assistant, set the `model` parameter in the request:
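For example, extending the earlier request sketch (any model from the list above works here):

```python
payload = {
    "model": "claude-3-5-sonnet",  # one of the supported models listed above
    "messages": [
        {"role": "user", "content": "What is the maximum height of a red pine?"}
    ],
}
```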
You can filter which documents to use for chat completions. The following example filters the responses to use only documents that include the metadata `"resource": "encyclopedia"`.
This is available in API versions `2025-04` and later.
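A sketch of such a request, assuming the payload shape from the earlier examples (the filter value follows Pinecone's metadata filter syntax):

```python
payload = {
    "messages": [
        {"role": "user", "content": "What is the maximum height of a red pine?"}
    ],
    # Restrict retrieval to documents whose metadata includes
    # "resource": "encyclopedia".
    "filter": {"resource": "encyclopedia"},
}
```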
Temperature is a parameter that controls the randomness of a model's predictions during text generation. Lower temperatures (~0.0) yield more consistent, predictable answers, while higher temperatures produce more varied output and are generally better for creative tasks.
To control the sampling temperature for a model, set the `temperature` parameter in the request. If a model does not support a temperature parameter, the parameter is ignored.
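For example, again extending the earlier request sketch:

```python
payload = {
    "messages": [
        {"role": "user", "content": "What is the maximum height of a red pine?"}
    ],
    "temperature": 0.2,  # closer to 0.0 = more consistent, predictable answers
}
```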