Chat with an assistant and get back citations in structured form.
This is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references than the OpenAI-compatible chat interface.
For guidance and examples, see Chat with an assistant.
Pinecone API Key
Required date-based version header
The name of the assistant to chat with.
The desired configuration for the chat request.
Represents a request to chat with an assistant.
The list of messages sent to the assistant, used for context retrieval and for generating a response with the LLM.
If false, the assistant returns a single JSON response. If true, the assistant returns a stream of responses.
The large language model used to generate responses.
Controls the randomness of the model's output: lower values make responses more deterministic, while higher values increase creativity and variability. If the model does not support a temperature parameter, the parameter will be ignored.
Optional metadata-based filter to restrict which documents are retrieved for the assistant's response context.
{ "genre": { "$ne": "documentary" } }

If true, instructs the assistant to return a JSON-formatted response. Cannot be used together with streaming mode.
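As an illustration, the filter above could be included in a chat request body along with the message list. This is a sketch of the payload shape based on the parameters described here; treat the exact field names as assumptions to verify against the current API schema.

```python
# Sketch of a chat request body that restricts context retrieval with a
# metadata filter. The "$ne" operator excludes documents whose "genre"
# metadata equals "documentary".
payload = {
    "messages": [
        {"role": "user", "content": "Summarize the latest findings."}
    ],
    "filter": {"genre": {"$ne": "documentary"}},
    "stream": False,  # request a single JSON response rather than a stream
}
```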
If true, instructs the assistant to include highlights from the referenced documents that support its response.
Controls the context snippets sent to the LLM.
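A minimal sketch of tuning the context snippets, assuming the option is passed as a `context_options` object with `top_k` and `snippet_size` fields; these field names are an assumption for illustration and should be checked against the current request schema.

```python
# Hypothetical context_options block: "top_k" would limit how many
# snippets are retrieved, and "snippet_size" would cap the size of
# each snippet sent to the LLM.
payload = {
    "messages": [{"role": "user", "content": "What changed in v2?"}],
    "context_options": {"top_k": 10, "snippet_size": 2048},
}
```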
Chat request successful.
Describes the response format of a chat request.
A unique identifier for this chat response.
Indicates why the chat response generation stopped. This signals the end of the response.
stop: The model finished generating the response.
length: Generation was cut off because the maximum number of tokens allowed was reached.
content_filter: Generation stopped because content was blocked by content filtering rules (for example, content that contains hate speech or violent material).
tool_calls: Generation stopped because a tool call was triggered.
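The finish reasons above lend themselves to a simple dispatch in client code. This is an illustrative sketch, assuming the response surfaces the value in a `finish_reason` field:

```python
def handle_finish_reason(reason: str) -> str:
    """Map a chat finish_reason to a client-side action (illustrative)."""
    if reason == "stop":
        return "complete"           # model finished normally
    if reason == "length":
        return "truncated"          # hit the token limit; response may be cut off
    if reason == "content_filter":
        return "blocked"            # content was filtered; surface an error
    if reason == "tool_calls":
        return "tool_call_pending"  # a tool call was triggered
    return "unknown"

print(handle_finish_reason("length"))  # truncated
```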
Describes the format of a message in a chat.
The name or identifier of the model used to generate this chat response.
Citations supporting the information in the response.
Describes the token usage associated with interactions with an assistant.
The number of context snippets provided to the model to generate the response. This indicates how much retrieved information was available for generation, enabling fallback logic when no context was found (count is 0).
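Because this count tells you whether any retrieved context backed the answer, a client can branch on it. A sketch, assuming the count appears in the parsed response under a field here called `context_snippets_count` (the exact field name may differ; check the response schema):

```python
def answer_is_grounded(response: dict) -> bool:
    """Return True if at least one context snippet informed the response.

    The key "context_snippets_count" is an assumption for this sketch;
    consult the response schema for the actual field name.
    """
    return response.get("context_snippets_count", 0) > 0

# With no retrieved context, a caller might warn the user or fall back:
if not answer_is_grounded({"context_snippets_count": 0}):
    fallback = "No supporting documents were found for this answer."
```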
Content filter results provided by the LLM, describing safety-related classifications applied to the content. The structure may vary depending on the model and the content being filtered. The spec field identifies the provider and determines the structure of the results.