After uploading files to an assistant, you can chat with the assistant.
This page shows you how to chat with an assistant using the OpenAI-compatible chat interface. This interface is based on the OpenAI Chat Completions API, a widely adopted standard. It is useful if you need inline citations or OpenAI-compatible responses, but it has limited functionality compared to the standard chat interface.
The standard chat interface is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references.
The following example sends a message and requests a response in the default format:
The content parameter in the request cannot be empty.
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant",
)

# Chat with the assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat_completions(messages=chat_context)
print(response)
The example above returns a result like the following:
{"chat_completion": { "id":"chatcmpl-9OtJCcR0SJQdgbCDc9JfRZy8g7VJR", "choices":[ { "finish_reason":"stop", "index":0, "message":{ "role":"assistant", "content":"The maximum height of a red pine (Pinus resinosa) is up to 25 meters." } } ], "model":"my_assistant" }}
The following example sends a message and requests a streaming response:
The content parameter in the request cannot be empty.
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant"
)

# Streaming chat with the assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat_completions(messages=chat_context, stream=True)

for data in response:
    if data:
        print(data)
The example above returns a stream of chunks rather than a single response. Each chunk has the same OpenAI-compatible shape as the default response, with a delta object in place of message. An illustrative (not verbatim) chunk looks like the following:
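{
  "chat_completion": {
    "id": "chatcmpl-9OtJCcR0SJQdgbCDc9JfRZy8g7VJR",
    "choices": [
      {
        "finish_reason": null,
        "index": 0,
        "delta": {
          "role": "assistant",
          "content": "The maximum"
        }
      }
    ],
    "model": "my_assistant"
  }
}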
In the assistant’s response, the message string is contained in the following JSON object:
choices[0].message.content for the default chat response
choices[0].delta.content for the streaming chat response
You can extract the message content and print it to the console. For the default response:
print(str(response.choices[0].message.content))
This creates output like the following:
A red pine, scientifically known as *Pinus resinosa*, is a medium-sized tree that can grow up to 25 meters high and 75 centimeters in diameter. [1, pp. 1]
For the streaming response:
for data in response:
    if data:
        print(str(data.choices[0].delta.content))
This creates output like the following:
The maximum height of a red pine (Pinus resinosa) is up to twenty-five meters [1, pp. 1].
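If you need the complete message rather than token-by-token output, you can concatenate the streamed deltas. The following is a minimal sketch that reuses the streaming response from the example above; it assumes each chunk exposes choices[0].delta.content and that the final chunk's content may be empty:

# Accumulate the streamed deltas into a single message string.
full_message = ""
for data in response:
    # Skip empty chunks and chunks without delta content (assumption:
    # the final chunk may carry only a finish_reason, with no content).
    if data and data.choices[0].delta.content:
        full_message += data.choices[0].delta.content
print(full_message)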
To choose a non-default model for your assistant, set the model parameter in the request:
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant",
)

# Chat with the assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat_completions(
    messages=chat_context,
    model="gpt-4.1"
)
To restrict the documents that the assistant retrieves from, set the filter parameter to a metadata filter:

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant",
)

# Chat with the assistant, limiting retrieval to documents whose
# "resource" metadata field equals "encyclopedia".
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat_completions(
    messages=chat_context,
    stream=True,
    filter={"resource": "encyclopedia"}
)
This is available in API versions 2025-04 and later.
Temperature is a parameter that controls the randomness of a model’s predictions during text generation. Lower temperatures (~0.0) yield more consistent, predictable answers, while higher temperatures yield more varied output and are generally better for creative tasks.
To control the sampling temperature for a model, set the temperature parameter in the request. If a model does not support a temperature parameter, the parameter is ignored.
# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

assistant = pc.assistant.Assistant(assistant_name="example-assistant")

msg = Message(role="user", content="Who is the CFO of Netflix?")
response = assistant.chat_completions(
    messages=[msg],
    temperature=0.8
)
print(response)
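To see the effect of the parameter, you can compare responses at two temperatures. The following is a minimal sketch that reuses the assistant and msg objects from the example above; outputs vary between runs, especially at higher temperatures:

# Compare a low-temperature and a high-temperature response
# to the same question.
for temp in (0.0, 0.9):
    response = assistant.chat_completions(
        messages=[msg],
        temperature=temp,
    )
    print(f"temperature={temp}: {response.choices[0].message.content}")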