After uploading files to an assistant, you can chat with the assistant.

This page shows you how to chat with an assistant using the OpenAI-compatible chat interface. This interface is based on the widely adopted OpenAI Chat Completion API. It is useful if you need inline citations or OpenAI-compatible responses, but it has limited functionality compared to the standard chat interface.

The standard chat interface is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references.

Chat with an assistant

The OpenAI-compatible chat interface can return responses in two different formats:

  • Default response: The assistant returns a response in a single string field, which includes citation information.
  • Streaming response: The assistant returns the response as a text stream.

Default response

The content parameter in the request cannot be empty.

The following example sends a message and requests a response in the default format:

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant", 
)

# Chat with the assistant.
chat_context = [Message(role="user", content='What is the maximum height of a red pine?')]
response = assistant.chat_completions(messages=chat_context)

The example above returns a result like the following:

{"chat_completion":
  {
    "id":"chatcmpl-9OtJCcR0SJQdgbCDc9JfRZy8g7VJR",
    "choices":[
      {
        "finish_reason":"stop",
        "index":0,
        "message":{
          "role":"assistant",
          "content":"The maximum height of a red pine (Pinus resinosa) is up to 25 meters."
        }
      }
    ],
    "model":"my_assistant"
  }
}
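Because the content parameter cannot be empty, it can help to check messages before sending them. The following is a minimal sketch, not part of the SDK: `build_user_message` is a hypothetical helper, and treating whitespace-only text as empty is an assumption that may be stricter than the API itself.

```python
def build_user_message(content: str) -> dict:
    """Return a user message as a plain dict, rejecting empty content.

    Hypothetical helper, not part of the Pinecone SDK. Whitespace-only
    text is treated as empty here, which is an assumption.
    """
    if not content or not content.strip():
        raise ValueError("Message content cannot be empty.")
    return {"role": "user", "content": content}


# Build a message dict; it carries the same role and content fields
# that the Message class takes in the example above.
message = build_user_message("What is the maximum height of a red pine?")
print(message["role"], "->", message["content"])
```

Failing early in your own code gives a clearer error than a rejected request.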

Streaming response

The content parameter in the request cannot be empty.

The following example sends a message and requests a streaming response:

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant" 
)

# Streaming chat with the Assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
# chat_context is already a list of messages, so pass it directly.
response = assistant.chat_completions(messages=chat_context, stream=True, model="gpt-4o")

for data in response:
    if data:
        print(data)

The example above returns a result like the following:

{
  'id': '000000000000000009de65aa87adbcf0', 
  'choices': [
      {
      'index': 0, 
      'delta': 
        {
        'role': 'assistant', 
        'content': 'The'
        }, 
      'finish_reason': None
      }
    ], 
  'model': 'gpt-4o-2024-05-13'
}

...

{
  'id': '00000000000000007a927260910f5839',
  'choices': [
      {
      'index': 0,
      'delta':
        {
          'role': '', 
          'content': 'The'
        }, 
      'finish_reason': None
      }
    ], 
  'model': 'gpt-4o-2024-05-13'
}

...

{
  'id': '00000000000000007a927260910f5839', 
  'choices': [
    {
      'index': 0, 
      'delta': 
        {
        'role': None, 
        'content': None
        }, 
      'finish_reason': 'stop'
      }
    ], 
  'model': 'gpt-4o-2024-05-13'
}

There are three types of messages in a streaming chat completion response:

  • Message start: Includes "role":"assistant", which indicates that the assistant is responding to the user’s message.
  • Content: Includes a value in the content field (e.g., "content":"The"), which is part of the assistant’s streamed response to the user’s message.
  • Message end: Includes "finish_reason":"stop", which indicates that the assistant has finished responding to the user’s message.
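The three message types above can be told apart by inspecting each streamed chunk. The following is a minimal sketch that assumes each chunk is a plain dict shaped like the example output; `classify_chunk` is a hypothetical helper, and the objects your SDK version yields may need attribute access instead.

```python
def classify_chunk(chunk: dict) -> str:
    """Label a streamed chunk as 'start', 'content', 'end', or 'other'.

    Assumes the chunk is a dict shaped like the example output above.
    """
    choice = chunk["choices"][0]
    delta = choice.get("delta") or {}
    if choice.get("finish_reason") == "stop":
        return "end"
    # The start chunk may also carry the first piece of content,
    # so the role check comes before the content check.
    if delta.get("role") == "assistant":
        return "start"
    if delta.get("content"):
        return "content"
    return "other"


# Example chunk modeled on the first message in the output above.
first = {
    "choices": [
        {"index": 0, "delta": {"role": "assistant", "content": "The"}, "finish_reason": None}
    ]
}
print(classify_chunk(first))  # prints "start"
```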

Extract the response content

The assistant’s response is returned in a JSON response object along with other information. The message string is contained in one of the following fields:

  • choices[0].message.content for the default chat response
  • choices[0].delta.content for the streaming chat response

You can extract the message content and print it to the console:

# Print the assistant's response (default, non-streaming format) to the console.
print(response.choices[0].message.content)

This creates output like the following:

A red pine, scientifically known as *Pinus resinosa*, is a medium-sized tree that can grow up to 25 meters high and 75 centimeters in diameter. [1, pp. 1]
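For a streaming response, the full message can be rebuilt by concatenating choices[0].delta.content across chunks. The following is a minimal sketch that assumes each chunk is a plain dict shaped like the streaming example output; `join_stream_content` is a hypothetical helper, and the sample chunks are made up for illustration.

```python
def join_stream_content(chunks) -> str:
    """Concatenate the delta content of every chunk into one string.

    Assumes dict-shaped chunks; empty chunks and chunks whose content
    is None (such as the final 'stop' chunk) are skipped.
    """
    parts = []
    for chunk in chunks:
        if not chunk:
            continue
        choices = chunk.get("choices") or []
        if not choices:
            continue
        delta = choices[0].get("delta") or {}
        content = delta.get("content")
        if content:
            parts.append(content)
    return "".join(parts)


# Illustrative chunks only, not real API output.
sample_chunks = [
    {"choices": [{"index": 0, "delta": {"role": "assistant", "content": "The"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"role": "", "content": " maximum"}, "finish_reason": None}]},
    {"choices": [{"index": 0, "delta": {"role": None, "content": None}, "finish_reason": "stop"}]},
]
print(join_stream_content(sample_chunks))  # prints "The maximum"
```

Collecting the parts in a list and joining once avoids repeated string concatenation on long responses.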