After uploading files to an assistant, there are two interfaces you can use to chat with the assistant:

  • Standard chat interface: This is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references than the OpenAI-compatible chat interface.

  • OpenAI-compatible chat interface: This interface is based on the OpenAI Chat Completion API, a widely adopted standard. It is useful if you need inline citations or OpenAI-compatible responses, but it offers less functionality than the standard chat interface.

You can chat with an assistant using the Pinecone console. Select the assistant to chat with, and use the Assistant playground.

Chat through the standard interface

You can chat with a Pinecone assistant through the standard chat interface. There are three response types:

  • Default response: The assistant returns a structured response and separate citation information.
  • Streaming response: The assistant returns the response as a text stream.
  • JSON response: The assistant returns the response as JSON key-value pairs.

This is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant’s responses and references. However, if you need your assistant to be OpenAI-compatible or need inline citations, use the OpenAI-compatible chat interface.

Default response

The following example sends a message and requests a default response:

The content parameter in the request cannot be empty.

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

msg = Message(role="user", content="Who is the CFO of Netflix?")
resp = assistant.chat(messages=[msg])

# Alternatively, you can provide a dictionary as the message:
# msg = {"role": "user", "content": "Who is the CFO of Netflix?"}
# resp = assistant.chat(messages=[msg])

print(resp)

The example above returns a result like the following:

JSON
{
    'id': '0000000000000000163008a05b317b7b',
    'model': 'gpt-4o-2024-05-13',
    'usage': {
        'prompt_tokens': 9259,
        'completion_tokens': 30,
        'total_tokens': 9289
    },
    'message': {
        'content': 'The Chief Financial Officer (CFO) of Netflix is Spencer Neumann.',
        'role': 'assistant'
    },
    'finish_reason': 'stop',
    'citations': [
        {
            'position': 63,
            'references': [
                {
                    'pages': [78, 72, 79],
                    'file': {
                        'name': 'Netflix-10-K-01262024.pdf',
                        'id': '76a11dd1...',
                        'metadata': {
                            'company': 'netflix',
                            'document_type': 'form 10k'
                        },
                        'created_on': '2024-12-06T01:29:07.369208590Z',
                        'updated_on': '2024-12-06T01:29:50.923493799Z',
                        'status': 'Available',
                        'percent_done': 1.0,
                        'signed_url': 'https://storage.googleapis.com/...',
                        'error_message': None,
                        'size': 1073470.0
                    }
                }
            ]
        }
    ]
}
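
Based on the response shape above, the following is a minimal sketch of reading the answer and its cited files. It continues the example above and assumes the SDK exposes the JSON fields shown ('message', 'citations', and so on) as attributes:

# Continues the example above. Assumption: the SDK exposes the response
# fields shown above ('message', 'citations', etc.) as attributes.
print(resp.message.content)

for citation in resp.citations:
    for ref in citation.references:
        # e.g. "Netflix-10-K-01262024.pdf, pages [78, 72, 79]"
        print(f"{ref.file.name}, pages {ref.pages}")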

Streaming response

The following example sends a message and requests a streaming response:

The content parameter in the request cannot be empty.

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

assistant = pc.assistant.Assistant(assistant_name="example-assistant")

msg = Message(role="user", content="What is the inciting incident of Pride and Prejudice?")

response = assistant.chat(messages=[msg], stream=True)

for data in response:
    if data:
        print(data)

The example above returns a result like the following:

data:{"type":"message_start","id":"0000000000000000111b35de85e8a8f9","model":"gpt-4o-2024-05-13","role":"assistant"}

data:{"type":"content_chunk","id":"0000000000000000111b35de85e8a8f9","model":"gpt-4o-2024-05-13","delta":{"content":"The"}}

...

data:{"type":"citation","id":"0000000000000000111b35de85e8a8f9","model":"gpt-4o-2024-05-13","citation":{"position":406,"references":[{"file":{"status":"Available","id":"ae79e447-b89e-4994-994b-3232ca52a654","name":"Pride-and-Prejudice.pdf","size":2973077,"metadata":null,"updated_on":"2024-06-14T15:01:57.385425746Z","created_on":"2024-06-14T15:01:02.910452398Z","percent_done":0.0,"signed_url":"https://storage.googleapis.com/...", "error_message":null},"pages":[1]}]}}

data:{"type":"message_end","id":"0000000000000000111b35de85e8a8f9","model":"gpt-4o-2024-05-13","finish_reason":"stop","usage":{"prompt_tokens":9736,"completion_tokens":102,"total_tokens":9838}}

There are four types of messages in a streaming chat response:

  • Message start: Includes "role":"assistant", which indicates that the assistant is responding to the user’s message.
  • Content: Includes a value in the content field (e.g., "content":"The"), which is part of the assistant’s streamed response to the user’s message.
  • Citation: Includes a citation to the document that the assistant used to generate the response.
  • Message end: Includes "finish_reason":"stop", which indicates that the assistant has finished responding to the user’s message.
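
As a rough sketch, you can branch on each chunk's type to assemble the full answer and collect citations as they arrive. This continues the streaming example above and assumes the streamed chunks expose the type, delta, and citation fields shown in the raw stream as attributes:

# Continues the streaming example above. Assumption: each streamed chunk
# exposes the "type", "delta", and "citation" fields as attributes.
answer_parts = []
citations = []

for chunk in assistant.chat(messages=[msg], stream=True):
    if chunk.type == "content_chunk":
        answer_parts.append(chunk.delta.content)
    elif chunk.type == "citation":
        citations.append(chunk.citation)
    elif chunk.type == "message_end":
        break  # "finish_reason": "stop" marks the end of the response

print("".join(answer_parts))
print(f"Collected {len(citations)} citation(s)")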

JSON response

The following example uses the json_response parameter to instruct the assistant to return the response as JSON key-value pairs. This is useful if you need to parse the response programmatically.

The json_response parameter cannot be used with the stream parameter.

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

import json
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

assistant = pc.assistant.Assistant(assistant_name="example-assistant")

msg = Message(role="user", content="Who is the CFO and CEO of Netflix?")

response = assistant.chat(messages=[msg], json_response=True)

print(json.loads(response.message.content))

The example above returns a result like the following:

{
  "finish_reason": "stop",
  "message": {
    "role": "assistant",
    "content": "{\"CFO\": \"Spencer Neumann\", \"CEO\": \"Ted Sarandos and Greg Peters\"}"
  },
  "id": "0000000000000000680c95d2faab7aad",
  "model": "gpt-4o-2024-11-20",
  "usage": {
    "prompt_tokens": 14298,
    "completion_tokens": 42,
    "total_tokens": 14340
  },
  "citations": [
    {
      "position": 24,
      "references": [
        {
          "file": {
            "status": "Available",
            "id": "cbecaa37-2943-4030-b4d6-ce4350ab774a",
            "name": "Netflix-10-K-01262024.pdf",
            "size": 1073470,
            "metadata": {
              "test-key": "test-value"
            },
            "updated_on": "2025-01-24T16:53:17.148820770Z",
            "created_on": "2025-01-24T16:52:44.851577534Z",
            "percent_done": 1,
            "signed_url": "https://storage.googleapis.com/knowledge-prod-files/bf0dcf22...",
            "error_message": null
          },
          "pages": [
            79
          ],
          "highlight": null
        },
        ...
      ]
    }
  ]
}
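
Continuing the example above, because message.content is itself a JSON string, you can parse it and read individual keys (the keys below match the sample answer):

# Continues the example above: read individual keys from the parsed answer.
answer = json.loads(response.message.content)
print(answer["CFO"])  # Spencer Neumann
print(answer["CEO"])  # Ted Sarandos and Greg Peters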

Chat through an OpenAI-compatible interface

The OpenAI-compatible chat interface is based on the OpenAI Chat Completion API, a widely adopted standard. It is useful if you need inline citations or OpenAI-compatible responses, but it offers less functionality than the standard chat interface. It returns responses in two formats:

  • Default response: The assistant returns a response in a single string field, which includes citation information.
  • Streaming response: The assistant returns the response as a text stream.

If you do not need OpenAI-compatible responses or inline citations, use the standard chat interface.

Default response

The following example sends a message and requests a response in the default format:

The content parameter in the request cannot be empty.

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant", 
)

# Chat with the assistant.
chat_context = [Message(role="user", content='What is the maximum height of a red pine?')]
response = assistant.chat_completions(messages=chat_context)

The example above returns a result like the following:

{"chat_completion":
  {
    "id":"chatcmpl-9OtJCcR0SJQdgbCDc9JfRZy8g7VJR",
    "choices":[
      {
        "finish_reason":"stop",
        "index":0,
        "message":{
          "role":"assistant",
          "content":"The maximum height of a red pine (Pinus resinosa) is up to 25 meters."
        }
      }
    ],
    "model":"my_assistant"
  }
}

Streaming response

The following example sends a message and requests a streaming response:

The content parameter in the request cannot be empty.

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant" 
)

# Streaming chat with the Assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat_completions(messages=chat_context, stream=True, model="gpt-4o")

for data in response:
    if data:
        print(data)

The example above returns a result like the following:

{
  'id': '000000000000000009de65aa87adbcf0',
  'choices': [
    {
      'index': 0,
      'delta': {
        'role': 'assistant',
        'content': 'The'
      },
      'finish_reason': None
    }
  ],
  'model': 'gpt-4o-2024-05-13'
}

...

{
  'id': '00000000000000007a927260910f5839',
  'choices': [
    {
      'index': 0,
      'delta': {
        'role': '',
        'content': 'The'
      },
      'finish_reason': None
    }
  ],
  'model': 'gpt-4o-2024-05-13'
}

...

{
  'id': '00000000000000007a927260910f5839',
  'choices': [
    {
      'index': 0,
      'delta': {
        'role': None,
        'content': None
      },
      'finish_reason': 'stop'
    }
  ],
  'model': 'gpt-4o-2024-05-13'
}

There are three types of messages in a chat completion response:

  • Message start: Includes "role":"assistant", which indicates that the assistant is responding to the user’s message.
  • Content: Includes a value in the content field (e.g., "content":"The"), which is part of the assistant’s streamed response to the user’s message.
  • Message end: Includes "finish_reason":"stop", which indicates that the assistant has finished responding to the user’s message.
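
The following is a minimal sketch of reassembling the streamed deltas into a single string. It requests a fresh stream for the example above and assumes the SDK exposes the chunk fields shown (choices, delta, content) as attributes; the final chunk's content is None, so empty deltas are skipped:

# Continues the streaming example above: request a fresh stream and
# reassemble the deltas. The final chunk's content is None, so skip it.
response = assistant.chat_completions(messages=chat_context, stream=True)

full_answer = []
for data in response:
    delta = data.choices[0].delta
    if delta.content:
        full_answer.append(delta.content)

print("".join(full_answer))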

Provide conversation history in a chat request

Models lack memory of previous requests, so any relevant messages from earlier in the conversation must be present in the messages object.

In the following example, the messages object includes prior messages that are necessary for interpreting the newest message.

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant", 
)

# Chat with the assistant.
chat_context = [
    Message(content="What is the maximum height of a red pine?", role="user"),
    Message(content="The maximum height of a red pine (Pinus resinosa) is up to 25 meters.", role="assistant"),
    Message(content="What is its maximum diameter?", role="user")
]
response = assistant.chat(messages=chat_context)
print(response)

The above example request returns a response like the following:

{
  "finish_reason": "stop",
  "message": {
    "role": "assistant",
    "content": "The maximum diameter of a red pine (Pinus resinosa) is up to 1 meter."
  },
  "id": "0000000000000000236a24a17e55309a",
  "model": "gpt-4o-2024-05-13",
  "usage": {
    "prompt_tokens": 21377,
    "completion_tokens": 20,
    "total_tokens": 21397
  },
  "citations": [...]
}
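
To carry a conversation across multiple turns programmatically, append each assistant reply back into the history before sending the next user message. A minimal sketch:

# A minimal multi-turn loop: append each assistant reply to the history so
# the model can resolve references like "its" in follow-up questions.
chat_context = []

for question in [
    "What is the maximum height of a red pine?",
    "What is its maximum diameter?",
]:
    chat_context.append(Message(role="user", content=question))
    response = assistant.chat(messages=chat_context)
    chat_context.append(Message(role="assistant", content=response.message.content))
    print(response.message.content)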

Filter chat with metadata

You can filter which documents the assistant uses to generate its responses. The following example filters the response to use only documents that include the metadata "resource": "encyclopedia".

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant", 
)

# Chat with the assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat(messages=chat_context, stream=True, filter={"resource": "encyclopedia"})
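
The filter uses Pinecone's metadata filter syntax, so the sketch below assumes that operators such as $and, $eq, and $in also work here; the field names and values are illustrative:

# Assumption: the standard Pinecone metadata filter operators apply to
# assistant chat. The field names and values below are illustrative.
response = assistant.chat(
    messages=chat_context,
    filter={
        "$and": [
            {"resource": {"$in": ["encyclopedia", "field-guide"]}},
            {"document_type": {"$eq": "reference"}},
        ]
    },
)
print(response.message.content)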

Choose a model for your assistant

Pinecone Assistant uses the gpt-4o model by default. Alternatively, you can use the claude-3-5-sonnet model. Select the LLM to use by setting the model parameter in the request:

# To use the Python SDK, install the plugin:
# pip install --upgrade pinecone pinecone-plugin-assistant

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# Get your assistant.
assistant = pc.assistant.Assistant(
    assistant_name="example-assistant", 
)

# Chat with the assistant.
chat_context = [Message(role="user", content="What is the maximum height of a red pine?")]
response = assistant.chat(messages=chat_context, stream=True, model="claude-3-5-sonnet")

Include citation highlights in the response

Citation highlights are available in the Pinecone console or API versions 2025-04 and later.

When using the standard chat interface, every response includes a citation object. The object includes a reference to the document that the assistant used to generate the response. Additionally, you can include highlights, which are the specific parts of the document that the assistant used to generate the response, by setting the include_highlights parameter to true in the request:

curl
PINECONE_API_KEY="YOUR_API_KEY"
ASSISTANT_NAME="example-assistant"

curl "https://prod-1-data.ke.pinecone.io/assistant/chat/$ASSISTANT_NAME" \
  -H "Api-Key: $PINECONE_API_KEY" \
  -H "Content-Type: application/json" \
  -H "X-Pinecone-API-Version: 2025-04" \
  -d '{
  "messages": [
    {
      "role": "user",
      "content": "Who is the CFO of Netflix?"
    }
  ],
  "stream": false,
  "model": "gpt-4o",
  "include_highlights": true
}'

The example above will return a response like the following:

{
  "finish_reason": "stop",
  "message": {
    "role": "assistant",
    "content": "The Chief Financial Officer (CFO) of Netflix is Spencer Neumann."
  },
  "id": "00000000000000006685b07087b1ad42",
  "model": "gpt-4o-2024-05-13",
  "usage": {
    "prompt_tokens": 12490,
    "completion_tokens": 33,
    "total_tokens": 12523
  },
  "citations": [
    {
      "position": 63,
      "references": [
        {
          "file": {
            "status": "Available",
            "id": "cbecaa37-2943-4030-b4d6-ce4350ab774a",
            "name": "Netflix-10-K-01262024.pdf",
            "size": 1073470,
            "metadata": {"test-key": "test-value"},
            "updated_on": "2025-01-24T16:53:17.148820770Z",
            "created_on": "2025-01-24T16:52:44.851577534Z",
            "percent_done": 1.0,
            "signed_url": "https://storage.googleapis.com/knowledge-prod-files/b...",
            "error_message": null
          },
          "pages": [78],
          "highlight": {
            "type": "text",
            "content": "EXHIBIT 31.3\nCERTIFICATION OF CHIEF FINANCIAL OFFICER\nPURSUANT TO SECTION 302 OF THE SARBANES-OXLEY ACT OF 2002\nI, Spencer Neumann, certify that:"
          }
        },
        {
          "file": {
            "status": "Available",
            "id": "cbecaa37-2943-4030-b4d6-ce4350ab774a",
            "name": "Netflix-10-K-01262024.pdf",
            "size": 1073470,
            "metadata": {"test-key": "test-value"},
            "updated_on": "2025-01-24T16:53:17.148820770Z",
            "created_on": "2025-01-24T16:52:44.851577534Z",
            "percent_done": 1.0,
            "signed_url": "https://storage.googleapis.com/knowledge-prod-files/bf...",
            "error_message": null
          },
          "pages": [79],
          "highlight": {
            "type": "text",
            "content": "operations of\nNetflix, Inc.\nDated: January 26, 2024  By:  /S/ SPENCER NEUMANN\n  Spencer Neumann\n  Chief Financial Officer"
          }
        }
      ]
    }
  ]
}

Enabling highlights will increase token usage.
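
If you are using the Python SDK, the following is a minimal sketch of the equivalent request. It assumes the SDK's chat method accepts an include_highlights parameter and exposes the citation fields shown above as attributes; verify against your SDK version.

from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant")

# Assumption: the SDK forwards include_highlights to the chat endpoint.
response = assistant.chat(
    messages=[Message(role="user", content="Who is the CFO of Netflix?")],
    include_highlights=True,
)

# Print each cited passage, mirroring the highlight objects shown above.
for citation in response.citations:
    for ref in citation.references:
        if ref.highlight:
            print(ref.highlight.content)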

Extract the response content

Both the standard and OpenAI-compatible chat interfaces return a JSON response object containing the assistant’s chat response along with other information. In the OpenAI-compatible format, the message string is contained in the following fields:

  • choices[0].message.content for a default chat response
  • choices[0].delta.content for a streaming chat response

In the standard chat interface, the equivalent fields are message.content and, for streamed chunks, delta.content, as shown in the response samples above.

You can extract the message content and print it to the console:

# Print the assistant's reply to the console.
print(response.choices[0].message.content)

This creates output like the following:

A red pine, scientifically known as *Pinus resinosa*, is a medium-sized tree that can grow up to 25 meters high and 75 centimeters in diameter. [1, pp. 1]
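
For a streaming chat completion, the content arrives in pieces. The following is a minimal sketch that writes each delta as it arrives, assuming the SDK exposes the chunk fields shown earlier as attributes; the final chunk's content is None, so it is skipped:

# Streaming variant: print each delta as it arrives. The final chunk's
# content is None, so guard against it.
for chunk in response:
    content = chunk.choices[0].delta.content
    if content:
        print(content, end="", flush=True)
print()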