# Chat with an assistant

> Chat with an assistant and get back citations in structured form. 

This is the recommended way to chat with an assistant, as it offers more functionality and control over the assistant's responses and references than the OpenAI-compatible chat interface.

For guidance and examples, see [Chat with an assistant](https://docs.pinecone.io/guides/assistant/chat-with-assistant).

<RequestExample>
  ```bash curl | Default theme={null}
  PINECONE_API_KEY="YOUR_API_KEY"
  ASSISTANT_NAME="example-assistant"

  curl "https://prod-1-data.ke.pinecone.io/assistant/chat/$ASSISTANT_NAME" \
    -H "Api-Key: $PINECONE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-Pinecone-Api-Version: 2026-04" \
    -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is the inciting incident of Pride and Prejudice?"
      }
    ],
    "stream": false,
    "model": "gpt-4o"
  }'
  ```

  ```bash curl | Streaming theme={null}
  PINECONE_API_KEY="YOUR_API_KEY"
  ASSISTANT_NAME="example-assistant"

  curl "https://prod-1-data.ke.pinecone.io/assistant/chat/$ASSISTANT_NAME" \
    -H "Api-Key: $PINECONE_API_KEY" \
    -H "Content-Type: application/json" \
    -H "X-Pinecone-Api-Version: 2026-04" \
    -d '{
    "messages": [
      {
        "role": "user",
        "content": "What is the inciting incident of Pride and Prejudice?"
      }
    ],
    "stream": true,
    "model": "gpt-4o"
  }'
  ```
</RequestExample>
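
The same request can be sketched in Python using only the standard library. This is an illustrative sketch, not part of any Pinecone SDK: `build_chat_request` is a hypothetical helper name, and the host is copied from the curl example above (your assistant's host may differ). The helper builds the request without sending it, and rejects `json_response` combined with `stream`, which the request schema says cannot be used together.

```python
import json
import urllib.request

# Host copied from the curl example; your assistant's host may differ.
ASSISTANT_HOST = "https://prod-1-data.ke.pinecone.io"
API_VERSION = "2026-04"


def build_chat_request(api_key, assistant_name, messages, *, stream=False,
                       model="gpt-4o", json_response=False,
                       context_options=None):
    """Build (but do not send) a POST request for the chat endpoint."""
    if stream and json_response:
        raise ValueError("json_response cannot be combined with stream")
    body = {"messages": messages, "stream": stream, "model": model}
    if json_response:
        body["json_response"] = True
    if context_options is not None:
        body["context_options"] = context_options
    return urllib.request.Request(
        f"{ASSISTANT_HOST}/assistant/chat/{assistant_name}",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Api-Key": api_key,
            "Content-Type": "application/json",
            "X-Pinecone-Api-Version": API_VERSION,
        },
        method="POST",
    )


request = build_chat_request(
    "YOUR_API_KEY",
    "example-assistant",
    [{"role": "user",
      "content": "What is the inciting incident of Pride and Prejudice?"}],
    context_options={"top_k": 20},
)
# To send it: urllib.request.urlopen(request), then json.load() the response.
```

Keeping request construction separate from sending makes the payload easy to inspect or log before any network call is made.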

<ResponseExample>
  ```json Default response theme={null}
  {
    "finish_reason": "stop",
    "message": {
      "role": "assistant",
      "content": "The inciting incident of \"Pride and Prejudice\" occurs when Mrs. Bennet informs Mr. Bennet that Netherfield Park has been let at last, and she is eager to share the news about the new tenant, Mr. Bingley, who is wealthy and single. This sets the stage for the subsequent events of the story, including the introduction of Mr. Bingley and Mr. Darcy to the Bennet family and the ensuing romantic entanglements."
    },
    "id": "00000000000000004ac3add5961aa757",
    "model": "gpt-4o-2024-05-13",
    "usage": {
      "prompt_tokens": 9736,
      "completion_tokens": 105,
      "total_tokens": 9841
    },
    "citations": [
      {
        "position": 406,
        "references": [
          {
            "file": {
              "status": "Available",
              "id": "ae79e447-b89e-4994-994b-3232ca52a654",
              "name": "Pride-and-Prejudice.pdf",
              "size": 2973077,
              "metadata": null,
              "updated_on": "2024-06-14T15:01:57.385425746Z",
              "created_on": "2024-06-14T15:01:02.910452398Z",
              "signed_url": "https://storage.googleapis.com/..."
            },
            "pages": [
              1
            ]
          }
        ]
      }
    ]
  }

  ```

  ```text Streaming response theme={null}
  data:{
    "type":"message_start",
    "id":"0000000000000000111b35de85e8a8f9",
    "model":"gpt-4o-2024-05-13",
    "role":"assistant"
  }

  data:{
    "type":"content_chunk",
    "id":"0000000000000000111b35de85e8a8f9",
    "model":"gpt-4o-2024-05-13",
    "delta":{
      "content":"The"
    }
  }

  ...

  data:{
    "type":"citation",
    "id":"0000000000000000111b35de85e8a8f9",
    "model":"gpt-4o-2024-05-13",
    "citation":{
      "position":406,
      "references":[
        {
          "file":{
            "status":"Available",
            "id":"ae79e447-b89e-4994-994b-3232ca52a654",
            "name":"Pride-and-Prejudice.pdf",
            "size":2973077,
            "metadata":null,
            "updated_on":"2024-06-14T15:01:57.385425746Z",
            "created_on":"2024-06-14T15:01:02.910452398Z",
            "signed_url":"https://storage.googleapis.com/..."
          },
          "pages":[1]
        }
      ]
    }
  }

  data:{
    "type":"message_end",
    "id":"0000000000000000111b35de85e8a8f9",
    "model":"gpt-4o-2024-05-13",
    "finish_reason":"stop",
    "usage":{
      "prompt_tokens":9736,
      "completion_tokens":102,
      "total_tokens":9838
    }
  }
  ```
</ResponseExample>
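
A client consuming the streaming response typically appends each `content_chunk` delta, collects `citation` events, and reads `finish_reason` and `usage` from `message_end`. The sketch below shows one way to do that. It assumes each event arrives on a single line prefixed with `data:` (the streaming example above is pretty-printed for readability); `collect_stream` is a hypothetical helper, not a Pinecone SDK function, and the sample events are abbreviated, made-up data.

```python
import json


def collect_stream(lines):
    """Fold an iterable of SSE lines into one result dict.

    Assumes one JSON event per line, prefixed with "data:".
    """
    result = {"content": "", "citations": [],
              "finish_reason": None, "usage": None}
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and non-data fields
        payload = line[len("data:"):].strip()
        if not payload:
            continue
        event = json.loads(payload)
        kind = event.get("type")
        if kind == "message_start":
            result["id"] = event["id"]
            result["model"] = event["model"]
        elif kind == "content_chunk":
            # Append each fragment to rebuild the complete message.
            result["content"] += event["delta"]["content"]
        elif kind == "citation":
            result["citations"].append(event["citation"])
        elif kind == "message_end":
            result["finish_reason"] = event["finish_reason"]
            result["usage"] = event.get("usage")
            break  # no further chunks follow message_end
    return result


sample = [
    'data:{"type":"message_start","id":"abc","model":"gpt-4o-2024-05-13","role":"assistant"}',
    "",
    'data:{"type":"content_chunk","id":"abc","model":"gpt-4o-2024-05-13","delta":{"content":"The"}}',
    'data:{"type":"content_chunk","id":"abc","model":"gpt-4o-2024-05-13","delta":{"content":" plot"}}',
    'data:{"type":"citation","id":"abc","model":"gpt-4o-2024-05-13","citation":{"position":8,"references":[]}}',
    'data:{"type":"message_end","id":"abc","model":"gpt-4o-2024-05-13","finish_reason":"stop","usage":{"prompt_tokens":10,"completion_tokens":2,"total_tokens":12}}',
]
stream_result = collect_stream(sample)
print(stream_result["content"])  # prints "The plot"
```

In a real client, `lines` would be the decoded lines of the HTTP response body rather than a list.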


## OpenAPI

````yaml https://raw.githubusercontent.com/pinecone-io/pinecone-api/refs/heads/main/2026-04/assistant_data_2026-04.oas.yaml POST /chat/{assistant_name}
openapi: 3.0.3
info:
  title: Pinecone assistant data plane API
  description: >-
    Pinecone Assistant Engine is a context engine to store and retrieve relevant
    knowledge from millions of documents at scale. This API supports
    interactions with assistants.
  contact:
    name: Pinecone Support
    url: https://support.pinecone.io
    email: support@pinecone.io
  license:
    name: Apache 2.0
    url: https://www.apache.org/licenses/LICENSE-2.0
  version: 2026-04
servers:
  - url: https://{assistant_host}
    variables:
      assistant_host:
        default: unknown
        description: The host of the created assistant
security:
  - ApiKeyAuth: []
tags:
  - name: Manage Assistants
    description: Actions that manage Assistants
paths:
  /chat/{assistant_name}:
    post:
      tags:
        - Manage Assistants
      summary: Chat with an assistant
      description: >-
        Chat with an assistant and get back citations in structured form. 


        This is the recommended way to chat with an assistant, as it offers more
        functionality and control over the assistant's responses and references
        than the OpenAI-compatible chat interface.


        For guidance and examples, see [Chat with an
        assistant](https://docs.pinecone.io/guides/assistant/chat-with-assistant).
      operationId: chat_assistant
      parameters:
        - in: header
          name: X-Pinecone-Api-Version
          description: Required date-based version header
          required: true
          schema:
            default: 2026-04
            type: string
          style: simple
        - in: path
          name: assistant_name
          description: The name of the assistant to chat with.
          required: true
          schema:
            type: string
          example: test-assistant
          style: simple
      requestBody:
        description: The desired configuration to chat with an assistant.
        content:
          application/json:
            schema:
              $ref: '#/components/schemas/ChatRequest'
        required: true
      responses:
        '200':
          description: Chat request successful.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ChatModel'
            text/event-stream:
              schema:
                $ref: '#/components/schemas/StreamChatChunkModel'
        '400':
          description: Bad request. The request body included invalid request parameters.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              examples:
                files-validation-error:
                  summary: Validation error on ingest.
                  value:
                    error:
                      code: INVALID_ARGUMENT
                      message: >-
                        Uploaded file can only currently be either a pdf or txt
                        file
                    status: 400
        '401':
          description: 'Unauthorized. Possible causes: Invalid API key.'
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              examples:
                unauthorized:
                  summary: Unauthorized
                  value:
                    error:
                      code: UNAUTHENTICATED
                      message: Invalid API key.
                    status: 401
        '404':
          description: Assistant not found.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              examples:
                assistant-not-found:
                  summary: Assistant not found.
                  value:
                    error:
                      code: NOT_FOUND
                      message: Assistant "example-assistant" not found.
                    status: 404
        '500':
          description: Internal server error.
          content:
            application/json:
              schema:
                $ref: '#/components/schemas/ErrorResponse'
              examples:
                internal-server-error:
                  summary: Internal server error
                  value:
                    error:
                      code: UNKNOWN
                      message: Internal server error
                    status: 500
components:
  schemas:
    ChatRequest:
      description: Represents a request to chat with an assistant.
      type: object
      properties:
        messages:
          description: >-
            The list of messages sent to the assistant, used for context
            retrieval and for generating the response with the LLM.
          type: array
          items:
            $ref: '#/components/schemas/MessageModel'
        stream:
          description: >-
            If `false`, the assistant returns a single JSON response. If `true`,
            the assistant returns a stream of responses.
          default: false
          type: boolean
        model:
          description: The large language model used to generate responses.
          default: gpt-4o
          x-enum:
            - gpt-4o
            - gpt-4.1
            - gpt-5
            - o4-mini
            - claude-sonnet-4-5
            - gemini-2.5-pro
          type: string
        temperature:
          description: >-
            Controls the randomness of the model's output: lower values make
            responses more deterministic, while higher values increase
            creativity and variability. If the model does not support a
            temperature parameter, the parameter will be ignored.
          default: 0
          type: number
          format: float
        filter:
          example:
            genre:
              $ne: documentary
          description: >-
            Optional metadata-based filter to restrict which documents are
            retrieved for the assistant's response context.
          type: object
        json_response:
          description: >-
            If `true`, instructs the assistant to return a JSON-formatted
            response. Cannot be used together with streaming mode.
          default: false
          type: boolean
        include_highlights:
          description: >-
            If `true`, instructs the assistant to include highlights from the
            referenced documents that support its response.
          default: false
          type: boolean
        context_options:
          $ref: '#/components/schemas/ContextOptionsModel'
      required:
        - messages
    ChatModel:
      description: Describes the response format of a chat request.
      type: object
      properties:
        id:
          description: A unique identifier for this chat response.
          type: string
        finish_reason:
          description: >-
            Indicates why the chat response generation stopped. This signals the
            end of the response.

            - `stop`: The model finished generating the response.  

            - `length`: Generation was cut off because the maximum number of
            tokens allowed was reached.

            - `content_filter`: Generation stopped because content was blocked
            by content filtering rules (for example, content that contains hate
            speech or violent material).

            - `tool_calls`: Generation stopped because a tool call was
            triggered.
          x-enum:
            - stop
            - length
            - content_filter
            - tool_calls
          type: string
        message:
          $ref: '#/components/schemas/MessageModel'
        model:
          description: >-
            The name or identifier of the model used to generate this chat
            response.
          type: string
        citations:
          description: Citations supporting the information in the response.
          type: array
          items:
            $ref: '#/components/schemas/CitationModel'
        usage:
          $ref: '#/components/schemas/UsageModel'
        context_snippet_count:
          description: >-
            The number of context snippets provided to the model to generate the
            response. This indicates how much retrieved information was
            available for generation, so clients can apply their own logic when
            no context was found (count is 0).
          type: integer
        content_filter_results:
          $ref: '#/components/schemas/ContentFilterResults'
    StreamChatChunkModel:
      description: Represents a chunk of a stream chat response.
      discriminator:
        propertyName: type
        mapping:
          message_start:
            $ref: '#/components/schemas/MessageStartModel'
          content_chunk:
            $ref: '#/components/schemas/ContentChunkModel'
          citation:
            $ref: '#/components/schemas/CitationChunkModel'
          message_end:
            $ref: '#/components/schemas/MessageEndModel'
      type: object
      oneOf:
        - $ref: '#/components/schemas/MessageStartModel'
        - $ref: '#/components/schemas/ContentChunkModel'
        - $ref: '#/components/schemas/CitationChunkModel'
        - $ref: '#/components/schemas/MessageEndModel'
    ErrorResponse:
      example:
        error:
          code: TOO_MANY_REQUESTS
          message: Too many get or list assistant requests, try again later
        status: 429
      description: The response shape used for all error responses.
      type: object
      properties:
        status:
          example: 500
          description: The HTTP status code of the error.
          type: integer
        error:
          example:
            code: INVALID_ARGUMENT
            message: 'Invalid region: Valid options are us, eu'
          description: Detailed information about the error that occurred.
          type: object
          properties:
            code:
              description: The status code associated with the error.
              x-enum:
                - OK
                - UNKNOWN
                - INVALID_ARGUMENT
                - DEADLINE_EXCEEDED
                - QUOTA_EXCEEDED
                - NOT_FOUND
                - ALREADY_EXISTS
                - PERMISSION_DENIED
                - UNAUTHENTICATED
                - RESOURCE_EXHAUSTED
                - FAILED_PRECONDITION
                - ABORTED
                - OUT_OF_RANGE
                - UNIMPLEMENTED
                - INTERNAL
                - UNAVAILABLE
                - DATA_LOSS
                - FORBIDDEN
                - TOO_MANY_REQUESTS
              type: string
            message:
              example: Message content cannot be empty
              description: A message providing details about the error.
              type: string
            details:
              description: >-
                Additional information about the error. This field is not
                guaranteed to be present.
              type: object
          required:
            - code
            - message
      required:
        - status
        - error
    MessageModel:
      description: Describes the format of a message in a chat.
      type: object
      properties:
        role:
          description: >-
            The role of the message author: `user`, `assistant`, or `system`.
          type: string
        content:
          description: The text content of the message.
          type: string
    ContextOptionsModel:
      description: Controls the context snippets sent to the LLM.
      type: object
      properties:
        top_k:
          example: 20
          description: >-
            The maximum number of context snippets to use. Default is 16.
            Maximum is 64.
          type: integer
        snippet_size:
          example: 4096
          description: >-
            The maximum context snippet size. Default is 2048 tokens. Minimum is
            512 tokens. Maximum is 8192 tokens.
          type: integer
        multimodal:
          description: >-
            Whether or not to send image-related context snippets to the LLM. If
            `false`, only text context snippets are sent.
          default: true
          type: boolean
        include_binary_content:
          description: >-
            If image-related context snippets are sent to the LLM, this field
            determines whether or not they should include base64 image data. If
            `false`, only the image caption is sent. Only available when
            `multimodal=true`.
          default: true
          type: boolean
    CitationModel:
      description: >-
        Describes a single citation included in a chat response, pointing to one
        or more referenced sources.
      type: object
      properties:
        position:
          description: The index position of the citation in the complete text response.
          type: integer
        references:
          description: A list of file references that this citation points to.
          type: array
          items:
            $ref: '#/components/schemas/ReferenceModel'
    UsageModel:
      description: >-
        Describes the token usage associated with interactions with an
        assistant.
      type: object
      properties:
        prompt_tokens:
          description: >-
            For chat interactions, the number of tokens in the LLM request
            (message, context snippets, and system prompt).

            For context retrieval, the number of tokens in the LLM request used
            to generate search queries from the messages, plus the tokens in the
            retrieved context snippets.
          type: integer
        completion_tokens:
          description: >-
            For chat interactions, the number of tokens in the assistant's
            response.  

            For context retrieval, this is always 0.
          type: integer
        total_tokens:
          description: >-
            The total number of tokens used, equal to the sum of `prompt_tokens`
            and `completion_tokens`.
          type: integer
    ContentFilterResults:
      description: >-
        Content filter results provided by the LLM, describing safety-related
        classifications applied to the content. The structure may vary depending
        on the model and the content being filtered. The `spec` field identifies
        the provider, and determines the structure of `results`.
      type: object
      properties:
        spec:
          description: Identifier of the model provider.
          x-enum:
            - openai
            - gemini
          type: string
        results:
          description: >-
            Content filter results returned by the provider. The structure
            depends on the `spec` value.
    MessageStartModel:
      example:
        context_snippet_count: 16
        id: 00000000000000002fe0c02e20be1c6a
        model: gpt-4o-2024-11-20
        role: assistant
        type: message_start
      title: Message start stream chunk
      description: >-
        The start message of a stream chat response. This chunk initializes the
        response by providing the unique identifier, the model, and the role of
        the author. It also provides the count of retrieved context snippets
        available to the model before any content is sent, allowing for
        decisions on how to handle cases where no relevant context was found.
      type: object
      properties:
        type:
          description: The type of stream chunk. Always `message_start`.
          type: string
        id:
          description: A unique identifier for this chat response.
          type: string
        model:
          description: >-
            The name or identifier of the model used to generate this chat
            response.
          type: string
        role:
          description: >-
            The role of the message author: `user`, `assistant`, or `system`.
          type: string
        context_snippet_count:
          description: >-
            The number of context snippets provided to the model to generate the
            response. This indicates how much retrieved information was
            available for generation, so clients can apply their own logic when
            no context was found (count is 0).
          type: integer
        content_filter_results:
          $ref: '#/components/schemas/ContentFilterResults'
      required:
        - type
        - id
        - model
        - role
    ContentChunkModel:
      example:
        delta:
          content: Hello
        id: 00000000000000002fe0c02e20be1c6a
        model: gpt-4o-2024-11-20
        type: content_chunk
      title: Content stream chunk
      description: >-
        A content chunk in a stream chat response that contains a partial
        segment of the assistant's response. The `delta.content` property
        provides a string fragment that should be appended to the previously
        received fragments to construct the complete message.
      type: object
      properties:
        type:
          description: The type of stream chunk. Always `content_chunk`.
          type: string
        id:
          description: A unique identifier for this chat response.
          type: string
        model:
          description: >-
            The name or identifier of the model used to generate this chat
            response.
          type: string
        delta:
          description: >-
            The content of this partial message. For example, if the response is
            "Hello world", the first chunk's `delta.content` might contain
            "Hello" and the second chunk's `delta.content` might contain "
            world".
          type: object
          properties:
            content:
              description: The text content of this partial message.
              type: string
          required:
            - content
        content_filter_results:
          $ref: '#/components/schemas/ContentFilterResults'
      required:
        - type
        - id
        - model
        - delta
    CitationChunkModel:
      example:
        citation:
          position: 53
          references:
            - file:
                created_on: '2025-01-01T00:00:00.000Z'
                id: ae79e447-b89e-4994-994b-3232ca52a654
                metadata: null
                multimodal: false
                name: my_file.pdf
                signed_url: https://storage.googleapis.com/...
                size: 25000
                status: Available
                updated_on: '2025-01-01T00:01:00.000Z'
              pages:
                - 1
                - 2
        id: 00000000000000002fe0c02e20be1c6a
        model: gpt-4o-2024-11-20
        type: citation
      title: Citation stream chunk
      description: >-
        A citation chunk in a stream chat response that identifies the
        documents used to justify a specific claim or section of the
        assistant's response. It maps a character offset (`position`) within
        the accumulated message to specific references, allowing the client to
        render footnotes or links at the exact point of the claim to ensure the
        response is grounded in the provided files.
      type: object
      properties:
        type:
          description: The type of stream chunk. Always `citation`.
          type: string
        id:
          description: A unique identifier for this chat response.
          type: string
        model:
          description: >-
            The name or identifier of the model used to generate this chat
            response.
          type: string
        citation:
          $ref: '#/components/schemas/CitationModel'
      required:
        - type
        - id
        - model
        - citation
    MessageEndModel:
      example:
        finish_reason: stop
        id: 00000000000000002fe0c02e20be1c6a
        model: gpt-4o-2024-11-20
        type: message_end
        usage:
          completion_tokens: 135
          prompt_tokens: 2506
          total_tokens: 2641
      title: Message end stream chunk
      description: >-
        The end message in a stream chat response. This chunk signals that the 
        assistant has finished sending the response and provides the reason 
        generation stopped along with the token consumption for the interaction.
      type: object
      properties:
        type:
          description: The type of stream chunk. Always `message_end`.
          type: string
        id:
          description: A unique identifier for this chat response.
          type: string
        model:
          description: >-
            The name or identifier of the model used to generate this chat
            response.
          type: string
        finish_reason:
          description: >-
            Indicates why the chat response generation stopped. This signals
            the end of the stream and no further chunks will be sent for this
            response.

            - `stop`: The model finished generating the response.  

            - `length`: Generation was cut off because the maximum number of
            tokens allowed was reached.

            - `content_filter`: Generation stopped because content was blocked
            by content filtering rules (for example, content that contains hate
            speech or violent material).

            - `tool_calls`: Generation stopped because a tool call was
            triggered.
          x-enum:
            - stop
            - length
            - content_filter
            - tool_calls
          type: string
        usage:
          $ref: '#/components/schemas/UsageModel'
        content_filter_results:
          $ref: '#/components/schemas/ContentFilterResults'
      required:
        - type
        - id
        - model
        - finish_reason
    ReferenceModel:
      description: Describes a single reference in a citation.
      type: object
      properties:
        file:
          $ref: '#/components/schemas/AssistantFileModel'
        pages:
          description: >-
            A list of page numbers in the referenced document that contain the
            relevant content.
          type: array
          items:
            type: integer
        highlight:
          $ref: '#/components/schemas/HighlightModel'
    AssistantFileModel:
      description: The response format for a successful file upload request.
      type: object
      properties:
        name:
          description: The name of the uploaded file.
          type: string
        id:
          description: >-
            The unique identifier for the uploaded file. This may be a
            user-provided identifier or a system-generated ID.
          type: string
        size:
          example: 1048576
          description: The size of the uploaded file, in bytes.
          type: integer
          format: int64
        metadata:
          nullable: true
          example:
            created_by: Jane Doe
            published: '2025-10-01T00:00:00.000Z'
            tags:
              - report
              - Q4
              - analytics
          description: >-
            Optional metadata associated with the file. This metadata can be
            used to filter files when listing them or to restrict search results
            when querying the assistant.
          type: object
        created_on:
          example: '2025-10-01T12:30:00.000Z'
          description: >-
            The timestamp when the file was uploaded, in ISO 8601 format
            (`YYYY-MM-DDTHH:MM:SSZ`).
          type: string
          format: date-time
        updated_on:
          example: '2025-10-01T12:45:00.000Z'
          description: >-
            The timestamp of the most recent update to the file, in ISO 8601
            format (`YYYY-MM-DDTHH:MM:SSZ`).
          type: string
          format: date-time
        status:
          description: >-
            The current state of the uploaded file. Possible values:

            - `Processing`: File is being processed (parsed, chunked, embedded)

            - `Available`: Processing completed successfully; file is ready for
            use

            - `Deleting`: Deletion has been initiated but not yet completed

            - `ProcessingFailed`: Processing failed with an error


            Note: Once a file is deleted, the API returns 404 Not Found instead
            of a file object.
          x-enum:
            - Processing
            - Available
            - Deleting
            - ProcessingFailed
          type: string
        signed_url:
          nullable: true
          example: https://storage.googleapis.com/bucket/file.pdf?...
          description: >-
            A [signed
            URL](https://cloud.google.com/storage/docs/access-control/signed-urls)
            that provides temporary, read-only access to the file. Anyone with
            the link can access the file, so treat it as sensitive data. Expires
            after a short time.
          type: string
        multimodal:
          description: Indicates whether the file was processed as multimodal.
          type: boolean
      required:
        - id
        - name
    HighlightModel:
      nullable: true
      description: >-
        Represents a portion of a referenced document that directly supports or
        is relevant to the response.
      type: object
      properties:
        type:
          description: The type of the highlight. Only `text` is supported.
          type: string
        content:
          description: >-
            The text content of the highlighted portion from the referenced
            document.
          type: string
      required:
        - type
        - content
  securitySchemes:
    ApiKeyAuth:
      type: apiKey
      in: header
      name: Api-Key
      description: Pinecone API Key

````
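
The `CitationModel` schema above describes `position` as an offset into the complete response text, and `CitationChunkModel` notes that clients can use it to render footnotes at the point of a claim. The following is a minimal sketch of that idea, assuming `position` is a character offset into `message.content`; `add_footnotes` is a hypothetical helper, and the sample text and citation are made up for illustration.

```python
def add_footnotes(content, citations):
    """Insert [n] markers at each citation offset and list the sources."""
    # Number citations in reading order; sorted() is stable, so citations
    # sharing a position keep their input order.
    ordered = sorted(citations, key=lambda c: c["position"])
    text = content
    # Insert from the last citation backwards so earlier offsets stay valid.
    for number in range(len(ordered), 0, -1):
        # Clamp offsets that point past the end of the text.
        pos = min(ordered[number - 1]["position"], len(content))
        text = text[:pos] + f"[{number}]" + text[pos:]
    notes = []
    for number, citation in enumerate(ordered, start=1):
        sources = []
        for ref in citation.get("references", []):
            pages = ", ".join(str(p) for p in ref.get("pages") or [])
            name = ref["file"]["name"]
            sources.append(f"{name} (p. {pages})" if pages else name)
        notes.append(f"[{number}] " + "; ".join(sources))
    return text, notes


sample_content = "Netherfield Park is let at last."
sample_citations = [
    {
        "position": 32,
        "references": [
            {"file": {"id": "ae79e447", "name": "Pride-and-Prejudice.pdf"},
             "pages": [1]}
        ],
    }
]
annotated, footnotes = add_footnotes(sample_content, sample_citations)
print(annotated)     # prints "Netherfield Park is let at last.[1]"
print(footnotes[0])  # prints "[1] Pride-and-Prejudice.pdf (p. 1)"
```

The same function works for a streamed response once the `content_chunk` fragments have been concatenated and the `citation` events collected.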