This feature is in public preview.
Pinecone assistants support multimodal context, allowing them to understand and respond to questions about images embedded in PDF documents. This enables use cases like:
  • Analyzing charts, graphs, and diagrams in financial reports
  • Understanding infographics and visual data in research papers
  • Interpreting visual layouts in technical documentation

How it works

When you enable multimodal context for a PDF:
  1. Pinecone extracts text and images (raster or vector) from the file.
  2. During chat or context queries, the assistant retrieves text and image snippets. Image snippets can include captions and base64 image data.
  3. The LLM receives multimodal context and uses it to generate responses.
For an overview of how Pinecone Assistant works, see Pinecone Assistant architecture.

Try it out

The following steps demonstrate how to create an assistant, provide it with a PDF that contains images, and then query that assistant using chat and context APIs.
All versions of Pinecone’s Assistant API allow you to upload multimodal PDFs.

1. Create an assistant

First, if you don’t have one, create an assistant:
from pprint import pprint
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.create_assistant(
    assistant_name="example-assistant-multimodal", 
    instructions="You are a helpful assistant that can understand both text and images in documents.",
    region="us",
    timeout=30
)

print(f"Type: {type(assistant).__name__}")
pprint(assistant)
Response:
Type: AssistantModel
{'created_at': '2025-08-28T23:35:26.917953498Z',
  'host': 'https://prod-1-data.ke.pinecone.io',
  'instructions': 'You are a helpful assistant that can understand both text '
                  'and images in documents.',
  'metadata': {},
  'name': 'example-assistant-multimodal',
  'status': 'Ready',
  'updated_at': '2025-08-28T23:35:28.507639215Z'}
You don’t need to create a new assistant to use multimodal context. Existing assistants can enable multimodal context for newly uploaded PDFs, as described in the next section.

2. Upload a multimodal PDF

To enable multimodal context for a PDF, set the multimodal URL parameter to true when uploading the file (it defaults to false).
from pprint import pprint
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant-multimodal")

# timeout=None allows the SDK to wait for file processing to complete before returning.
# This parameter is only available in the SDK, not in direct API calls.
file_model = assistant.upload_file(
    file_path="./document.pdf",
    multimodal=True,
    timeout=None
)

pprint(file_model)
Response:
# Formatted for readability
FileModel(
  name='document.pdf', 
  id='9c322597-58d6-4ebc-84b5-a398b620da01', 
  metadata=None, 
  created_on='2025-08-28T23:41:41.982805815Z', 
  updated_on='2025-08-28T23:42:09.562949544Z', 
  status='Available', 
  percent_done=1.0, 
  signed_url=None, 
  error_message=None, 
  size=1236044.0, 
  multimodal=True
)
  • The multimodal parameter is only available for PDF files.
  • To check the status of a file, use the describe a file upload endpoint (see the sketch below).

3. Chat with the assistant

Now, chat with your assistant. To tell the assistant to provide image-related context to the LLM:
  • Set the multimodal request parameter to true (default) in the context_options object. Setting multimodal to false means the LLM only receives text snippets.
  • When multimodal is true, use include_binary_content to specify what image context the LLM should receive: base64 image data and captions (true) or captions only (false).
Sending image-related context to the LLM (whether captions, base64 data, or both) increases token usage. Learn about monitoring spend and usage.
from pprint import pprint
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant-multimodal")

msg = Message(
    role="user", 
    content="Describe the symbol on the paper tray that indicates the maximum fill level."
)

chat_response = assistant.chat(
    messages=[msg],
    context_options={
        "multimodal": True,
        "include_binary_content": True,
        "top_k": 10,
        "snippet_size": 2048
    }
)

pprint(chat_response)
Response:
# Formatted for readability
ChatResponse(
  id='00000000000000000fe49626f3ee5164', 
  model='gpt-4o-2024-11-20', 
  usage=Usage(
    prompt_tokens=8703, 
    completion_tokens=41, 
    total_tokens=8744
  ), 
  message=Message(
    content='The symbol on the paper tray that indicates...', 
    role='assistant'
  ), 
  finish_reason='stop', 
  citations=[
    Citation(
      position=209, 
      references=[
        Reference(
            file=FileModel(
              name='document.pdf', 
              id='9c322597-58d6-4ebc-84b5-a398b620da01', 
              metadata=None, 
              created_on='2025-08-28T23:41:41.982805815Z', 
              updated_on='2025-08-28T23:42:09.562949544Z', 
              status='Available', 
              percent_done=1.0, 
              signed_url='https://storage.googleapis.com/...', 
              error_message=None, 
              size=1236044.0, 
              multimodal=True
          ), 
          pages=[3, 4, 5, 6, 7, 8, 9, 10, 11], 
          highlight=None
        )
      ]
    )
  ]
)
If your assistant uses multimodal context snippets to generate a response, no highlights are returned—even when include_highlights is true.

4. Query for context

For a custom RAG workflow, you can retrieve context snippets directly and pass them to an LLM of your choice as context. To fetch image-related context snippets (in addition to text snippets), set the multimodal request parameter to true (default). When multimodal is true, use include_binary_content to specify what image context you’d like to receive: base64 image data and captions (true) or captions only (false).
from pprint import pprint
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
assistant = pc.assistant.Assistant(assistant_name="example-assistant-multimodal")
context_response = assistant.context(
    query="Describe the symbol on the paper tray that indicates the maximum fill level.",
    multimodal=True,
    include_binary_content=True
)

pprint(context_response)
If you set multimodal to true and include_binary_content to false, image objects are not returned in the snippets. If you set multimodal to false, only text snippets are returned.
Response:
# Formatted for readability
ContextResponse(
  id='00000000000000001e3ef84bd493e612', 
  snippets=[
    MultimodalSnippet(
      type='multimodal', 
      content=[
        TextBlock(type='text', text="..."), 
        ImageBlock(
          type='image', 
          caption='...', 
          image=Image(mime_type='image/jpeg', data='...', type='base64')), 
        # ...
      ], 
      score=0.16321887, 
      reference=PdfReference(
        type='pdf', 
        pages=[3, 4, 5, 6, 7, 8, 9, 10, 11], 
        file=FileModel(
          name='document.pdf', 
          id='9c322597-58d6-4ebc-84b5-a398b620da01', 
          metadata=None, 
          created_on='2025-08-28T23:41:41.982805815Z', 
          updated_on='2025-08-28T23:42:09.562949544Z', 
          status='Available', 
          percent_done=1.0, 
          signed_url='https://storage.googleapis.com/...', 
          error_message=None, 
          size=1236044, 
          multimodal=True
        )
      )
    ), 
    # ...
  ], 
  usage=TokenCounts(
    prompt_tokens=7061, 
    completion_tokens=0, 
    total_tokens=7061
  )
)
Snippets are returned based on their semantic relevance to the provided query. When you set multimodal to true, you’ll receive the most relevant snippets, regardless of the types of content they contain. You can receive text snippets, multimodal snippets, or both.

Limitations

  • File type: Only PDF files support multimodal context
  • File size: Maximum 50MB per file
  • Page limit: Maximum 100 pages per file
  • Multimodal PDFs per assistant:
    • Standard and Enterprise plans: 20
    • Starter plan: 1