Pinecone Assistant is a service that allows you to upload documents, ask questions, and receive responses that reference your documents. This is known as retrieval-augmented generation (RAG).

Use cases

Pinecone Assistant is useful for a variety of tasks, especially for the following:

  • Prototyping and deploying an AI assistant quickly.
  • Providing context-aware answers about your proprietary data without training an LLM.
  • Retrieving answers grounded in your data, with references.

Workflow

You can use Pinecone Assistant through the Pinecone console or the Pinecone API.

The following steps outline the general Pinecone Assistant workflow:

1. Create an assistant

Create an assistant to answer questions about your documents.

2. Upload documents

Upload documents to your assistant. Your assistant manages chunking, embedding, and storage for you.

3. Chat with an assistant

Chat with your assistant and receive responses as a JSON object or as a text stream. For each chat, your assistant queries a large language model (LLM) with context from your documents so that the LLM's responses are grounded in your data.

4. Evaluate answers

Evaluate the assistant’s responses for correctness and completeness.

5. Optimize performance

Use custom instructions to tailor your assistant’s behavior and responses to specific use cases or requirements. Filter by metadata associated with files to reduce latency and improve the accuracy of responses.

6. Retrieve context snippets

Retrieve context snippets to see which parts of your data Pinecone Assistant uses to generate responses. You can also use the retrieved snippets with your own LLM, RAG application, or agentic workflow.

For information on how Pinecone Assistant works, see Assistant architecture.

SDK support

You can use the Assistant API directly or via the Pinecone Python SDK.

To interact with Pinecone Assistant using the Python SDK, upgrade the client and install the pinecone-plugin-assistant package as follows:

Shell
pip install --upgrade pinecone pinecone-plugin-assistant

Support for the Node.js SDK is coming soon.

Files in Pinecone Assistant

Upload files to provide your assistant with context and information to reference when generating responses.

Supported size and types

The maximum file size is 100MB.

Pinecone Assistant supports the following file types:

  • JSON (.json)
  • Markdown (.md)
  • Text (.txt)
  • PDF (.pdf)

Scanned PDFs and text extraction from images (OCR) are not supported.

If a document contains images, the images are not processed, and the assistant generates responses based on the text content only.
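Before uploading, you might check the documented size and type constraints locally. The following Python sketch is a hypothetical helper (`precheck_file` is not part of any Pinecone SDK). It assumes "100MB" means 100 * 1024 * 1024 bytes and judges the file type by extension alone, so it cannot detect a scanned PDF or a file whose content does not match its extension.

```python
from pathlib import PurePath

# Assumption: "100MB" is read as 100 * 1024 * 1024 bytes.
MAX_FILE_BYTES = 100 * 1024 * 1024
# Extensions for the supported file types listed above.
SUPPORTED_EXTENSIONS = {".json", ".md", ".txt", ".pdf"}


def precheck_file(name: str, size_bytes: int) -> list[str]:
    """Return problems that would likely block an upload (empty list if none found)."""
    problems = []
    suffix = PurePath(name).suffix.lower()  # type is inferred from the extension only
    if suffix not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {suffix or '(no extension)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_FILE_BYTES}")
    return problems
```

For a file on disk, pass `path.name` and `path.stat().st_size` from a `pathlib.Path`. Taking the name and size as plain values keeps the check independent of the filesystem.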

File storage

Files are uploaded to Google Cloud Storage (us-central1 region) and to your organization’s Pinecone vector database. The assistant processes the files itself, so the data is not sent outside of blob storage or Pinecone. A signed URL for each file is generated and stored in the assistant’s details so that the assistant can retrieve the file when generating responses. To view a file's signed URL, list the files in the assistant.

File metadata

You can upload a file with metadata, which allows you to store additional information about the file as key-value pairs.

File metadata can be set only when the file is uploaded. You cannot update metadata after the file is uploaded.

File metadata can be used for the following purposes:

  • Filtering chat responses: Specify a metadata filter in a chat request so that only files matching the filter are referenced in the response. Chat requests without a metadata filter ignore file metadata.
  • Viewing a filtered list of files: Use metadata filters to list files in an assistant that match specific criteria.

Supported metadata size and types

Pinecone Assistant supports up to 40KB of metadata per file.

Metadata payloads must be key-value pairs in a JSON object. Keys must be strings, and values can be one of the following data types:

  • String
  • Number (integer or floating point; converted to a 64-bit floating-point value)
  • Boolean (true, false)
  • List of strings

Null metadata values are not supported. Instead of setting a key to a null value, remove that key from the metadata payload.

For example, the following are valid metadata payloads:

JSON
{
    "genre": "action",
    "year": 2020,
    "length_hrs": 1.5
}

{
    "color": "blue",
    "fit": "straight",
    "price": 29.99,
    "is_jeans": true
}
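The rules above can be checked before upload. The following Python sketch uses a hypothetical helper, `validate_metadata`, which is not an SDK function. It assumes the 40KB limit applies to the compact UTF-8 JSON encoding of the payload and reads 40KB as 40 * 1024 bytes; Pinecone may measure size differently.

```python
import json

# Assumption: "40KB" is read as 40 * 1024 bytes of compact UTF-8 JSON.
MAX_METADATA_BYTES = 40 * 1024


def validate_metadata(metadata: dict) -> list[str]:
    """Return problems with a metadata payload (empty list if it looks valid)."""
    if not isinstance(metadata, dict):
        return ["metadata must be a JSON object"]
    errors = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            errors.append(f"key {key!r} is not a string")
        elif value is None:
            errors.append(f"{key}: null is not supported; remove the key instead")
        elif isinstance(value, (str, bool, int, float)):
            pass  # strings, booleans, and numbers are allowed scalar values
        elif isinstance(value, list):
            if not all(isinstance(item, str) for item in value):
                errors.append(f"{key}: lists may contain only strings")
        else:
            errors.append(f"{key}: unsupported type {type(value).__name__}")
    if not errors:
        # Only measure size once the payload is known to be serializable.
        size = len(json.dumps(metadata, separators=(",", ":")).encode("utf-8"))
        if size > MAX_METADATA_BYTES:
            errors.append(f"metadata is {size} bytes; limit is {MAX_METADATA_BYTES}")
    return errors
```

Both example payloads above pass this check, while a payload such as `{"genre": None}` or `{"tags": ["a", 1]}` is rejected.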

Metadata query language

Pinecone’s filtering query language is based on MongoDB’s query and projection operators. Pinecone currently supports a subset of those selectors:

| Filter | Description | Example | Supported types |
|---|---|---|---|
| $eq | Matches metadata values that are equal to a specified value. | {"genre": {"$eq": "documentary"}} | Number, string, boolean |
| $ne | Matches metadata values that are not equal to a specified value. | {"genre": {"$ne": "drama"}} | Number, string, boolean |
| $gt | Matches metadata values that are greater than a specified value. | {"year": {"$gt": 2019}} | Number |
| $gte | Matches metadata values that are greater than or equal to a specified value. | {"year": {"$gte": 2020}} | Number |
| $lt | Matches metadata values that are less than a specified value. | {"year": {"$lt": 2020}} | Number |
| $lte | Matches metadata values that are less than or equal to a specified value. | {"year": {"$lte": 2020}} | Number |
| $in | Matches metadata values that are in a specified array. | {"genre": {"$in": ["comedy", "documentary"]}} | String, number |
| $nin | Matches metadata values that are not in a specified array. | {"genre": {"$nin": ["comedy", "documentary"]}} | String, number |
| $exists | Matches files that have the specified metadata field. | {"genre": {"$exists": true}} | Boolean |
| $and | Joins query clauses with a logical AND. | {"$and": [{"genre": {"$eq": "drama"}}, {"year": {"$gte": 2020}}]} | - |
| $or | Joins query clauses with a logical OR. | {"$or": [{"genre": {"$eq": "drama"}}, {"year": {"$gte": 2020}}]} | - |

For example, the following metadata has a "genre" field whose value is a list of strings:

JSON
{ "genre": ["comedy", "documentary"] }

This means "genre" takes on both values, and requests with the following filters will match:

JSON
{"genre":"comedy"}

{"genre": {"$in":["documentary","action"]}}

{"$and": [{"genre": "comedy"}, {"genre":"documentary"}]}

However, requests with the following filter will not match:

JSON
{ "$and": [{ "genre": "comedy" }, { "genre": "drama" }] }

Requests with the following filters do not match because the filters themselves are invalid; they result in a compilation error:

# INVALID QUERY:
{"genre": ["comedy", "documentary"]}
# INVALID QUERY:
{"genre": {"$eq": ["comedy", "documentary"]}}
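To make the matching rules above concrete, here is a minimal local sketch in Python that evaluates the documented operators against a metadata dictionary. It is not Pinecone's filter engine or an SDK call; `compile_filter`, `FilterError`, and the helpers are hypothetical names. Behavior the text does not specify is an assumption here: a missing field fails every operator except `$exists`, `$ne` and `$nin` on a list-valued field match only when no element matches, and booleans are never equal to numbers.

```python
import operator
from typing import Any, Callable

# Illustrative re-implementation only; NOT Pinecone's filter engine.
Predicate = Callable[[dict], bool]
_ORDER = {"$gt": operator.gt, "$gte": operator.ge, "$lt": operator.lt, "$lte": operator.le}


class FilterError(ValueError):
    """Raised for an invalid filter (the analog of a compilation error)."""


def _is_number(x: Any) -> bool:
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def _equal(a: Any, b: Any) -> bool:
    # Keep booleans distinct from numbers (in Python, True == 1).
    if isinstance(a, bool) or isinstance(b, bool):
        return isinstance(a, bool) and isinstance(b, bool) and a == b
    return a == b


def _values(field_value: Any) -> list:
    # A list-valued field "takes on" every element it contains.
    return field_value if isinstance(field_value, list) else [field_value]


def _compile_field(field: str, op: str, operand: Any) -> Predicate:
    if op == "$exists":
        if not isinstance(operand, bool):
            raise FilterError("$exists expects true or false")
        return lambda md: (field in md) == operand
    if op in ("$eq", "$ne"):
        if not isinstance(operand, (str, int, float)):  # bool is a subclass of int
            raise FilterError(f"{op} expects a number, string, or boolean")
        test, negate = (lambda v: _equal(v, operand)), op == "$ne"
    elif op in ("$in", "$nin"):
        if not isinstance(operand, list) or not all(
            isinstance(x, str) or _is_number(x) for x in operand
        ):
            raise FilterError(f"{op} expects a list of strings or numbers")
        test, negate = (lambda v: any(_equal(v, x) for x in operand)), op == "$nin"
    elif op in _ORDER:
        if not _is_number(operand):
            raise FilterError(f"{op} expects a number")
        compare = _ORDER[op]
        test, negate = (lambda v: _is_number(v) and compare(v, operand)), False
    else:
        raise FilterError(f"unsupported operator {op}")

    def predicate(md: dict) -> bool:
        if field not in md:
            return False  # assumption: a missing field never satisfies a value operator
        hit = any(test(v) for v in _values(md[field]))
        return not hit if negate else hit

    return predicate


def compile_filter(flt: Any) -> Predicate:
    """Validate the whole filter up front, then return a predicate over metadata."""
    if not isinstance(flt, dict):
        raise FilterError("a filter must be a JSON object")
    clauses: list[Predicate] = []
    for key, cond in flt.items():
        if key in ("$and", "$or"):
            if not isinstance(cond, list):
                raise FilterError(f"{key} expects a list of filters")
            subs = [compile_filter(sub) for sub in cond]
            combine = all if key == "$and" else any
            clauses.append(lambda md, s=subs, c=combine: c(p(md) for p in s))
        elif key.startswith("$"):
            raise FilterError(f"unsupported operator {key}")
        else:
            # A bare value such as {"genre": "comedy"} is shorthand for $eq.
            ops = cond if isinstance(cond, dict) else {"$eq": cond}
            for op, operand in ops.items():
                clauses.append(_compile_field(key, op, operand))
    return lambda md: all(clause(md) for clause in clauses)
```

With `md = {"genre": ["comedy", "documentary"]}`, the three matching filters above return `True`, the `"comedy"`/`"drama"` conjunction returns `False`, and both invalid filters raise `FilterError` in `compile_filter` before any metadata is examined. Separating compilation from evaluation mirrors the distinction the text draws between a filter that does not match and one that is rejected outright.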

Limitations

The following Pinecone Assistant limits apply to each organization and vary by pricing plan:

| Metric | Starter plan | Standard plan | Enterprise plan |
|---|---|---|---|
| Max number of assistants | 3 | Unlimited | Unlimited |
| Max input tokens per minute (TPM) | 30,000 | 150,000 | 150,000 |
| Max number of total LLM-processed tokens | 1,500,000 | Unlimited | Unlimited |
| Max input tokens per query | 64,000 | 64,000 | 64,000 |
| Max total output tokens | 200,000 | Unlimited | Unlimited |

The following file limits apply to each assistant and vary based on pricing plan:

| Metric | Starter plan | Standard plan | Enterprise plan |
|---|---|---|---|
| Max file size | 100MB | 100MB | 100MB |
| Max file storage | 1GB | 10GB | 10GB |
| Max files uploaded | 10 | 10,000 | 10,000 |
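To check whether a planned set of uploads fits one assistant's file limits, the following Python sketch copies the file-count and storage numbers from the table above into a dictionary. `check_plan_limits` and `FILE_LIMITS` are hypothetical names, reading "GB" as 1024**3 bytes is an assumption, and the per-file size limit is not checked here. Confirm the current limits for your plan before relying on these values.

```python
# Per-assistant file limits copied from the table above; they may change over time.
GIB = 1024 ** 3  # assumption: "GB" in the table is read as 1024**3 bytes

FILE_LIMITS = {
    "starter": {"max_files": 10, "max_storage_bytes": 1 * GIB},
    "standard": {"max_files": 10_000, "max_storage_bytes": 10 * GIB},
    "enterprise": {"max_files": 10_000, "max_storage_bytes": 10 * GIB},
}


def check_plan_limits(file_sizes: list[int], plan: str) -> list[str]:
    """Return problems if the file sizes (in bytes) exceed one assistant's plan limits."""
    limits = FILE_LIMITS[plan.lower()]
    problems = []
    if len(file_sizes) > limits["max_files"]:
        problems.append(
            f"{len(file_sizes)} files exceeds the {plan} limit of {limits['max_files']}"
        )
    total = sum(file_sizes)
    if total > limits["max_storage_bytes"]:
        problems.append(
            f"{total} bytes exceeds the {plan} storage limit of {limits['max_storage_bytes']}"
        )
    return problems
```

For example, eleven 100 MiB files exceed both the file-count and storage limits on the Starter plan but fit within the Standard plan.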

Pricing

Each active assistant costs $0.20 per day, billed hourly at $0.008333333 per hour.
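The hourly rate is the daily fee divided by 24. The following Python sketch does that arithmetic with exact fractions to avoid floating-point rounding. It covers only the per-assistant fee (token usage is priced separately), assumes billing counts whole active hours, and hard-codes the $0.20 daily rate from this page, which may change; the function names are hypothetical.

```python
from fractions import Fraction

# Daily fee per active assistant, copied from this page; check current pricing.
DAILY_FEE_USD = Fraction("0.20")


def hourly_fee() -> Fraction:
    """The daily fee spread evenly over 24 billed hours."""
    return DAILY_FEE_USD / 24


def assistant_fee(active_hours: int, assistants: int = 1) -> Fraction:
    """Fee in USD for `assistants` assistants, each active for `active_hours` whole hours."""
    # Fractions keep the arithmetic exact; convert with float() only for display.
    return DAILY_FEE_USD * active_hours * assistants / 24
```

Here `float(hourly_fee())` is about 0.0083333, and `assistant_fee(24 * 30)` is exactly 6, that is, $6.00 for one assistant kept active for a 30-day month.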

See Pricing for up-to-date pricing information.

Token usage

Pinecone Assistant usage is measured in tokens. Input and output tokens are counted and priced separately.

Pinecone Assistant consumes input tokens for both planning and retrieval. Input token usage depends on the chat history, the structure and data density of your documents (for example, how many words are on a page), and the number of documents that match the filter criteria. In general, the total number of input tokens for a query is roughly the token count of the chat history plus on the order of 10,000 tokens for document retrieval. The maximum number of input tokens per query is 64,000.
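As a rough planning aid, the heuristic above (chat history plus on the order of 10,000 retrieval tokens) can be turned into a back-of-the-envelope estimate. In this Python sketch, the 10,000 figure is an order-of-magnitude approximation rather than a fixed constant, and the helper names are hypothetical. Actual usage also depends on document structure, data density, and how many documents match your filter.

```python
# Rough heuristic from the text above; 10,000 is an order-of-magnitude figure.
RETRIEVAL_TOKENS_ESTIMATE = 10_000
MAX_INPUT_TOKENS_PER_QUERY = 64_000


def estimate_input_tokens(history_tokens: int) -> int:
    """Approximate input tokens for one query: chat history plus retrieval context."""
    return history_tokens + RETRIEVAL_TOKENS_ESTIMATE


def within_query_limit(history_tokens: int) -> bool:
    """True if the estimate stays within the per-query input-token maximum."""
    return estimate_input_tokens(history_tokens) <= MAX_INPUT_TOKENS_PER_QUERY


def queries_per_minute(history_tokens: int, tpm_limit: int) -> int:
    """How many queries of this estimated size fit in an input tokens-per-minute budget."""
    return tpm_limit // estimate_input_tokens(history_tokens)
```

For example, with 2,000 tokens of chat history each query uses roughly 12,000 input tokens, so the Starter plan's 30,000 TPM allows about two such queries per minute and the Standard plan's 150,000 TPM about twelve.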

Output tokens are the tokens generated in the answer. Their number depends on the complexity of the question and on how many of the retrieved documents are relevant to it. Output typically ranges from a few dozen to several hundred tokens.
