Pinecone Assistant is a service that allows you to upload documents, ask questions, and receive responses that reference your documents. This approach is known as retrieval-augmented generation (RAG). You can access Assistant using the Pinecone console, a Python plugin, or the Assistant API. The JavaScript and Java SDKs do not support Pinecone Assistant.

How it works

When you upload a document, your assistant processes the contents by chunking and embedding the text. Then, the assistant stores the embeddings in a vector database. When you chat with your assistant, it queries a large language model (LLM) with your prompt and any relevant information from your data sources. With this context, the LLM can provide responses grounded in your documents.

Assistant manages embedding generation, embedding storage, and LLM prompting; you do not access these parts of the system directly. You upload files and chat with the model, and Assistant manages all other components.
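The flow described above can be sketched locally to show the moving parts. This is a toy illustration only, not how Assistant is implemented: the bag-of-words `embed` function stands in for a real embedding model, a Python list stands in for the vector database, and all function names are hypothetical. The final prompt is what would be sent to an LLM.

```python
import math
import re
from collections import Counter

def chunk(text, size=12):
    """Split text into chunks of at most `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text):
    """Toy embedding: word counts (a stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

def build_index(documents):
    """Chunk and embed each document; the list plays the role of the vector database."""
    return [(c, embed(c)) for doc in documents for c in chunk(doc)]

def retrieve(index, question, top_k=1):
    """Return the text of the top_k chunks most similar to the question."""
    q = embed(question)
    ranked = sorted(index, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:top_k]]

def build_prompt(index, question):
    """Combine the retrieved context and the user question into one LLM prompt."""
    context = "\n".join(retrieve(index, question))
    return f"Context:\n{context}\n\nQuestion: {question}"

docs = [
    "Pinecone Assistant accepts text and PDF files for upload.",
    "Invoices are due thirty days after the billing date.",
]
index = build_index(docs)
prompt = build_prompt(index, "When are invoices due?")
```

With Assistant, every step in this sketch happens inside the service; you only upload files and send chat messages.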

SDK support

You can use the Assistant API directly or via the Pinecone Python SDK.

To interact with Pinecone Assistant using the Python SDK, upgrade the client and install the pinecone-plugin-assistant package as follows:

Shell
pip install --upgrade pinecone pinecone-plugin-assistant

Filter with metadata

When you upload a file, you can attach metadata key-value pairs to store additional information about the file. Then, you can specify filters on assistant responses so only files that match the metadata filter are referenced in the response. Chat requests without metadata filters do not consider metadata.

Supported metadata types

Metadata payloads must be key-value pairs in a JSON object. Keys must be strings, and values can be one of the following data types:

  • String
  • Number (integer or floating point; converted to a 64-bit floating point)
  • Boolean (true, false)
  • List of strings

Null metadata values are not supported. Instead of setting a key to a null value, we recommend that you remove that key from the metadata payload.

For example, the following are valid metadata payloads:

JSON
{
    "genre": "action",
    "year": 2020,
    "length_hrs": 1.5
}

{
    "color": "blue",
    "fit": "straight",
    "price": 29.99,
    "is_jeans": true
}
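Before uploading a file, you may want to check a payload against the rules above on the client side. The following is a minimal sketch, assuming the payload is already a Python dict; `validate_metadata` is a hypothetical helper, not part of the Pinecone SDK.

```python
def validate_metadata(metadata):
    """Return a list of problems; an empty list means the payload follows the rules above."""
    if not isinstance(metadata, dict):
        return ["payload must be a JSON object (a dict in Python)"]
    problems = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            problems.append(f"key {key!r} is not a string")
        elif value is None:
            problems.append(f"{key}: null values are not supported; remove the key")
        elif isinstance(value, (str, bool, int, float)):
            continue  # string, Boolean, or number
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            continue  # list of strings
        else:
            problems.append(f"{key}: unsupported value type {type(value).__name__}")
    return problems

ok = validate_metadata({"genre": "action", "year": 2020, "length_hrs": 1.5})
bad = validate_metadata({"genre": None, "tags": ["a", 1]})
```

Here `ok` is an empty list, while `bad` contains one message for the null value and one for the mixed-type list.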

Supported metadata size

Pinecone Assistant supports 40KB of metadata per file.
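A rough client-side size check can catch oversized payloads before upload. This sketch makes two assumptions that the text above does not confirm: that 40KB means 40 * 1024 bytes, and that size is measured on the compact UTF-8 JSON encoding. The helper names are hypothetical.

```python
import json

MAX_METADATA_BYTES = 40 * 1024  # assumption: 40KB read as 40 * 1024 bytes

def metadata_size_bytes(metadata):
    """Size of the payload as compact, UTF-8 encoded JSON (assumed measure)."""
    encoded = json.dumps(metadata, separators=(",", ":"), ensure_ascii=False)
    return len(encoded.encode("utf-8"))

def fits_metadata_limit(metadata):
    """True if the payload is within the assumed 40KB limit."""
    return metadata_size_bytes(metadata) <= MAX_METADATA_BYTES
```

Because the exact measurement is an assumption, treat a result close to the limit as a warning rather than a guarantee.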

Metadata query language

Pinecone’s filtering query language is based on MongoDB’s query and projection operators. Pinecone currently supports a subset of those operators:

| Filter | Description | Supported types |
| --- | --- | --- |
| $eq | Matches with metadata values that are equal to a specified value. | Number, string, boolean |
| $ne | Matches with metadata values that are not equal to a specified value. | Number, string, boolean |
| $gt | Matches with metadata values that are greater than a specified value. | Number |
| $gte | Matches with metadata values that are greater than or equal to a specified value. | Number |
| $lt | Matches with metadata values that are less than a specified value. | Number |
| $lte | Matches with metadata values that are less than or equal to a specified value. | Number |
| $in | Matches with metadata values that are in a specified array. | String, number |
| $nin | Matches with metadata values that are not in a specified array. | String, number |
| $exists | Matches with the specified metadata field. | Boolean |

For example, the following metadata payload has a "genre" field whose value is a list of strings:

JSON
{ "genre": ["comedy", "documentary"] }

This means "genre" takes on both values, and requests with the following filters will match:

JSON
{"genre":"comedy"}

{"genre": {"$in":["documentary","action"]}}

{"$and": [{"genre": "comedy"}, {"genre":"documentary"}]}

However, requests with the following filter will not match:

JSON
{ "$and": [{ "genre": "comedy" }, { "genre": "drama" }] }

Additionally, requests with the following filters will not match because the filters are invalid. They result in a compilation error:

# INVALID QUERY:
{"genre": ["comedy", "documentary"]}
# INVALID QUERY:
{"genre": {"$eq": ["comedy", "documentary"]}}

Limitations

Pinecone Assistant has the following limitations:

  • Supported file types: .txt and .pdf
  • Max input tokens per query: 64,000

Starter plans

The following limitations apply to each Starter organization:

  • Max number of assistants: 3
  • Max tokens per minute (TPM) input: 30,000
  • Max number of total LLM processed tokens: 1,500,000
  • Max total output tokens: 200,000

The following limitations apply to each assistant in Starter organizations:

  • Max file storage: 1GB
  • Max files uploaded: 10

Standard and Enterprise plans

The following limitations apply to each Standard or Enterprise organization:

  • Max number of assistants: unlimited
  • Max tokens per minute (TPM) input: 150,000
  • Max number of total LLM processed tokens: unlimited
  • Max total output tokens: unlimited

The following limitations apply to each assistant in Standard or Enterprise organizations:

  • Max file storage: 10GB
  • Max files uploaded: 10,000

Pricing

See Pricing for up-to-date pricing information.

Understanding token usage

Pinecone Assistant usage is measured in tokens. Input and output tokens are counted separately and have different costs.

Pinecone Assistant consumes input tokens for both planning and retrieval. Input token usage depends on the chat history, the document structure and data density (for example, how many words are on a page), and the number of documents that meet the filter criteria. In general, the total number of input tokens is the chat history token count plus on the order of 10,000 tokens used for document retrieval.
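As a worked example of this estimate, the sketch below adds the roughly 10,000 retrieval tokens to a chat history size and derives how many such queries would fit in a tokens-per-minute budget. The function names are hypothetical, and the sketch assumes the TPM limits listed under Limitations count all estimated input tokens; the real overhead varies with your documents and filters.

```python
RETRIEVAL_OVERHEAD_TOKENS = 10_000  # rough order of magnitude, not an exact figure

def estimate_input_tokens(chat_history_tokens, retrieval_overhead=RETRIEVAL_OVERHEAD_TOKENS):
    """Approximate input tokens for one query: chat history plus retrieval overhead."""
    return chat_history_tokens + retrieval_overhead

def max_queries_per_minute(tpm_limit, chat_history_tokens):
    """How many queries of this size fit in a tokens-per-minute input budget."""
    return tpm_limit // estimate_input_tokens(chat_history_tokens)

# A 2,000-token chat history costs about 12,000 input tokens per query.
starter = max_queries_per_minute(30_000, chat_history_tokens=2_000)
standard = max_queries_per_minute(150_000, chat_history_tokens=2_000)
```

Under these assumptions, `starter` is 2 and `standard` is 12 queries per minute.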

Output tokens are the tokens generated as part of the answer. The total depends on the complexity of the question and on the number of retrieved documents that are relevant to the question. The output typically ranges from a few dozen to several hundred tokens.