Understanding Pinecone Assistant
Pinecone Assistant is a service that allows you to upload documents, ask questions, and receive responses that reference your documents. This is known as retrieval-augmented generation (RAG).
Use cases
Pinecone Assistant is useful for a variety of tasks, especially for the following:
- Prototyping and deploying an AI assistant quickly.
- Providing context-aware answers about your proprietary data without training an LLM.
- Retrieving answers grounded in your data, with references.
Workflow
You can use Pinecone Assistant through the Pinecone console or the Pinecone API.
The following steps outline the general Pinecone Assistant workflow:
Create an assistant
Create an assistant to answer questions about your documents.
Upload documents
Upload documents to your assistant. Your assistant manages chunking, embedding, and storage for you.
Chat with an assistant
Chat with your assistant and receive responses as a JSON object or as a text stream. For each chat, your assistant queries a large language model (LLM) with context from your documents to ensure the LLM provides grounded responses.
Evaluate answers
Use the metrics_alignment operation to measure the correctness and completeness of responses from your assistant.
Optimize performance
Use custom instructions to tailor your assistant’s behavior and responses to specific use cases or requirements. Filter by metadata associated with files to reduce latency and improve the accuracy of responses.
Retrieve context snippets
Retrieve context snippets to see which parts of your data Pinecone Assistant uses to generate responses. You can use the retrieved snippets with your own LLM, RAG application, or agentic workflow.
For information on how the Pinecone Assistant works, see Assistant architecture.
SDK support
You can use the Assistant API directly or via the Pinecone Python SDK.
To interact with Pinecone Assistant using the Python SDK, upgrade the client and install the pinecone-plugin-assistant package.
Support for the Node.js SDK is coming soon.
Files in Pinecone Assistant
Upload files to provide your assistant with context and information to reference when generating responses.
Supported size and types
The maximum file size is 100MB.
Pinecone Assistant supports the following file types:
- JSON (.json)
- Markdown (.md)
- Text (.txt)
- PDF (.pdf)
Scanned PDFs and text extraction from images (OCR) are not supported.
If a document contains images, the images are not processed, and the assistant generates responses based on the text content only.
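The size and type rules above can be checked locally before an upload. The following is a minimal sketch, not part of the Pinecone SDK: `check_upload_candidate` is a hypothetical helper, and treating "100MB" as 100 * 1024 * 1024 bytes and matching extensions case-insensitively are both assumptions.

```python
from pathlib import Path

# Limits taken from the text above. Treating "100MB" as 100 * 1024 * 1024
# bytes is an assumption; the service may count megabytes differently.
MAX_FILE_BYTES = 100 * 1024 * 1024
SUPPORTED_EXTENSIONS = {".json", ".md", ".txt", ".pdf"}


def check_upload_candidate(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks uploadable."""
    problems = []
    # Assumption: extensions are compared case-insensitively.
    extension = Path(filename).suffix.lower()
    if extension not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes > MAX_FILE_BYTES:
        problems.append(f"file is {size_bytes} bytes; the limit is {MAX_FILE_BYTES}")
    return problems
```

This check cannot tell whether a PDF is scanned, so a file that passes may still yield no usable text.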
File storage
Files are uploaded to Google Cloud Storage (us-central1 region) and to your organization’s Pinecone vector database. The assistant processes the files, so data is not sent outside of blob storage or Pinecone. A signed URL for the file is generated and stored in the assistant’s details, so the assistant can retrieve the file when generating responses. To view the signed URL, you can list the files in the assistant.
File metadata
You can upload a file with metadata, which allows you to store additional information about the file as key-value pairs.
File metadata can be set only when the file is uploaded. You cannot update metadata after the file is uploaded.
File metadata can be used for the following purposes:
- Filtering chat responses: Specify filters on assistant responses so only files that match the metadata filter are referenced in the response. Chat requests without metadata filters do not consider metadata.
- Viewing a filtered list of files: Use metadata filters to list files in an assistant that match specific criteria.
Supported metadata size and types
Pinecone Assistant supports 40KB of metadata per file.
Metadata payloads must be key-value pairs in a JSON object. Keys must be strings, and values can be one of the following data types:
- String
- Number (integer or floating point; converted to a 64-bit floating point)
- Boolean (true, false)
- List of strings
Null metadata values are not supported. Instead of setting a key to hold a null value, we recommend you remove that key from the metadata payload.
For example, the following would be valid metadata payloads:
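As a minimal sketch, the hypothetical payloads below follow the rules in this section, and the `validate_metadata` helper checks them. The helper is an illustration, not part of the Pinecone SDK. It assumes the 40KB limit applies to the UTF-8 encoded JSON payload and that 1KB is 1,024 bytes.

```python
import json

# Assumption: the 40KB limit is measured on the UTF-8 encoded JSON payload,
# with 1KB = 1024 bytes. The service may measure size differently.
MAX_METADATA_BYTES = 40 * 1024


def validate_metadata(metadata: dict) -> list[str]:
    """Return rule violations for a metadata payload; empty means valid."""
    errors = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            errors.append(f"key {key!r} is not a string")
        if value is None:
            errors.append(f"{key!r}: null values are not supported; omit the key")
        elif isinstance(value, (str, bool, int, float)):
            pass  # string, Boolean, or number
        elif isinstance(value, list) and all(isinstance(v, str) for v in value):
            pass  # list of strings
        else:
            errors.append(f"{key!r}: unsupported value type {type(value).__name__}")
    # Only measure the size once the payload is known to be serializable.
    if not errors:
        size = len(json.dumps(metadata).encode("utf-8"))
        if size > MAX_METADATA_BYTES:
            errors.append(f"payload is {size} bytes; the limit is {MAX_METADATA_BYTES}")
    return errors


# Hypothetical payloads that satisfy the rules above.
valid_payloads = [
    {"genre": "documentary", "year": 2020, "published": True},
    {"genre": ["comedy", "documentary"], "rating": 4.5},
]
```

Calling `validate_metadata` on each entry of `valid_payloads` returns an empty list, while a payload such as `{"genre": None}` is reported as invalid.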
Metadata query language
Pinecone’s filtering query language is based on MongoDB’s query and projection operators. Pinecone currently supports a subset of those selectors:
Filter | Description | Supported types |
---|---|---|
$eq | Matches with metadata values that are equal to a specified value. | Number, string, boolean |
$ne | Matches with metadata values that are not equal to a specified value. | Number, string, boolean |
$gt | Matches with metadata values that are greater than a specified value. | Number |
$gte | Matches with metadata values that are greater than or equal to a specified value. | Number |
$lt | Matches with metadata values that are less than a specified value. | Number |
$lte | Matches with metadata values that are less than or equal to a specified value. | Number |
$in | Matches with metadata values that are in a specified array. | String, number |
$nin | Matches with metadata values that are not in a specified array. | String, number |
$exists | Matches with the specified metadata field. | Boolean |
For example, the following has a "genre" metadata field with a list of strings:
This means "genre" takes on both values, and requests with the following filters will match:
However, requests with the following filter will not match:
Additionally, requests with the following filters will not match because they are invalid. They will result in a compilation error:
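To make the list-valued behavior concrete, here is a local sketch of how the operators in the table could be evaluated against one file's metadata. It is an illustration only, not Pinecone's implementation: `matches` and `file_metadata` are hypothetical names, the sketch accepts only explicit operator objects, and it assumes that operators other than `$exists` never match a missing field. A `ValueError` stands in for the compilation error raised on invalid filters.

```python
import operator

# Numeric comparison operators from the table above.
_COMPARISONS = {
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
}


def _is_number(value):
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def _match_operator(op, operand, present, values):
    if op == "$exists":
        return present == operand
    if not present:
        return False  # assumption: other operators never match a missing field
    if op in ("$eq", "$ne"):
        if isinstance(operand, list):
            raise ValueError(f"{op} takes a single value, not a list")
        found = any(v == operand for v in values)
        return found if op == "$eq" else not found
    if op in ("$in", "$nin"):
        if not isinstance(operand, list):
            raise ValueError(f"{op} takes a list of values")
        found = any(v in operand for v in values)
        return found if op == "$in" else not found
    if op in _COMPARISONS:
        if not _is_number(operand):
            raise ValueError(f"{op} takes a number")
        return any(_is_number(v) and _COMPARISONS[op](v, operand) for v in values)
    raise ValueError(f"unsupported operator: {op}")


def matches(metadata: dict, query: dict) -> bool:
    """Return True if every field condition in `query` holds for `metadata`."""
    for field, condition in query.items():
        if not isinstance(condition, dict):
            raise ValueError("this sketch only accepts operator objects, e.g. {'$eq': ...}")
        present = field in metadata
        raw = metadata.get(field)
        # A list-valued field takes on every value in the list.
        values = raw if isinstance(raw, list) else [raw]
        for op, operand in condition.items():
            if not _match_operator(op, operand, present, values):
                return False
    return True


# Hypothetical metadata for one uploaded file.
file_metadata = {"genre": ["comedy", "documentary"], "year": 2020}
```

With this metadata, `{"genre": {"$eq": "comedy"}}` and `{"genre": {"$in": ["documentary", "action"]}}` both match because "genre" takes on each value in its list, `{"genre": {"$eq": "drama"}}` does not match, and passing a list to `$eq` raises `ValueError`.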
Limitations
The following Pinecone Assistant limits apply to each organization and vary based on pricing plan:
Metric | Starter plan | Standard plan | Enterprise plan |
---|---|---|---|
Max number of assistants | 3 | Unlimited | Unlimited |
Max tokens per minute (TPM) input | 30,000 | 150,000 | 150,000 |
Max number of total LLM processed tokens | 1,500,000 | Unlimited | Unlimited |
Max input tokens per query | 64,000 | 64,000 | 64,000 |
Max total output tokens | 200,000 | Unlimited | Unlimited |
The following file limits apply to each assistant and vary based on pricing plan:
Metric | Starter plan | Standard plan | Enterprise plan |
---|---|---|---|
Max file storage | 1GB | 10GB | 10GB |
Max files uploaded | 10 | 10,000 | 10,000 |
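The per-assistant file limits in the table can be encoded as a small lookup for client-side checks. This is a sketch under stated assumptions: `FILE_LIMITS` and `can_upload` are hypothetical names, and 1GB is taken to be 1024 ** 3 bytes, which the service may define differently.

```python
# Assumption: 1GB = 1024 ** 3 bytes.
GB = 1024 ** 3

# Per-assistant file limits from the table above.
FILE_LIMITS = {
    "starter": {"max_storage_bytes": 1 * GB, "max_files": 10},
    "standard": {"max_storage_bytes": 10 * GB, "max_files": 10_000},
    "enterprise": {"max_storage_bytes": 10 * GB, "max_files": 10_000},
}


def can_upload(plan: str, current_files: int, current_bytes: int, new_file_bytes: int) -> bool:
    """Check whether one more file would stay within the plan's file limits."""
    limits = FILE_LIMITS[plan]
    within_count = current_files + 1 <= limits["max_files"]
    within_storage = current_bytes + new_file_bytes <= limits["max_storage_bytes"]
    return within_count and within_storage
```

The service remains the source of truth; a local check like this only avoids requests that would clearly exceed a limit.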
Pricing
See Pricing for up-to-date pricing information.
Token usage
Pinecone Assistant usage is measured in tokens, with different counts and cost for input and output tokens.
Pinecone Assistant consumes input tokens for both planning and retrieval. Input token usage is calculated based on the chat history, the document structure and data density (e.g., how many words are on a page), and the number of documents that meet the filter criteria. In general, this means the total number of input tokens used is the chat history token count plus on the order of 10,000 tokens for document retrieval. The maximum input tokens per query is 64,000.
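The input-token arithmetic described above can be sketched as follows. The function names are hypothetical, and the 10,000-token retrieval figure is only the order-of-magnitude estimate given in the text, so the result is a rough guide rather than a billing calculation.

```python
# Figures from the text above. RETRIEVAL_TOKENS is an order-of-magnitude
# estimate; actual usage depends on document structure and filters.
RETRIEVAL_TOKENS = 10_000
MAX_INPUT_TOKENS_PER_QUERY = 64_000


def estimate_input_tokens(chat_history_tokens: int) -> int:
    """Rough input-token estimate for one query: chat history plus retrieval."""
    return chat_history_tokens + RETRIEVAL_TOKENS


def fits_input_limit(chat_history_tokens: int) -> bool:
    """Check the rough estimate against the per-query input limit."""
    return estimate_input_tokens(chat_history_tokens) <= MAX_INPUT_TOKENS_PER_QUERY
```

For instance, a chat history of 2,000 tokens gives an estimate of 12,000 input tokens, well under the 64,000 limit.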
Output tokens are the number of tokens generated as part of the answer generation. The total number depends on the complexity of the question and the number of documents that were retrieved and are relevant for the question. The output typically ranges from a few dozen to several hundred tokens.