This feature is in public beta and is not recommended for production usage. Join the beta waitlist and review the preview terms for more details.

Pinecone Assistant is a service that lets you upload documents, ask questions, and receive responses that reference your documents. This pattern is known as retrieval-augmented generation (RAG). You can access Assistant through the Pinecone console, a Python plugin, or the Assistant API. The JavaScript and Java clients do not support this beta release.

How it works

When you upload a document, your assistant processes the contents by chunking and embedding the text. Then, the assistant stores the embeddings in a vector database. When you chat with your assistant, it queries a large language model (LLM) with your prompt and any relevant information from your data sources. With this context, the LLM can provide responses grounded in your documents.

Assistant manages embedding generation, vector storage, and LLM prompting; you do not directly access these parts of the system. You upload files and chat with the model, and Assistant handles all other components.
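Although Assistant runs this pipeline for you, the underlying RAG flow can be sketched in a few lines. The chunking, embedding, and retrieval below are deliberately toy stand-ins (a word-overlap score instead of a real embedding model and vector database), not Pinecone's implementation:

```python
# Toy sketch of the pipeline Assistant manages: chunk, embed, store, retrieve, prompt.
# The "embedding" here is just a set of words; real systems use dense vector models.

def chunk(text, size=20):
    """Split text into chunks of roughly `size` words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(chunk_text):
    """Stand-in embedding: the set of lowercased words in the chunk."""
    return set(chunk_text.lower().split())

def retrieve(query, store, top_k=1):
    """Score each stored chunk by word overlap with the query; return the best."""
    q = embed(query)
    ranked = sorted(store, key=lambda item: len(q & item["vector"]), reverse=True)
    return [item["text"] for item in ranked[:top_k]]

# "Upload": chunk the document and store (chunk, vector) pairs.
document = ("Pinecone Assistant is in public beta. Uploaded files are chunked "
            "and embedded. Supported file types are txt and pdf.")
store = [{"text": c, "vector": embed(c)} for c in chunk(document, size=8)]

# "Chat": retrieve relevant context and ground the LLM prompt in it.
question = "What file types are supported?"
context = retrieve(question, store)
prompt = f"Answer using this context:\n{context[0]}\n\nQuestion: {question}"
```

With context injected into the prompt, the LLM's answer is grounded in the uploaded document rather than its training data alone.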

Client support

You can use the Assistant API directly or via the Pinecone Python client.

To use the Assistant API with the Python client, upgrade the client and install the pinecone-plugin-assistant package as follows:

pip install --upgrade pinecone-client pinecone-plugin-assistant
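Once the plugin is installed, a typical flow is to create an assistant, upload a file, and chat. This is a minimal sketch under beta-era assumptions: the method names (`create_assistant`, `upload_file`, `chat_completions`) and the assistant name and file path are illustrative and may differ from the final API.

```python
# Hedged sketch of the Python plugin flow; method names are assumptions
# from the beta plugin and may change.
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Create an assistant (the name is illustrative).
assistant = pc.assistant.create_assistant(assistant_name="example-assistant")

# Upload a document; only .txt and .pdf are supported during the beta.
assistant.upload_file(file_path="notes.pdf")

# Chat: the assistant retrieves relevant chunks and queries the LLM.
resp = assistant.chat_completions(
    messages=[{"role": "user", "content": "What does the document cover?"}]
)
print(resp)
```

Uploading and indexing take some time after `upload_file` returns, so a freshly uploaded document may not be reflected in chat responses immediately.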


During the beta release, Pinecone Assistant has the following per-project limits:

  • Supported file types: .txt and .pdf
  • Max file storage: 1GB
  • Max files uploaded: 1,000
  • Max number of queries: 200
  • Max input tokens per query: 64,000

If you reach any of these limits, you can request additional free quota. Full pricing details are coming soon.
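These limits can also be checked client-side before an upload is attempted. The helper below is purely illustrative and not part of any Pinecone client; the constants come from the beta limits listed above:

```python
import os

# Beta limits from the list above.
ALLOWED_EXTENSIONS = {".txt", ".pdf"}
MAX_TOTAL_BYTES = 1 * 1024**3  # 1GB of file storage per project
MAX_FILES = 1000

def can_upload(path, current_file_count, current_total_bytes, size_bytes=None):
    """Return (ok, reason) for a prospective upload under the beta limits."""
    ext = os.path.splitext(path)[1].lower()
    if ext not in ALLOWED_EXTENSIONS:
        return False, f"unsupported file type: {ext or '(none)'}"
    if current_file_count + 1 > MAX_FILES:
        return False, "file count limit (1,000) reached"
    size = size_bytes if size_bytes is not None else os.path.getsize(path)
    if current_total_bytes + size > MAX_TOTAL_BYTES:
        return False, "storage limit (1GB) exceeded"
    return True, "ok"
```

For example, `can_upload("notes.pdf", 10, 0, size_bytes=5_000_000)` returns `(True, "ok")`, while a `.png` file or a project already holding 1,000 files is rejected.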