Pinecone Assistant is a service that lets you upload documents, ask questions, and receive responses that reference your documents. This approach is known as retrieval-augmented generation (RAG). You can access Pinecone Assistant through the Pinecone console, a Python SDK plugin, or the Assistant API. The JavaScript and Java SDKs do not support Pinecone Assistant.

This feature is in public preview.

How it works

When you upload a document, your assistant processes its contents by splitting the text into chunks and embedding each chunk, then stores the embeddings in a vector database. When you chat with your assistant, it queries a large language model (LLM) with your prompt and any relevant information retrieved from your documents. With this context, the LLM can provide responses grounded in your documents.
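To make the chunk, embed, store, retrieve, and prompt steps concrete, here is a toy, self-contained sketch of a RAG loop. It is a conceptual illustration only and is not how Pinecone Assistant is implemented. The bag-of-words "embedding", the in-memory `ToyVectorStore`, and the `build_prompt` template are all hypothetical stand-ins for a learned embedding model, a managed vector database, and the real prompt. No LLM is called.

```python
import math
import re
from collections import Counter

# Toy RAG sketch (hypothetical): stand-ins for the embedding model,
# vector database, and prompt that a real system would use.

def chunk(text: str, size: int = 40) -> list:
    """Split text into chunks of at most `size` whitespace-separated words."""
    words = text.split()
    return [" ".join(words[i:i + size]) for i in range(0, len(words), size)]

def embed(text: str) -> Counter:
    """Toy 'embedding': a bag-of-words term-frequency vector."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-frequency vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

class ToyVectorStore:
    """In-memory stand-in for a vector database."""

    def __init__(self) -> None:
        self.items = []  # list of (embedding, chunk_text) pairs

    def upsert(self, chunks: list) -> None:
        self.items.extend((embed(c), c) for c in chunks)

    def query(self, question: str, top_k: int = 2) -> list:
        """Return the top_k chunk texts most similar to the question."""
        q = embed(question)
        ranked = sorted(self.items, key=lambda item: cosine(q, item[0]), reverse=True)
        return [text for _, text in ranked[:top_k]]

def build_prompt(question: str, context: list) -> str:
    """Combine retrieved chunks and the user's question into one LLM prompt."""
    joined = "\n---\n".join(context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"

# Usage: "upload" a document, then retrieve context for a question.
store = ToyVectorStore()
store.upsert(chunk("Invoices are due within 30 days. Late payments incur a 2% fee.", size=6))
question = "When are invoices due?"
prompt = build_prompt(question, store.query(question, top_k=1))
print(prompt)
```

In a real deployment the prompt would then go to an LLM. Assistant handles that step, along with embedding and storage, on your behalf.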

Assistant manages embedding generation, embedding storage, and LLM prompting, and you do not access these parts of the system directly. You upload files and chat with the model, and Assistant handles everything else.

SDK support

You can use the Assistant API directly or via the Pinecone Python SDK.

To interact with Pinecone Assistant using the Python SDK, upgrade the client and install the pinecone-plugin-assistant package as follows:

pip install --upgrade pinecone pinecone-plugin-assistant
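After running the install command, you can confirm that both distributions are present from Python. The sketch below uses only the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical. It does not check which plugin version Assistant requires, because this page does not state one.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if it is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None

# Report the two distributions installed by the pip command above.
for dist in ("pinecone", "pinecone-plugin-assistant"):
    print(f"{dist}: {installed_version(dist) or 'not installed'}")
```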

Limitations

Pinecone Assistant has the following limitations:

  • Supported file types: .txt and .pdf
  • Max input tokens per query: 64,000
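A client could check these two limits before calling Assistant. The following is a minimal sketch under stated assumptions. The helper names are hypothetical. Extensions are compared case-insensitively, which is an assumption, since the list above names only `.txt` and `.pdf`. The caller supplies the token count because this page does not say which tokenizer Assistant uses.

```python
from pathlib import Path

# Limits transcribed from the list above.
SUPPORTED_SUFFIXES = {".txt", ".pdf"}
MAX_INPUT_TOKENS_PER_QUERY = 64_000

def is_supported_file(path: str) -> bool:
    """True if the file's final extension is .txt or .pdf (case-insensitive: an assumption)."""
    return Path(path).suffix.lower() in SUPPORTED_SUFFIXES

def fits_query_budget(input_tokens: int) -> bool:
    """True if a query's input token count (computed by the caller) is within the cap."""
    return 0 <= input_tokens <= MAX_INPUT_TOKENS_PER_QUERY

print(is_supported_file("report.pdf"), is_supported_file("slides.pptx"))
print(fits_query_budget(64_000), fits_query_budget(64_001))
```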

Starter plans

The following limitations apply to each Starter organization:

  • Max number of assistants: 3
  • Max input tokens per minute (TPM): 30,000
  • Max total LLM-processed tokens: 1,500,000
  • Max total output tokens: 200,000

The following limitations apply to each assistant in Starter organizations:

  • Max file storage: 1 GB
  • Max files uploaded: 10

Standard and Enterprise plans

The following limitations apply to each Standard or Enterprise organization:

  • Max number of assistants: unlimited
  • Max input tokens per minute (TPM): 150,000
  • Max total LLM-processed tokens: unlimited
  • Max total output tokens: unlimited

The following limitations apply to each assistant in Standard or Enterprise organizations:

  • Max file storage: 10 GB
  • Max files uploaded: 10,000
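The per-plan limits above can be encoded as data so that a client-side check reads the right caps for each plan. The sketch below is hypothetical: the `PlanLimits` class, the plan keys, and the helper functions are illustrative names, and "unlimited" is represented as `None`. It models only the count and storage caps. It does not model TPM rate limiting or how Pinecone enforces any limit server-side.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class PlanLimits:
    # None means "unlimited" in the lists above.
    max_assistants: Optional[int]
    max_input_tpm: int
    max_llm_processed_tokens: Optional[int]
    max_output_tokens: Optional[int]
    max_storage_gb_per_assistant: int
    max_files_per_assistant: int

# Values transcribed from the Starter and Standard/Enterprise lists above.
PLANS = {
    "starter": PlanLimits(3, 30_000, 1_500_000, 200_000, 1, 10),
    "standard_enterprise": PlanLimits(None, 150_000, None, None, 10, 10_000),
}

def within(limit: Optional[int], value: float) -> bool:
    """True if value does not exceed limit; a None limit is unlimited."""
    return limit is None or value <= limit

def can_create_assistant(plan: str, existing: int) -> bool:
    """Check whether an organization can create one more assistant."""
    return within(PLANS[plan].max_assistants, existing + 1)

def can_upload(plan: str, files_already: int, storage_gb_used: float, new_file_gb: float) -> bool:
    """Check one more file against an assistant's file-count and storage caps."""
    p = PLANS[plan]
    return (within(p.max_files_per_assistant, files_already + 1)
            and within(p.max_storage_gb_per_assistant, storage_gb_used + new_file_gb))

print(can_create_assistant("starter", 3))       # a fourth Starter assistant
print(can_upload("starter", 10, 0.5, 0.1))      # an eleventh file on Starter
```

Keeping the limits in one table means that only the `PLANS` entries need to change if the published numbers change.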

Pricing

See Pricing for up-to-date pricing information.