This page describes the pricing and limits of Pinecone Assistant. Pricing and limits vary based on subscription plan.

Pricing

The cost of using Pinecone Assistant is determined by the following factors:
  • Minimum usage
  • Hourly rate
  • Tokens used
  • Storage

Minimum usage

The Standard and Enterprise pricing plans include a monthly minimum usage commitment:
Plan       | Minimum usage
Starter    | $0/month
Standard   | $50/month
Enterprise | $500/month
Beyond the monthly minimum, customers are charged for what they use each month.
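For example, a Standard plan organization that accrues $30 of usage in a month is billed the $50 minimum, while one that accrues $180 of usage is billed $180.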

Hourly rate

For paid plans, you are charged an hourly rate for each assistant, regardless of assistant activity.
Plan       | Hourly rate
Starter    | Free
Standard   | $0.05/hour
Enterprise | $0.05/hour
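For example, a single assistant on the Standard plan that exists for a full 30-day month (720 hours) accrues 720 × $0.05 = $36.00 in hourly charges, whether or not it is used.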

Tokens

For paid plans, you are charged for the number of tokens used by each assistant.

Chat tokens

Chatting with an assistant involves both input and output tokens:
  • Input tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant and sent to a model. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
  • Output tokens are based on the answer from the model.
Plan       | Input token rate            | Output token rate
Starter    | Free (1.5M max per project) | Free (200k max per project)
Standard   | $8/million tokens           | $15/million tokens
Enterprise | $8/million tokens           | $15/million tokens
Chat input tokens appear as "Assistants Input Tokens" on invoices and prompt_tokens in API responses. Chat output tokens appear as "Assistants Output Tokens" on invoices and completion_tokens in API responses.
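As a rough illustration, the sketch below converts the prompt_tokens and completion_tokens reported for a chat request into an estimated cost at the Standard plan rates above; the token counts shown are hypothetical placeholders, not output from a real request.

```python
# Estimate the cost of a single chat call at Standard-plan rates.
# The token counts below are hypothetical; in practice, read them from the
# usage fields of the chat response (prompt_tokens / completion_tokens).
STANDARD_INPUT_RATE = 8 / 1_000_000    # $8 per million chat input tokens
STANDARD_OUTPUT_RATE = 15 / 1_000_000  # $15 per million chat output tokens

usage = {"prompt_tokens": 12_000, "completion_tokens": 800}  # hypothetical values

cost = (usage["prompt_tokens"] * STANDARD_INPUT_RATE
        + usage["completion_tokens"] * STANDARD_OUTPUT_RATE)
print(f"Estimated chat cost: ${cost:.4f}")  # -> $0.1080
```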

Context tokens

When you retrieve context snippets, tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
Plan       | Token rate
Starter    | Free (500k max per project)
Standard   | $5/million tokens
Enterprise | $8/million tokens
Context retrieval tokens appear as "Assistants Context Tokens Processed" on invoices and prompt_tokens in API responses. In API responses, completion_tokens will always be 0 because, unlike for chat, there is no answer from a model.
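For example, on the Standard plan, a context retrieval request that reports 20,000 prompt_tokens costs 20,000 × $5/million = $0.10.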

Evaluation tokens

Evaluating responses involves both input and output tokens:
  • Input tokens are based on two requests to a model: The first request contains a question, answer, and ground truth answer, and the second request contains the same details plus generated facts returned by the model for the first request.
  • Output tokens are based on two responses from a model: The first response contains generated facts, and the second response contains evaluation metrics.
Plan       | Input token rate  | Output token rate
Starter    | Not available     | Not available
Standard   | $8/million tokens | $15/million tokens
Enterprise | $8/million tokens | $15/million tokens
Evaluation input tokens appear as "Assistants Evaluation Tokens Processed" on invoices and prompt_tokens in API responses. Evaluation output tokens appear as "Assistants Evaluation Tokens Out" on invoices and completion_tokens in API responses.
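For example, on the Standard plan, an evaluation whose two requests total 3,000 input tokens and whose two responses total 1,000 output tokens costs (3,000 × $8/million) + (1,000 × $15/million) ≈ $0.04.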

Storage

For paid plans, you are charged for the size of each assistant.
Plan       | Storage rate
Starter    | Free (1 GB max per project)
Standard   | $3/GB per month
Enterprise | $3/GB per month
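To see how the pricing factors combine, here is a minimal sketch that estimates a monthly Standard plan bill, assuming the monthly minimum acts as a billing floor as described under Minimum usage; the rates come from the tables above, and all usage figures are hypothetical.

```python
# Rough monthly bill estimate for one assistant on the Standard plan.
# Rates come from the tables above; the usage figures are hypothetical.
HOURLY_RATE = 0.05        # $ per assistant-hour
CHAT_IN_RATE = 8 / 1e6    # $ per chat input token
CHAT_OUT_RATE = 15 / 1e6  # $ per chat output token
CONTEXT_RATE = 5 / 1e6    # $ per context retrieval token
STORAGE_RATE = 3.0        # $ per GB per month
MONTHLY_MINIMUM = 50.0    # Standard plan minimum usage

hours = 720                                # one assistant for a 30-day month
chat_in, chat_out = 2_000_000, 250_000     # chat tokens for the month
context_tokens = 500_000                   # context retrieval tokens
storage_gb = 2                             # assistant storage

usage_cost = (
    hours * HOURLY_RATE
    + chat_in * CHAT_IN_RATE
    + chat_out * CHAT_OUT_RATE
    + context_tokens * CONTEXT_RATE
    + storage_gb * STORAGE_RATE
)
bill = max(usage_cost, MONTHLY_MINIMUM)  # the monthly minimum acts as a floor
print(f"Usage: ${usage_cost:.2f}, billed: ${bill:.2f}")  # -> Usage: $64.25, billed: $64.25
```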

Limits

Pinecone Assistant limits vary based on subscription plan.

Object limits

Object limits are restrictions on the number or size of assistant-related objects.
Metric                               | Starter plan  | Standard plan | Enterprise plan
Assistants per project               | 5             | Unlimited     | Unlimited
File storage per project             | 1 GB          | Unlimited     | Unlimited
Chat input tokens per project        | 1,500,000     | Unlimited     | Unlimited
Chat output tokens per project       | 200,000       | Unlimited     | Unlimited
Context retrieval tokens per project | 500,000       | Unlimited     | Unlimited
Evaluation input tokens per project  | Not available | 150,000       | 500,000
Files per assistant                  | 10            | 10,000        | 10,000
File size (.docx, .json, .md, .txt)  | 10 MB         | 10 MB         | 10 MB
File size (.pdf)                     | 10 MB         | 100 MB        | 100 MB
Metadata size per file               | 16 KB         | 16 KB         | 16 KB
Additionally, the following limits apply to multimodal PDFs (currently in public preview):
Metric                        | Starter plan | Standard plan | Enterprise plan
Max file size                 | 10 MB        | 50 MB         | 50 MB
Page limit                    | 100          | 100           | 100
Multimodal PDFs per assistant | 1            | 20            | 20

Rate limits

Rate limits are restrictions on the frequency of requests within a specified period of time. Requests that exceed a rate limit fail and return a 429 - TOO_MANY_REQUESTS status.
To handle rate limits, implement retry logic with exponential backoff.
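A minimal sketch of such a retry wrapper, assuming a caller-supplied function that returns a response with a status_code attribute (the names here are placeholders, not part of the Pinecone SDK):

```python
import random
import time

def with_backoff(request_fn, max_retries=5, base_delay=1.0):
    """Call request_fn, retrying with exponential backoff on 429 responses.

    request_fn is a placeholder for any call that may hit an Assistant
    rate limit and return an object with a status_code attribute.
    """
    for attempt in range(max_retries):
        response = request_fn()
        if response.status_code != 429:
            return response
        # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus a random spread.
        delay = base_delay * (2 ** attempt) + random.uniform(0, 1)
        time.sleep(delay)
    raise RuntimeError("Rate limit still exceeded after retries")
```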
Metric                                      | Starter plan | Standard plan | Enterprise plan
Assistant list/get requests per minute      | 40           | 100           | 500
Assistant create/update requests per minute | 20           | 50            | 100
Assistant delete requests per minute        | 20           | 50            | 100
File list/get requests per minute           | 100          | 300           | 6,000
File upload requests per minute             | 5            | 20            | 300
File delete requests per minute             | 5            | 20            | 300
Chat input tokens per minute                | 100,000      | 300,000       | 1,000,000
Chat history tokens per query               | 64,000       | 64,000        | 64,000