Pricing and limits vary based on subscription plan.

Pricing

Pinecone Assistant usage is billed monthly. Costs can include minimum usage, ingestion, tokens, and storage.

Minimum usage

The Builder, Standard, and Enterprise pricing plans include a monthly minimum usage commitment:
| Plan | Minimum usage |
| --- | --- |
| Starter | $0/month |
| Builder | $20/month (flat) |
| Standard | $50/month |
| Enterprise | $500/month |
On the Builder plan, the monthly minimum is a flat fee that covers included usage; additional usage beyond Builder limits is blocked rather than billed. On the Standard and Enterprise plans, customers pay the monthly minimum or their actual usage for the month, whichever is greater.

Examples
  • You are on the Standard plan.
  • Your usage for the month of August amounts to $20.
  • Your usage is below the $50 monthly minimum, so your total for the month is $50.
In this case, the August invoice would include line items for each service you used (totaling $20), plus a single line item covering the rest of the minimum usage commitment ($30).
  • You are on the Standard plan.
  • Your usage for the month of August amounts to $100.
  • Your usage exceeds the $50 monthly minimum, so your total for the month is $100.
In this case, the August invoice would only show line items for each service you used (totaling $100). Since your usage exceeds the minimum usage commitment, you are only charged for your actual usage and no additional minimum usage line item appears on your invoice.
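The two examples above follow one rule: on the Standard and Enterprise plans, the monthly total is the larger of actual usage and the plan minimum. A minimal Python sketch of that arithmetic follows. The function name `monthly_invoice` and the returned field names are illustrative assumptions, not part of any Pinecone API, and the flat-fee Builder plan is not modeled.

```python
def monthly_invoice(usage_usd: float, minimum_usd: float) -> dict:
    """Split a month's bill into actual usage and any top-up to the minimum.

    Hypothetical helper for illustration; the field names are assumptions.
    """
    top_up = max(0.0, minimum_usd - usage_usd)  # extra line item, if any
    return {"usage": usage_usd, "minimum_top_up": top_up, "total": usage_usd + top_up}


# Standard plan ($50 minimum), mirroring the two examples above.
print(monthly_invoice(20.0, 50.0))   # {'usage': 20.0, 'minimum_top_up': 30.0, 'total': 50.0}
print(monthly_invoice(100.0, 50.0))  # {'usage': 100.0, 'minimum_top_up': 0.0, 'total': 100.0}
```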

Ingestion

When you upload or replace files for an assistant, usage is measured in ingestion units. One ingestion unit is approximately 400 tokens (~300 words); exact counts can vary by document.
| Processing path | Rate (per ingestion unit) |
| --- | --- |
| Standard file ingestion | $0.0005 |
Multimodal PDF processing uses the same ingestion unit; it is billed at about twice the standard per-unit rate. For current rates, see Pricing.
| Plan | File uploads (ingestion units) |
| --- | --- |
| Starter | 1,000 / month included |
| Builder | 10,000 / month included |
| Standard | Pay per unit at the rate above |
| Enterprise | Pay per unit at the rate above |
Multimodal ingestion applies to content processed through the multimodal PDF path. Standard ingestion applies to other supported file types. Usage and invoices reflect a single ingestion usage line item.

With API version 2026-04 or later, a completed file-ingestion operation may include ingestion_units. Use Describe an operation or Track file operations for details.
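For budgeting, the figures above can be combined into a rough estimate. The sketch below assumes exactly 400 tokens per ingestion unit, rounds partial units up, and treats "about twice" as a multiplier of exactly 2; all three are simplifying assumptions, and the function name `estimate_ingestion` is hypothetical. Billed amounts can differ.

```python
import math

TOKENS_PER_INGESTION_UNIT = 400   # approximate; exact counts can vary by document
STANDARD_RATE_USD = 0.0005        # per ingestion unit, standard file ingestion
MULTIMODAL_MULTIPLIER = 2.0       # assumption: "about twice" taken as exactly 2x


def estimate_ingestion(tokens: int, multimodal: bool = False) -> tuple[int, float]:
    """Return (estimated ingestion units, estimated cost in USD)."""
    units = math.ceil(tokens / TOKENS_PER_INGESTION_UNIT)  # rounding up is an assumption
    rate = STANDARD_RATE_USD * (MULTIMODAL_MULTIPLIER if multimodal else 1.0)
    return units, units * rate


units, cost = estimate_ingestion(1_000_000)
print(units, f"${cost:.2f}")  # 2500 $1.25
```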

Tokens

For paid plans, you are charged for the number of tokens used by each assistant. Ingestion is billed separately from chat and context retrieval tokens.

Chat tokens

Chatting with an assistant involves both input and output tokens:
  • Input tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant and sent to a model. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
  • Output tokens are based on the answer from the model.
| Plan | Input token rate | Output token rate |
| --- | --- | --- |
| Starter | Included (500,000 / month*) | Included (300,000 / month) |
| Builder | Included (2,000,000 / month) | Included (1,000,000 / month) |
| Standard | $8/million tokens | $15/million tokens |
| Enterprise | $8/million tokens | $15/million tokens |
*1,000,000 input tokens/month to explore Marketplace apps until June 30, 2026.
Chat input tokens appear as “Assistants Input Tokens” on invoices and prompt_tokens in API responses. Chat output tokens appear as “Assistants Output Tokens” on invoices and completion_tokens in API responses.
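As a rough illustration of how the Standard and Enterprise chat rates translate into a per-call cost, the sketch below reads `prompt_tokens` and `completion_tokens` from a usage dictionary. The dictionary shape and the sample numbers are assumptions made for this example; only the two field names and the rates come from the text above.

```python
CHAT_INPUT_USD_PER_MILLION = 8.0    # Standard and Enterprise input rate
CHAT_OUTPUT_USD_PER_MILLION = 15.0  # Standard and Enterprise output rate


def chat_cost_usd(usage: dict) -> float:
    """Estimate the cost of one chat call from its token usage."""
    input_cost = usage["prompt_tokens"] * CHAT_INPUT_USD_PER_MILLION
    output_cost = usage["completion_tokens"] * CHAT_OUTPUT_USD_PER_MILLION
    return (input_cost + output_cost) / 1_000_000


# Hand-written sample values, not real API output.
sample_usage = {"prompt_tokens": 12_000, "completion_tokens": 800}
print(f"${chat_cost_usd(sample_usage):.3f}")  # $0.108
```

Because input tokens include chat history and retrieved context snippets, `prompt_tokens` usually dominates the total in long conversations.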

Context tokens

When you retrieve context snippets, tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
| Plan | Token rate |
| --- | --- |
| Starter | Included (500,000 / month) |
| Builder | Included (2,000,000 / month) |
| Standard | $5/million tokens |
| Enterprise | $5/million tokens |
Context retrieval tokens appear as Assistants Context Tokens Processed on invoices and prompt_tokens in API responses. In API responses, completion_tokens will always be 0 because, unlike for chat, there is no answer from a model.

Evaluation tokens

Evaluating responses involves both input and output tokens:
  • Input tokens are based on two requests to a model: The first request contains a question, answer, and ground truth answer, and the second request contains the same details plus generated facts returned by the model for the first request.
  • Output tokens are based on two responses from a model: The first response contains generated facts, and the second response contains evaluation metrics.
| Plan | Input token rate | Output token rate |
| --- | --- | --- |
| Starter | Not available | Not available |
| Builder | Not available | Not available |
| Standard | $8/million tokens | $15/million tokens |
| Enterprise | $8/million tokens | $15/million tokens |
Evaluation input tokens appear as Assistants Evaluation Tokens Processed on invoices and prompt_tokens in API responses. Evaluation output tokens appear as Assistants Evaluation Tokens Out on invoices and completion_tokens in API responses.

Storage

For paid plans, you are charged for the size of each assistant.
| Plan | Storage rate |
| --- | --- |
| Starter | Free (1 GB max per org) |
| Builder | Free up to 3 GB per org |
| Standard | $3/GB per month |
| Enterprise | $3/GB per month |
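A quick way to estimate the storage charge on the Standard or Enterprise plan is to sum assistant sizes and multiply by the per-GB rate. The sketch below assumes sizes are held for a full month with no proration, which the text above does not specify; `storage_cost_usd` is a hypothetical helper name.

```python
STORAGE_USD_PER_GB_MONTH = 3.0  # Standard and Enterprise rate


def storage_cost_usd(assistant_sizes_gb: list[float]) -> float:
    """Estimate one month's storage cost; assumes no proration."""
    return sum(assistant_sizes_gb) * STORAGE_USD_PER_GB_MONTH


# Three assistants of 0.5 GB, 2 GB, and 1.5 GB (4 GB in total).
print(storage_cost_usd([0.5, 2.0, 1.5]))  # 12.0
```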

Limits

Pinecone Assistant limits vary based on subscription plan.

Object limits

Object limits are restrictions on the number or size of assistant-related objects. Limits below are scoped per organization except for Assistants per project, which is scoped per project.
| Metric | Starter plan | Builder plan | Standard plan | Enterprise plan |
| --- | --- | --- | --- | --- |
| Assistants per project | 5 | 200 | Unlimited | Unlimited |
| File storage per org | 1 GB | 3 GB | Unlimited | Unlimited |
| Chat input tokens per org | 500,000 / month* | 2,000,000 / month | Unlimited | Unlimited |
| Chat output tokens per org | 300,000 / month | 1,000,000 / month | Unlimited | Unlimited |
| Context retrieval tokens per org | 500,000 / month | 2,000,000 / month | Unlimited | Unlimited |
| Ingestion units per org | 1,000 / month | 10,000 / month | Unlimited | Unlimited |
| File size (.docx, .json, .md, .txt) | 10 MB | 10 MB | 10 MB | 10 MB |
| File size (.pdf) | 10 MB | 50 MB | 100 MB | 100 MB |
| Metadata size per file | 16 KB | 16 KB | 16 KB | 16 KB |
*1,000,000 input tokens/month to explore Marketplace apps until June 30, 2026.

Multimodal PDF processing uses the same ingestion unit as standard uploads and is billed at about twice the standard per-unit rate (see Ingestion above). The object limits and rate limits for assistants also apply to multimodal PDFs. Additionally, the following limits apply to multimodal PDFs (currently in public preview):
| Metric | Starter plan | Builder plan | Standard plan | Enterprise plan |
| --- | --- | --- | --- | --- |
| Max file size | 10 MB | 10 MB | 50 MB | 50 MB |
| Page limit | 100 | 100 | 100 | 100 |
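Checking a file against these size limits before uploading avoids a failed request. The sketch below encodes the file-size rows from the two tables above. It is a client-side convenience only: the helper names are hypothetical, the plan keys are made up for this example, and 1 MB is taken as 1,000,000 bytes as a conservative assumption because the text does not define the unit. The 100-page limit for multimodal PDFs is not checked here.

```python
from pathlib import Path

BYTES_PER_MB = 1_000_000  # conservative assumption; the unit is not defined in the text

# Maximum file size in MB per plan, copied from the tables above.
TEXT_LIMIT_MB = {"starter": 10, "builder": 10, "standard": 10, "enterprise": 10}
PDF_LIMIT_MB = {"starter": 10, "builder": 50, "standard": 100, "enterprise": 100}
MULTIMODAL_PDF_LIMIT_MB = {"starter": 10, "builder": 10, "standard": 50, "enterprise": 50}
TEXT_EXTENSIONS = {".docx", ".json", ".md", ".txt"}


def max_upload_bytes(filename: str, plan: str, multimodal: bool = False) -> int:
    """Return the size limit in bytes for this file type and plan."""
    ext = Path(filename).suffix.lower()
    if ext == ".pdf":
        limits = MULTIMODAL_PDF_LIMIT_MB if multimodal else PDF_LIMIT_MB
    elif ext in TEXT_EXTENSIONS:
        limits = TEXT_LIMIT_MB
    else:
        raise ValueError(f"extension not listed in the limits table: {ext!r}")
    return limits[plan] * BYTES_PER_MB


def fits(filename: str, size_bytes: int, plan: str, multimodal: bool = False) -> bool:
    """True if a file of this size is within the plan's limit."""
    return size_bytes <= max_upload_bytes(filename, plan, multimodal)


print(fits("report.pdf", 40_000_000, "builder"))                   # True
print(fits("report.pdf", 40_000_000, "builder", multimodal=True))  # False
```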

Rate limits

Rate limits help protect your applications from misuse and maintain the health of our shared infrastructure. These limits are designed to support typical production workloads while ensuring reliable performance for all users.

Most rate limits can be adjusted upon request. If you need higher limits to scale your application, contact Support with details about your use case.

Requests that exceed a rate limit fail and return a 429 - TOO_MANY_REQUESTS status. To handle rate limits, implement retry logic with exponential backoff.
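One way to implement retry with exponential backoff is sketched below. It is deliberately independent of any client library: `send` is a hypothetical callable that performs one request and returns a `(status_code, body)` pair, and the delay values, jitter, and attempt count are illustrative defaults rather than recommended settings.

```python
import random
import time
from typing import Callable, Tuple


def call_with_backoff(
    send: Callable[[], Tuple[int, object]],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Tuple[int, object]:
    """Call send() until it returns a non-429 status or attempts run out.

    Returns the last (status, body) pair, which is still a 429 if every
    attempt was rate limited.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt == max_attempts - 1:
            break  # no point sleeping after the final attempt
        delay = min(max_delay, base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
        sleep(delay + random.uniform(0, delay / 2))        # jitter spreads out retries
    return status, body


# Demo with a stubbed sender: two 429 responses, then success. Sleeping is disabled.
responses = iter([(429, None), (429, None), (200, "ok")])
print(call_with_backoff(lambda: next(responses), sleep=lambda s: None))  # (200, 'ok')
```

The `sleep` parameter exists so the wait can be replaced in tests; in normal use the default `time.sleep` applies.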
| Metric | Starter plan | Builder plan | Standard plan | Enterprise plan |
| --- | --- | --- | --- | --- |
| Assistant list/get requests per minute | 40 | 50 | 100 | 500 |
| Assistant create/update requests per minute | 20 | 25 | 50 | 100 |
| Assistant delete requests per minute | 20 | 25 | 50 | 100 |
| File get requests per minute | 100 | 150 | 300 | 6,000 |
| File list requests per minute | 50 | 75 | 150 | 3,000 |
| File upload requests per minute | 5 | 15 | 20 | 300 |
| Multimodal PDF upload requests per minute | 5 | 10 | 20 | 40 |
| File delete requests per minute | 5 | 15 | 20 | 300 |
| Chat input tokens per minute | 100,000 | 200,000 | 300,000 | 1,000,000 |
| Chat history tokens per query | 64,000 | 64,000 | 64,000 | 64,000 |
| Evaluation input tokens per minute | Not available | Not available | 150,000 | 500,000 |
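For the token-based limits, an application can track its own spend and hold back requests before the service rejects them. The sketch below keeps a sliding 60-second window of tokens sent. How the service itself measures a minute is not stated here, so this is an approximation under that assumption; the class name `TokenBudget` and its methods are hypothetical.

```python
import time
from collections import deque


class TokenBudget:
    """Client-side sliding-window counter for a tokens-per-minute limit."""

    def __init__(self, tokens_per_minute: int, clock=time.monotonic):
        self.limit = tokens_per_minute
        self.clock = clock     # injectable so tests can control time
        self.events = deque()  # (timestamp, tokens) pairs, oldest first
        self.used = 0          # tokens counted inside the current window

    def _expire(self, now: float) -> None:
        # Drop entries that are 60 seconds old or older.
        while self.events and now - self.events[0][0] >= 60.0:
            _, tokens = self.events.popleft()
            self.used -= tokens

    def try_spend(self, tokens: int) -> bool:
        """Record the spend and return True if it fits, else return False."""
        now = self.clock()
        self._expire(now)
        if self.used + tokens > self.limit:
            return False
        self.events.append((now, tokens))
        self.used += tokens
        return True


budget = TokenBudget(100_000)    # Starter plan chat input tokens per minute
print(budget.try_spend(60_000))  # True
print(budget.try_spend(50_000))  # False, would exceed 100,000 within the window
```

A caller that gets `False` can wait and try again, which complements retrying on 429 responses rather than replacing it.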