Skip to main content
Pricing and limits vary based on subscription plan.

Pricing

Pinecone Assistant usage is billed monthly. Costs can include:
  • Minimum usage (Standard and Enterprise plans)
  • Ingestion (knowledge base uploads)
  • Tokens (chat, context retrieval, and evaluation)
  • Storage (monthly per GB on Standard and Enterprise)

Minimum usage

The Standard and Enterprise pricing plans include a monthly minimum usage committment:
PlanMinimum usage
Starter$0/month
Standard$50/month
Enterprise$500/month
Beyond the monthly minimum, customers are charged for what they use each month. Examples
  • You are on the Standard plan.
  • Your usage for the month of August amounts to $20.
  • Your usage is below the $50 monthly minimum, so your total for the month is $50.
In this case, the August invoice would include line items for each service you used (totaling $20), plus a single line item covering the rest of the minimum usage commitment ($30).
  • You are on the Standard plan.
  • Your usage for the month of August amounts to $100.
  • Your usage exceeds the $50 monthly minimum, so your total for the month is $100.
In this case, the August invoice would only show line items for each service you used (totaling $100). Since your usage exceeds the minimum usage commitment, you are only charged for your actual usage and no additional minimum usage line item appears on your invoice.

Ingestion

When you upload or replace files in an assistant’s knowledge base, usage is measured in ingestion units. One ingestion unit is approximately 400 tokens (~300 words); exact counts can vary by document.
Processing pathRate (per ingestion unit)
Standard file ingestion$0.0005
Multimodal PDF processing uses the same ingestion unit; it is billed at about twice the standard per-unit rate. For current rates, see Pricing.
PlanKnowledge base uploads (ingestion units)
Starter1,000 / month included
StandardPay per unit at the rate above
EnterprisePay per unit at the rate above
Multimodal ingestion applies to content processed through the multimodal PDF path. Standard ingestion applies to other supported file types. Usage and invoices reflect a single ingestion usage line item. With API version 2026-04 or later, a completed file-ingestion operation may include ingestion_units. Use Describe an operation or Track file operations for details.

Tokens

For paid plans, you are charged for the number of tokens used by each assistant. Ingestion is billed separately from chat and context retrieval tokens.

Chat tokens

Chatting with an assistant involves both input and output tokens:
  • Input tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant and sent to a model. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
  • Output tokens are based on the answer from the model.
PlanInput token rateOutput token rate
StarterIncluded (500,000 / month)Included (300,000 / month)
Standard$8/million tokens$15/million tokens
Enterprise$8/million tokens$15/million tokens
Chat input tokens appear as “Assistants Input Tokens” on invoices and prompt_tokens in API responses. Chat output tokens appear as “Assistants Output Tokens” on invoices and completion_tokens in API responses.

Context tokens

When you retrieve context snippets, tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
PlanToken rate
StarterIncluded (500,000 / month)
Standard$5/million tokens
Enterprise$5/million tokens
Context retrieval tokens appear as Assistants Context Tokens Processed on invoices and prompt_tokens in API responses. In API responses, completion_tokens will always be 0 because, unlike for chat, there is no answer from a model.

Evaluation tokens

Evaluating responses involves both input and output tokens:
  • Input tokens are based on two requests to a model: The first request contains a question, answer, and ground truth answer, and the second request contains the same details plus generated facts returned by the model for the first request.
  • Output tokens are based on two responses from a model: The first response contains generated facts, and the second response contains evaluation metrics.
PlanInput token rateOutput token rate
StarterNot availableNot available
Standard$8/million tokens$15/million tokens
Enterprise$8/million tokens$15/million tokens
Evaluation input tokens appear as Assistants Evaluation Tokens Processed on invoices and prompt_tokens in API responses. Evaluation output tokens appear as Assistants Evaluation Tokens Out on invoices and completion_tokens in API responses.

Storage

For paid plans, you are charged for the size of each assistant.
PlanStorage rate
StarterFree (1 GB max per project)
Standard$3/GB per month
Enterprise$3/GB per month

Limits

Pinecone Assistant limits vary based on subscription plan.

Object limits

Object limits are restrictions on the number or size of assistant-related objects. Limits in this table are per project today; some may move to per organization in the future.
MetricStarter planStandard planEnterprise plan
Assistants per project5UnlimitedUnlimited
File storage per project1 GBUnlimitedUnlimited
Chat input tokens per project500,000 / monthUnlimitedUnlimited
Chat output tokens per project300,000 / monthUnlimitedUnlimited
Context retrieval tokens per project500,000 / monthUnlimitedUnlimited
Ingestion units per project1,000 / monthUnlimitedUnlimited
Evaluation input tokens per projectNot available150,000500,000
Files per assistant100UnlimitedUnlimited
File size (.docx, .json, .md, .txt)10 MB10 MB10 MB
File size (.pdf)10 MB100 MB100 MB
Metadata size per file16 KB16 KB16 KB
Additionally, the following limits apply to multimodal PDFs (currently in public preview): Multimodal PDF processing uses the same ingestion unit as standard uploads; it is billed at about twice the standard per-unit rate (see Pricing and limits). Object and rate limits for assistants also apply—see #limits and #rate-limits.
MetricStarter planStandard planEnterprise plan
Max file size10 MB50 MB50 MB
Page limit100100100
Multimodal PDFs per assistant102020

Rate limits

Rate limits help protect your applications from misuse and maintain the health of our shared infrastructure. These limits are designed to support typical production workloads while ensuring reliable performance for all users. Most rate limits can be adjusted upon request. If you need higher limits to scale your application, contact Support with details about your use case. Requests that exceed a rate limit fail and return a 429 - TOO_MANY_REQUESTS status.
To handle rate limits, implement retry logic with exponential backoff.
MetricStarter planStandard planEnterprise plan
Assistant list/get requests per minute40100500
Assistant create/update requests per minute2050100
Assistant delete requests per minute2050100
File get requests per minute1003006,000
File list requests per minute501503,000
File upload requests per minute520300
Multimodal PDF upload requests per minute520300
File delete requests per minute520300
Chat input tokens per minute100,000300,0001,000,000
Chat history tokens per query64,00064,00064,000