The Standard and Enterprise pricing plans include a monthly minimum usage commitment:
| Plan | Minimum usage |
| --- | --- |
| Starter | $0/month |
| Standard | $50/month |
| Enterprise | $500/month |
Beyond the monthly minimum, customers are charged for what they use each month.

Examples
Usage below monthly minimum
You are on the Standard plan.
Your usage for the month of August amounts to $20.
Your usage is below the $50 monthly minimum, so your total for the month is $50.
In this case, the August invoice would include line items for each service you used (totaling $20), plus a single line item covering the rest of the minimum usage commitment ($30).
Usage exceeds monthly minimum
You are on the Standard plan.
Your usage for the month of August amounts to $100.
Your usage exceeds the $50 monthly minimum, so your total for the month is $100.
In this case, the August invoice would only show line items for each service you used (totaling $100). Since your usage exceeds the minimum usage commitment, you are only charged for your actual usage and no additional minimum usage line item appears on your invoice.
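The two examples above reduce to one rule: the invoice total is the larger of actual usage and the plan minimum. The following is a minimal sketch of that arithmetic, not the billing system's implementation. The function name `invoice_total` and the returned dictionary keys are hypothetical; only the minimums come from the plan table.

```python
# Monthly minimums in US dollars, copied from the plan table above.
MONTHLY_MINIMUM_USD = {"Starter": 0, "Standard": 50, "Enterprise": 500}


def invoice_total(plan: str, usage_usd: float) -> dict:
    """Split a month's bill into actual usage and the minimum-commitment top-up.

    Hypothetical helper for illustration only.
    """
    minimum = MONTHLY_MINIMUM_USD[plan]
    # The top-up line item covers the gap between usage and the minimum, if any.
    top_up = max(minimum - usage_usd, 0)
    return {"usage": usage_usd, "minimum_top_up": top_up, "total": usage_usd + top_up}


print(invoice_total("Standard", 20))   # usage below the minimum
print(invoice_total("Standard", 100))  # usage above the minimum
```

With $20 of usage on Standard, the sketch yields a $30 top-up and a $50 total; with $100 of usage, the top-up is $0 and the total is $100, matching the examples above.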
When you upload or replace files in an assistant’s knowledge base, usage is measured in ingestion units. One ingestion unit is approximately 400 tokens (~300 words); exact counts can vary by document.
| Processing path | Rate (per ingestion unit) |
| --- | --- |
| Standard file ingestion | $0.0005 |
Multimodal PDF processing uses the same ingestion unit; it is billed at about twice the standard per-unit rate. For current rates, see Pricing.
| Plan | Knowledge base uploads (ingestion units) |
| --- | --- |
| Starter | 1,000 / month included |
| Standard | Pay per unit at the rate above |
| Enterprise | Pay per unit at the rate above |
Multimodal ingestion applies to content processed through the multimodal PDF path. Standard ingestion applies to other supported file types.

Usage and invoices show a single ingestion usage line item. With API version 2026-04 or later, a completed file-ingestion operation may include ingestion_units. Use Describe an operation or Track file operations for details.
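To get a rough sense of ingestion cost before uploading, the figures above can be combined into a quick estimate. This is a sketch under stated assumptions, not the billing formula: it treats one unit as exactly 400 tokens, rounds partial units up, and takes "about twice" to mean exactly 2x. The function name is hypothetical, and actual billed units can vary by document.

```python
import math

TOKENS_PER_UNIT = 400        # approximate figure from the text above
STANDARD_RATE_USD = 0.0005   # per ingestion unit, from the rate table above
MULTIMODAL_MULTIPLIER = 2    # assumption: "about twice" treated as exactly 2x


def estimate_ingestion_cost(tokens: int, multimodal: bool = False) -> tuple[int, float]:
    """Return (estimated ingestion units, estimated cost in USD)."""
    # Assumption: a partially filled unit is counted as a whole unit.
    units = math.ceil(tokens / TOKENS_PER_UNIT)
    rate = STANDARD_RATE_USD * (MULTIMODAL_MULTIPLIER if multimodal else 1)
    return units, units * rate


units, cost = estimate_ingestion_cost(1_000_000)
print(f"{units} units, about ${cost:.2f}")
```

Under these assumptions, one million tokens is about 2,500 units, or roughly $1.25 on the standard path. Where available, the ingestion_units value on a completed operation is the better source than this estimate.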
For paid plans, you are charged for the number of tokens used by each assistant. Ingestion is billed separately from chat and context retrieval tokens.
Input tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant and sent to a model. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
Output tokens are based on the answer from the model.
| Plan | Input token rate | Output token rate |
| --- | --- | --- |
| Starter | Included (500,000 / month) | Included (300,000 / month) |
| Standard | $8/million tokens | $15/million tokens |
| Enterprise | $8/million tokens | $15/million tokens |
Chat input tokens appear as “Assistants Input Tokens” on invoices and prompt_tokens in API responses. Chat output tokens appear as “Assistants Output Tokens” on invoices and completion_tokens in API responses.
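The token counts in a chat response can be turned into an approximate cost using the paid-plan rates above. The sketch below assumes the counts are available as a dictionary with prompt_tokens and completion_tokens keys; the exact shape of the response object is an assumption, and the function name is hypothetical.

```python
# Paid-plan chat rates in USD per million tokens, from the table above.
INPUT_RATE_PER_MILLION = 8
OUTPUT_RATE_PER_MILLION = 15


def chat_cost_usd(usage: dict) -> float:
    """Approximate the cost of one chat request on the Standard or Enterprise plan.

    `usage` is assumed to hold the prompt_tokens and completion_tokens
    counts reported for the request.
    """
    input_cost = usage["prompt_tokens"] * INPUT_RATE_PER_MILLION
    output_cost = usage["completion_tokens"] * OUTPUT_RATE_PER_MILLION
    return (input_cost + output_cost) / 1_000_000


# Example: 12,000 input tokens (message, history, and retrieved snippets)
# and 800 output tokens.
print(chat_cost_usd({"prompt_tokens": 12_000, "completion_tokens": 800}))  # → 0.108
```

Because input tokens include chat history and retrieved context snippets, the input side usually dominates the cost of a long conversation.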
When you retrieve context snippets, tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
| Plan | Token rate |
| --- | --- |
| Starter | Included (500,000 / month) |
| Standard | $5/million tokens |
| Enterprise | $5/million tokens |
Context retrieval tokens appear as “Assistants Context Tokens Processed” on invoices and prompt_tokens in API responses. In API responses, completion_tokens is always 0 because, unlike chat, context retrieval returns no answer from a model.
Input tokens are based on two requests to a model: The first request contains a question, answer, and ground truth answer, and the second request contains the same details plus generated facts returned by the model for the first request.
Output tokens are based on two responses from a model: The first response contains generated facts, and the second response contains evaluation metrics.
| Plan | Input token rate | Output token rate |
| --- | --- | --- |
| Starter | Not available | Not available |
| Standard | $8/million tokens | $15/million tokens |
| Enterprise | $8/million tokens | $15/million tokens |
Evaluation input tokens appear as “Assistants Evaluation Tokens Processed” on invoices and prompt_tokens in API responses. Evaluation output tokens appear as “Assistants Evaluation Tokens Out” on invoices and completion_tokens in API responses.
Object limits are restrictions on the number or size of assistant-related objects. Limits in this table are per project today; some may move to per organization in the future.
| Metric | Starter plan | Standard plan | Enterprise plan |
| --- | --- | --- | --- |
| Assistants per project | 5 | Unlimited | Unlimited |
| File storage per project | 1 GB | Unlimited | Unlimited |
| Chat input tokens per project | 500,000 / month | Unlimited | Unlimited |
| Chat output tokens per project | 300,000 / month | Unlimited | Unlimited |
| Context retrieval tokens per project | 500,000 / month | Unlimited | Unlimited |
| Ingestion units per project | 1,000 / month | Unlimited | Unlimited |
| Evaluation input tokens per project | Not available | 150,000 | 500,000 |
| Files per assistant | 100 | Unlimited | Unlimited |
| File size (.docx, .json, .md, .txt) | 10 MB | 10 MB | 10 MB |
| File size (.pdf) | 10 MB | 100 MB | 100 MB |
| Metadata size per file | 16 KB | 16 KB | 16 KB |
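A client can check the per-file limits from the table above before uploading, which avoids a failed request. The following is an illustrative sketch only: the function name `check_file` is hypothetical, it treats 1 MB as 2**20 bytes and 1 KB as 2**10 bytes (an assumption, since the table does not say), and the service remains the authority on what it accepts.

```python
import os

MB = 1024 * 1024  # assumption: binary megabytes
KB = 1024         # assumption: binary kilobytes

# Per-file limits copied from the table above.
PDF_LIMIT_MB = {"Starter": 10, "Standard": 100, "Enterprise": 100}
OTHER_LIMIT_MB = 10
OTHER_EXTENSIONS = {".docx", ".json", ".md", ".txt"}
METADATA_LIMIT_KB = 16


def check_file(plan: str, filename: str, size_bytes: int, metadata_bytes: int = 0) -> list[str]:
    """Return a list of limit violations; an empty list means no problem was found."""
    ext = os.path.splitext(filename)[1].lower()
    if ext == ".pdf":
        limit_mb = PDF_LIMIT_MB[plan]
    elif ext in OTHER_EXTENSIONS:
        limit_mb = OTHER_LIMIT_MB
    else:
        return [f"{ext or 'no extension'}: file type not listed in the limits table"]

    problems = []
    if size_bytes > limit_mb * MB:
        problems.append(f"file exceeds the {limit_mb} MB limit for {ext} on {plan}")
    if metadata_bytes > METADATA_LIMIT_KB * KB:
        problems.append(f"metadata exceeds the {METADATA_LIMIT_KB} KB limit")
    return problems


print(check_file("Starter", "report.pdf", 25 * MB))   # over the 10 MB Starter limit
print(check_file("Standard", "report.pdf", 25 * MB))  # within the 100 MB limit
```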
Additionally, the following limits apply to multimodal PDFs (currently in public preview):

Multimodal PDF processing uses the same ingestion unit as standard uploads; it is billed at about twice the standard per-unit rate (see Pricing and limits). Object and rate limits for assistants also apply; see #limits and #rate-limits.
Rate limits help protect your applications from misuse and maintain the health of our shared infrastructure. These limits are designed to support typical production workloads while ensuring reliable performance for all users.

Most rate limits can be adjusted upon request. If you need higher limits to scale your application, contact Support with details about your use case.

Requests that exceed a rate limit fail and return a 429 - TOO_MANY_REQUESTS status.
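Since rate-limited requests fail with a 429 status, clients commonly retry them after a pause. The sketch below shows one widely used approach, exponential backoff with jitter; it is not a documented recommendation from this page. The `send` callable is a hypothetical stand-in for whatever function issues the request and returns a (status, body) pair, and the attempt count and delays are arbitrary defaults.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 status or attempts run out.

    `send` is a hypothetical zero-argument callable returning (status, body).
    `sleep` is injectable so the wait can be replaced in tests.
    """
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Exponential backoff with full jitter: wait a random time
            # of up to 1s, 2s, 4s, ... before the next attempt.
            sleep(random.uniform(0, base_delay * 2 ** attempt))
    # Still rate limited after the final attempt; hand the 429 to the caller.
    return status, body


# Simulated usage: two rate-limited responses, then a success.
responses = iter([(429, None), (429, None), (200, "ok")])
print(call_with_backoff(lambda: next(responses), base_delay=0.01))
```

The jitter spreads retries from many clients over time so they do not all hit the limit again at the same moment. If retries are exhausted regularly, that is a sign to request a higher limit from Support.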