The Standard and Enterprise pricing plans include a monthly minimum usage commitment:
| Plan | Minimum usage |
| --- | --- |
| Starter | $0/month |
| Standard | $50/month |
| Enterprise | $500/month |
Beyond the monthly minimum, customers are charged for what they use each month.
Examples
Usage below monthly minimum
You are on the Standard plan.
Your usage for the month of August amounts to $20.
Your usage is below the $50 monthly minimum, so your total for the month is $50.
In this case, the August invoice would include line items for each service you used (totaling $20), plus a single line item covering the rest of the minimum usage commitment ($30).
Usage exceeds monthly minimum
You are on the Standard plan.
Your usage for the month of August amounts to $100.
Your usage exceeds the $50 monthly minimum, so your total for the month is $100.
In this case, the August invoice would only show line items for each service you used (totaling $100). Since your usage exceeds the minimum usage commitment, you are only charged for your actual usage and no additional minimum usage line item appears on your invoice.
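The two examples above reduce to a simple rule: the monthly total is the larger of actual usage and the plan minimum, and any shortfall shows up as one extra line item. The following is a minimal sketch of that arithmetic, not billing code; the names `MONTHLY_MINIMUM` and `invoice_total` are hypothetical and chosen only for illustration.

```python
from decimal import Decimal

# Monthly minimums from the table above, in US dollars.
MONTHLY_MINIMUM = {
    "Starter": Decimal("0"),
    "Standard": Decimal("50"),
    "Enterprise": Decimal("500"),
}


def invoice_total(plan, usage):
    """Return (total, shortfall) for one month.

    `shortfall` is the single line item that covers the rest of the
    minimum usage commitment; it is 0 when usage meets or exceeds it.
    """
    usage = Decimal(str(usage))
    minimum = MONTHLY_MINIMUM[plan]
    shortfall = max(minimum - usage, Decimal("0"))
    return usage + shortfall, shortfall


# Usage below the minimum: $20 of usage plus a $30 shortfall line item.
print(invoice_total("Standard", 20))
# Usage above the minimum: only the $100 of actual usage is charged.
print(invoice_total("Standard", 100))
```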
Input tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant and sent to a model. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
Output tokens are based on the answer from the model.
| Plan | Input token rate | Output token rate |
| --- | --- | --- |
| Starter | Free (1.5M max per project) | Free (200k max per project) |
| Standard | $8/million tokens | $15/million tokens |
| Enterprise | $8/million tokens | $15/million tokens |
Chat input tokens appear as "Assistants Input Tokens" on invoices and prompt_tokens in API responses. Chat output tokens appear as "Assistants Output Tokens" on invoices and completion_tokens in API responses.
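To estimate what a chat request costs on a paid plan, you can multiply the prompt_tokens and completion_tokens values by the per-million rates listed above. The sketch below assumes the token counts have already been read out of an API response into a plain dictionary; that dictionary shape, and the names `CHAT_RATES_PER_MILLION` and `chat_cost`, are assumptions made for illustration rather than part of any client library.

```python
from decimal import Decimal

# (input rate, output rate) in US dollars per million tokens,
# taken from the chat pricing table above.
CHAT_RATES_PER_MILLION = {
    "Standard": (Decimal("8"), Decimal("15")),
    "Enterprise": (Decimal("8"), Decimal("15")),
}


def chat_cost(plan, prompt_tokens, completion_tokens):
    """Estimate the dollar cost of chat usage from token counts."""
    input_rate, output_rate = CHAT_RATES_PER_MILLION[plan]
    million = Decimal(1_000_000)
    return (prompt_tokens * input_rate + completion_tokens * output_rate) / million


# Hypothetical token counts, shaped like the fields named above.
usage = {"prompt_tokens": 250_000, "completion_tokens": 40_000}
cost = chat_cost("Standard", usage["prompt_tokens"], usage["completion_tokens"])
print(f"Estimated chat cost: ${cost}")
```

The Starter plan is left out of the rate table on purpose, because it is free up to its per-project limits rather than billed per token.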
When you retrieve context snippets, tokens are based on the messages sent to the assistant and the context snippets retrieved from the assistant. Messages sent to the assistant can include messages from the chat history in addition to the newest message.
| Plan | Token rate |
| --- | --- |
| Starter | Free (500k max per project) |
| Standard | $5/million tokens |
| Enterprise | $8/million tokens |
Context retrieval tokens appear as Assistants Context Tokens Processed on invoices and prompt_tokens in API responses. In API responses, completion_tokens will always be 0 because, unlike for chat, there is no answer from a model.
Input tokens are based on two requests to a model: The first request contains a question, answer, and ground truth answer, and the second request contains the same details plus generated facts returned by the model for the first request.
Output tokens are based on two responses from a model: The first response contains generated facts, and the second response contains evaluation metrics.
| Plan | Input token rate | Output token rate |
| --- | --- | --- |
| Starter | Not available | Not available |
| Standard | $8/million tokens | $15/million tokens |
| Enterprise | $8/million tokens | $15/million tokens |
Evaluation input tokens appear as Assistants Evaluation Tokens Processed on invoices and prompt_tokens in API responses. Evaluation output tokens appear as Assistants Evaluation Tokens Out on invoices and completion_tokens in API responses.
Rate limits are restrictions on the frequency of requests within a specified period of time. Requests that exceed a rate limit fail and return a 429 - TOO_MANY_REQUESTS status.
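One common way for a client to cope with a 429 response is to wait and retry with exponential backoff. The sketch below illustrates that general pattern only; it is not an official recommendation or client API. `call_with_backoff` and `send_request` are hypothetical names, and `send_request` stands in for whatever function performs your request and returns a `(status, body)` pair.

```python
import time


def call_with_backoff(send_request, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send_request() until it returns a non-429 status.

    Waits base_delay, then 2x, 4x, ... seconds between attempts and
    gives up after max_attempts, returning the last response received.
    """
    for attempt in range(max_attempts):
        status, body = send_request()
        if status != 429:
            return status, body
        # Do not sleep after the final attempt; just return the 429.
        if attempt < max_attempts - 1:
            sleep(base_delay * 2 ** attempt)
    return status, body


# Demonstration with a fake request that is rate limited twice.
responses = iter([(429, "TOO_MANY_REQUESTS"), (429, "TOO_MANY_REQUESTS"), (200, "ok")])
delays = []
result = call_with_backoff(lambda: next(responses), sleep=delays.append)
print(result, delays)
```

Passing `sleep` as a parameter keeps the example testable without real waiting; in production code you would normally leave it at the default `time.sleep`.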