Understanding cost
This page describes how costs are incurred in Pinecone for the Vector Database and the Inference API.
Pinecone Database pricing is always calculated on a per-index basis, not per project or account. So, if you run more than one index in a project, you’ll be charged for each index based on its size or usage.
Pinecone Inference pricing is based on usage.
For the latest pricing details, please see our pricing page.
Serverless indexes
With serverless indexes, you don’t configure or manage any compute or storage resources. Instead, serverless indexes are built on a breakthrough architecture that scales automatically with usage, and you pay only for the amount of data stored and the operations performed, based on three usage metrics:
- Read units (RUs): Read units measure the resources consumed by read operations such as query, fetch, and list.
- Write units (WUs): Write units measure the resources used by write operations such as upsert, update, and delete.
- Storage: The size of an index, billed at a per-gigabyte (GB) hourly rate.
For each usage metric, pricing varies by cloud provider, region, and plan (see our pricing page).
Read units
Read operations consume read units (RUs). Read units measure the compute, I/O, and network resources used during the read process.
The following operations consume RUs: fetch, query, and list.
The number of RUs used by a specific request is always included in its response. For a demonstration of how to use read units to inspect read costs, see this notebook.
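For example, with the Python SDK, the usage information returned with each read request can be inspected directly (a minimal sketch; the index name, namespace, and query vector are placeholders, and the exact shape of the usage field may vary by SDK version):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # placeholder index name

# Query the index; serverless responses include a usage section
# reporting the read units consumed by the request.
response = index.query(
    namespace="example-namespace",
    vector=[0.1] * 1536,  # placeholder query vector matching the index dimension
    top_k=10,
    include_metadata=True,
)
print(response.usage)  # e.g. {'read_units': 6}
```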
Fetch
A fetch request uses 1 RU for every 10 fetched records.
# of fetched records | RU usage |
---|---|
10 | 1 |
50 | 5 |
107 | 11 |
Specifying a non-existent ID or adding the same ID more than once does not increase the number of RUs used. However, a fetch request will always use at least 1 RU.
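This rule can be expressed as a quick estimator (an illustrative helper, not an official billing formula):

```python
import math

def estimate_fetch_read_units(num_fetched: int) -> int:
    """Estimate fetch cost: 1 RU per 10 fetched records, with a minimum of 1 RU."""
    return max(1, math.ceil(num_fetched / 10))

# Matches the table above: 10 -> 1, 50 -> 5, 107 -> 11.
assert estimate_fetch_read_units(10) == 1
assert estimate_fetch_read_units(50) == 5
assert estimate_fetch_read_units(107) == 11
```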
Query
The number of RUs used by a query depends on the following factors:
- Record count: The number of vectors contained in the target index. Only vectors stored in the relevant namespace are used.
- Record size: Higher dimensionality or larger metadata increases the size of each scanned vector.
Because serverless indexes organize vectors in similarity-based clusters, only a fraction of each index will be read for each query. The number of RUs a query uses therefore increases much more slowly than the index size.
The following table contains the RU cost of a query at different namespace sizes and record dimensionality, assuming an average metadata size around 500 bytes:
Records per namespace | Dimension=384 | Dimension=768 | Dimension=1536 |
---|---|---|---|
100,000 | 5 RUs | 5 RUs | 6 RUs |
1,000,000 | 6 RUs | 10 RUs | 18 RUs |
10,000,000 | 18 RUs | 32 RUs | 59 RUs |
Scanning a namespace has a minimum cost of 5 RUs.
When either include_metadata or include_values is specified, an internal fetch call retrieves the full record values for the IDs returned in the initial scan. This stage consumes RUs equal to a matching fetch call: 1 RU per 10 records in the result set.
TopK value | Additional RUs used |
---|---|
TopK=5 | 1 |
TopK=10 | 1 |
TopK=50 | 5 |
Hybrid retrieval searches over more data and incurs additional RUs. The additional cost depends on the size of the namespace you search and the number of non-zero sparse dimensions. For example, for sparse vectors with 100 non-zero dimensions, the approximate additional RUs consumed for combined sparse and dense retrieval are as follows:
Records per namespace | Additional RUs for sparse vector retrieval |
---|---|
10,000,000 | 17 RUs |
100,000,000 | 54 RUs |
List
List has a fixed cost of 1 RU per call, with an additional 1 RU per paginated call.
Write units
Write operations consume write units (WUs). Write units measure the storage and compute resources used to persist a record, make it available for querying, and update the clustered index to reflect its addition.
The following operations consume WUs: upsert, update, and delete.
Upsert
The number of WUs used by an upsert request is proportional to the total size of records it writes and/or modifies, with a minimum of 1 WU.
The following table contains the WU cost of an upsert request at different batch sizes and record dimensionality, assuming an average metadata size around 500 bytes:
Records per batch | Dimension=384 | Dimension=768 | Dimension=1536 |
---|---|---|---|
1 | 3 WUs | 4 WUs | 7 WUs |
10 | 30 WUs | 40 WUs | 70 WUs |
100 | 300 WUs | 400 WUs | 700 WUs |
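As a rough rule of thumb, the figures above are consistent with about 1 WU per KB written per record. The sketch below encodes that assumption (it is inferred from the table, not an official billing formula; the ~50-byte ID size is also an assumption):

```python
import math

def estimate_record_size_bytes(dimension: int, metadata_bytes: int = 500, id_bytes: int = 50) -> int:
    """Approximate record size: ID + 4 bytes per dimension + metadata."""
    return id_bytes + 4 * dimension + metadata_bytes

def estimate_upsert_write_units(num_records: int, dimension: int, metadata_bytes: int = 500) -> int:
    """Rough upsert WU estimate, assuming ~1 WU per KB written per record (minimum 1 WU)."""
    wus_per_record = max(1, math.ceil(estimate_record_size_bytes(dimension, metadata_bytes) / 1000))
    return max(1, num_records * wus_per_record)

# Roughly matches the table above: 100 records at dimension 768 -> ~400 WUs.
print(estimate_upsert_write_units(100, 768))  # 400
```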
Update
The number of WUs used when updating a record is proportional to the total size of the new or previous version of the record, whichever is larger, with a minimum of 1 WU.
The following table contains the WU cost of an update request at different dimensionalities and metadata sizes, with WUs based on the new or previous metadata size, whichever is larger:
Dimension | Previous metadata size | New metadata size | WUs |
---|---|---|---|
768 | 400 bytes | 500 bytes | 4 WUs |
1536 | 400 bytes | 500 bytes | 7 WUs |
1536 | 4000 bytes | 2000 bytes | 11 WUs |
Delete
The number of WUs used by a delete request is proportional to the total size of records it deletes, with a minimum of 1 WU.
The following table contains the WU cost of a delete request at different batch sizes and record dimensionality, assuming an average metadata size around 500 bytes:
Records per batch | Dimension=384 | Dimension=768 | Dimension=1536 |
---|---|---|---|
1 | 3 WUs | 4 WUs | 7 WUs |
10 | 30 WUs | 40 WUs | 70 WUs |
100 | 300 WUs | 400 WUs | 700 WUs |
Specifying a non-existent ID or adding the same ID more than once does not increase WU use.
Deleting an entire namespace using the deleteAll flag always consumes 1 WU.
Imports and storage
The size of an index is defined as the total size of its vectors across all namespaces. The size of a single vector is defined as the sum of three components:
- ID size
- Embedding size (equal to 4 bytes per dimension)
- Total metadata size (equal to the total size of all metadata fields)
The following table demonstrates a typical index size at different vector counts and dimensionality:
Records per namespace | Dimension=384 | Dimension=768 | Dimension=1536 |
---|---|---|---|
100,000 | 0.20 GB | 0.35 GB | 0.66 GB |
1,000,000 | 2.00 GB | 3.50 GB | 6.60 GB |
10,000,000 | 20.00 GB | 35.00 GB | 66.00 GB |
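Using the per-record size definition above, index storage can be approximated in a few lines (an illustrative sketch; the ~50-byte ID and 500-byte metadata sizes are assumptions, so the results land close to, but not exactly on, the figures in the table):

```python
def estimate_index_size_gb(num_records: int, dimension: int,
                           metadata_bytes: int = 500, id_bytes: int = 50) -> float:
    """Approximate index size: per-record size = ID + 4 bytes per dimension + metadata."""
    record_bytes = id_bytes + 4 * dimension + metadata_bytes
    return num_records * record_bytes / 1e9

# Close to the table above: ~3.6 GB for 1,000,000 records at dimension 768.
print(round(estimate_index_size_gb(1_000_000, 768), 2))
```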
The cost of an import is based on the size of the records read, whether the records were imported successfully or not. If the import operation fails (e.g., after encountering a vector of the wrong dimension in an import with on_error="abort"), you will still be charged for the records read.
However, if the import fails because of an internal system error, you will not incur charges. In this case, the import will return the error message "We were unable to process your request. If the problem persists, please contact us at https://support.pinecone.io".
Monitoring usage
- Index-level monitoring: In the Pinecone console, you can track usage and performance metrics for each index.
- Operation-level monitoring: The response to read operations like query, fetch, and list includes the number of read units consumed.
Pod-based indexes
Cost calculation
For each pod-based index, billing is determined by the per-minute price per pod and the number of pods the index uses, regardless of index activity. The per-minute price varies by pod type, pod size, account plan, and cloud region.
Total cost depends on a combination of factors:
- Pod type. Each pod type has different per-minute pricing.
- Number of pods. This includes replicas, which duplicate pods.
- Pod size. Larger pod sizes have proportionally higher costs per minute.
- Total pod-minutes. This includes the total time each pod is running, starting at pod creation and rounded up to 15-minute increments.
- Cloud provider. The cost per pod-type and pod-minute varies depending on the cloud provider you choose for your project.
- Collection storage. Collections incur costs per GB of data per minute in storage, rounded up to 15-minute increments.
- Plan. The free plan incurs no costs; the Standard or Enterprise plans incur different costs per pod-type, pod-minute, cloud provider, and collection storage.
The following equation calculates the total costs accrued over time:
(number of pods × pod size × number of replicas × minutes the pods run × pod price per minute) + (collection storage in GB × minutes stored × storage price per GB per minute) = total cost
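A minimal sketch of this calculation in Python (the function and all rates below are placeholders for illustration; see the pricing page for actual per-minute prices):

```python
def pod_index_cost(num_pods: int, num_replicas: int, pod_size_multiplier: int,
                   minutes: int, pod_price_per_minute: float,
                   collection_gb: float = 0.0, collection_minutes: int = 0,
                   storage_price_per_gb_minute: float = 0.0) -> float:
    """Total cost = pod charges + collection storage charges."""
    pod_cost = num_pods * num_replicas * pod_size_multiplier * minutes * pod_price_per_minute
    storage_cost = collection_gb * collection_minutes * storage_price_per_gb_minute
    return pod_cost + storage_cost

# Hypothetical example: 2 p1.x1 pods, 1 replica, running for 10,000 minutes
# at an assumed $0.002 per pod-minute, with no collections.
print(pod_index_cost(2, 1, 1, 10_000, 0.002))  # 40.0
```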
To see a calculation of your current usage and costs, go to Settings > Usage in the Pinecone console.
Pricing
For pod-based index pricing rates, see our pricing page.
Example
While our pricing page lists rates on an hourly basis for ease of comparison, this example lists prices per minute, as this is how Pinecone calculates billing.
An example application has the following requirements:
- 1,000,000 vectors with 1536 dimensions
- 150 queries per second with top_k = 10
- Deployment in an EU region
- Ability to store 1GB of inactive vectors
Based on these requirements, the organization chooses to configure the project to use the Standard billing plan to host one p1.x2 pod with three replicas and a collection containing 1 GB of data. This project runs continuously for the month of January on the Standard plan. The components of the total cost for this example are given in Table 1 below:
Table 1: Example billing components
Billing component | Value |
---|---|
Number of pods | 1 |
Number of replicas | 3 |
Pod size | x2 |
Total pod count | 6 |
Minutes in January | 44,640 |
Pod-minutes (pods * minutes) | 267,840 |
Pod price per minute | $0.0012 |
Collection storage | 1 GB |
Collection storage minutes | 44,640 |
Price per storage minute | $0.00000056 |
The invoice for this example is given in Table 2 below:
Table 2: Example invoice
Product | Quantity | Price per unit | Charge |
---|---|---|---|
Collections | 44,640 | $0.00000056 | $0.025 |
P2 Pods (AWS) | 0 | | $0.00 |
P2 Pods (GCP) | 0 | | $0.00 |
S1 Pods | 0 | | $0.00 |
P1 Pods | 267,840 | $0.0012 | $514.29 |
Amount due $514.54
Inference API
This feature is in public preview.
Embed
Embedding costs are determined by how many tokens are in the data. In general, the more words are contained in your passage or query, the more tokens you generate. To learn more about tokenization, see Choosing an embedding model.
For example, if you generate embeddings for the query “What is the maximum diameter of a red pine?”, the Inference API generates 10 tokens and converts them into an embedding. If the price for your billing plan is $0.08 per million tokens, this API call costs $0.0000008.
Model | Price per 1M Tokens |
---|---|
multilingual-e5-large | $0.08 |
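For example, using the Python SDK's inference interface, the token count reported in the response can be multiplied by the per-token rate to estimate the cost of a call (a minimal sketch; the exact shape of the usage field may vary by SDK version):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Generate an embedding and read back the token usage reported by the API.
result = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=["What is the maximum diameter of a red pine?"],
    parameters={"input_type": "query"},
)

# Estimate cost at $0.08 per million tokens.
total_tokens = result.usage["total_tokens"]
print(total_tokens * 0.08 / 1_000_000)
```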
Rerank
Reranking costs are determined by the number of requests to the model. For example, if you make 1,000 rerank requests, the cost is $2.00.
Model | Price per 1k Requests |
---|---|
bge-reranker-v2-m3 | $2.00 |
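A minimal sketch of a rerank call using the Python SDK's inference interface (the query and documents are placeholders, and the exact method signature may vary by SDK version); each call counts as one billable request at the rate above:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# A single rerank call counts as one billable request.
result = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query="What is the maximum diameter of a red pine?",
    documents=[
        "Red pines can reach trunk diameters of roughly one meter.",
        "Ponderosa pines are common in dry western forests.",
    ],
    top_n=1,
)
print(result)
```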
Pinecone Assistant
This feature is in public preview.
The cost of using the Assistant API is determined by three factors:
- The number of input tokens processed by the model
- The number of tokens output by the model
- The total size of files stored in the assistant per month
Each assistant also incurs a minimum daily cost.
Cost controls
Pinecone offers tools to help you understand and control your costs.
- Monitoring usage. You can use the usage dashboard in the Pinecone console to monitor your Pinecone usage and costs as they accrue.
- Pod limits. For pod-based indexes, project owners can set limits for the total number of pods across all indexes in the project. The default pod limit is 5.