This page describes how costs are incurred in Pinecone for the Vector Database and the Inference API.

Pinecone Database pricing is always calculated on a per-index basis, not per project or account. So, if you run more than one index in a project, you’ll be charged for each index based on its size or usage.

Pinecone Inference pricing is based on usage.

For the latest pricing details, please see our pricing page.

Serverless indexes

With serverless indexes, you don’t configure or manage any compute or storage resources. Serverless indexes scale automatically with usage, and you pay only for the data you store and the operations you perform, measured by three usage metrics:

  • Read units (RUs): Read units measure the resources consumed by read operations such as query, fetch, and list.

  • Write units (WUs): Write units measure the resources used by write operations such as upsert, update, and delete.

  • Storage: The size of an index, billed at a per-gigabyte (GB) hourly rate.

For each usage metric, pricing varies by cloud provider, region, and plan (see our pricing page).

Read units

Read operations consume read units (RUs). Read units measure the compute, I/O, and network resources used during the read process.

The fetch, query, and list operations consume RUs, as described in the sections below. The number of RUs used by a specific request is always included in its response. For a demonstration of how to use read units to inspect read costs, see this notebook.

Fetch

A fetch request uses 1 RU for every 10 fetched records.

| # of fetched records | RU usage |
| --- | --- |
| 10 | 1 |
| 50 | 5 |
| 107 | 11 |

Specifying a non-existent ID or including the same ID more than once does not increase the number of RUs used. However, a fetch request always uses at least 1 RU.
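The fetch rule above can be sketched as a client-side estimate. This is a hypothetical helper for illustration, not part of the Pinecone SDK:

```python
import math

def fetch_ru_estimate(ids: list[str]) -> int:
    """Estimate the RUs a fetch request consumes: 1 RU per 10 fetched
    records, with a minimum of 1 RU. Duplicate IDs are deduplicated,
    since repeating an ID does not increase RU usage. (Non-existent IDs
    cannot be detected client-side, so this is an upper bound.)"""
    unique_ids = len(set(ids))
    return max(1, math.ceil(unique_ids / 10))

print(fetch_ru_estimate([str(i) for i in range(107)]))  # 11, as in the table
```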

Query

The number of RUs used by a query is proportional to the following factors:

  • Record count: The number of records in the target index. Only records stored in the queried namespace are scanned.
  • Record size: Higher dimensionality or larger metadata increases the size of each scanned vector.

Because serverless indexes organize vectors in similarity-based clusters, only a fraction of each index will be read for each query. The number of RUs a query uses therefore increases much more slowly than the index size.

The following table contains the RU cost of a query at different namespace sizes and record dimensionality, assuming an average metadata size around 500 bytes:

| Records per namespace | Dimension=384 | Dimension=768 | Dimension=1536 |
| --- | --- | --- | --- |
| 100,000 | 5 RUs | 5 RUs | 6 RUs |
| 1,000,000 | 6 RUs | 10 RUs | 18 RUs |
| 10,000,000 | 18 RUs | 32 RUs | 59 RUs |

Scanning a namespace has a minimum cost of 5 RUs.

When include_metadata or include_values is specified, an internal fetch call retrieves the full record values for the IDs returned by the initial scan. This stage consumes RUs equal to those of a matching fetch call: 1 RU per 10 records in the result set.

| TopK value | Additional RUs used |
| --- | --- |
| TopK=5 | 1 |
| TopK=10 | 1 |
| TopK=50 | 5 |
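The fetch-stage cost shown above amounts to a one-line estimate (a hypothetical helper for illustration):

```python
import math

def fetch_stage_ru_estimate(top_k: int) -> int:
    """Extra RUs consumed when include_metadata or include_values is set:
    1 RU per 10 records in the result set, with a minimum of 1 RU."""
    return max(1, math.ceil(top_k / 10))
```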

Hybrid retrieval searches over more data and incurs additional RUs. The additional cost depends on the size of the namespace you search and the number of non-zero sparse dimensions. For example, for sparse vectors with 100 non-zero sparse dimensions, the approximate RUs consumed for sparse and dense retrieval combined are as follows:

| Records per namespace | Additional RUs for sparse vector retrieval |
| --- | --- |
| 10,000,000 | 17 RUs |
| 100,000,000 | 54 RUs |

List

List has a fixed cost of 1 RU per call, with an additional 1 RU per paginated call.

Write units

Write operations consume write units (WUs). Write units measure the storage and compute resources used to persist a record, make it available for querying, and update the clustered index to reflect its addition.

The following operations consume WUs:

Upsert

The number of WUs used by an upsert request is proportional to the total size of records it writes and/or modifies, with a minimum of 1 WU.

The following table contains the WU cost of an upsert request at different batch sizes and record dimensionality, assuming an average metadata size around 500 bytes:

| Records per batch | Dimension=384 | Dimension=768 | Dimension=1536 |
| --- | --- | --- | --- |
| 1 | 3 WUs | 4 WUs | 7 WUs |
| 10 | 30 WUs | 40 WUs | 70 WUs |
| 100 | 300 WUs | 400 WUs | 700 WUs |
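The figures above are consistent with a rate of roughly 1 WU per KB of record size, where a record's size is its ID plus the embedding (4 bytes per dimension) plus metadata. The helper below is an illustrative estimate based on that inference, not an official formula; the id_bytes and metadata_bytes defaults are assumptions:

```python
import math

def upsert_wu_estimate(batch_size: int, dimension: int,
                       metadata_bytes: int = 500, id_bytes: int = 8) -> int:
    """Rough WU estimate for an upsert: each record's size in bytes is
    its ID plus embedding (4 bytes per dimension) plus metadata, and
    each record appears to cost about 1 WU per KB, rounded up."""
    record_bytes = id_bytes + 4 * dimension + metadata_bytes
    return max(1, batch_size * math.ceil(record_bytes / 1000))
```

With the defaults, this reproduces the table: for example, a batch of 10 records at dimension 768 gives 40 WUs.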

Update

The number of WUs used when updating a record is proportional to the total size of the new or previous version of the record, whichever is larger, with a minimum of 1 WU.

The following table contains the WU cost of an update request at different dimensionalities and metadata sizes, with WUs based on the new or previous metadata size, whichever is larger:

| Dimension | Previous metadata size | New metadata size | WUs |
| --- | --- | --- | --- |
| 768 | 400 bytes | 500 bytes | 4 WUs |
| 1536 | 400 bytes | 500 bytes | 7 WUs |
| 1536 | 4000 bytes | 2000 bytes | 11 WUs |
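Following the same rough per-KB inference as for upserts (illustrative only, with an assumed ID size), an update's cost can be estimated from the larger of the two record versions:

```python
import math

def update_wu_estimate(dimension: int, prev_metadata_bytes: int,
                       new_metadata_bytes: int, id_bytes: int = 8) -> int:
    """Rough WU estimate for an update: billed on the larger of the new
    and previous record sizes, at about 1 WU per KB, minimum 1 WU."""
    metadata_bytes = max(prev_metadata_bytes, new_metadata_bytes)
    record_bytes = id_bytes + 4 * dimension + metadata_bytes
    return max(1, math.ceil(record_bytes / 1000))
```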

Delete

The number of WUs used by a delete request is proportional to the total size of records it deletes, with a minimum of 1 WU.

The following table contains the WU cost of a delete request at different batch sizes and record dimensionality, assuming an average metadata size around 500 bytes:

| Records per batch | Dimension=384 | Dimension=768 | Dimension=1536 |
| --- | --- | --- | --- |
| 1 | 3 WUs | 4 WUs | 7 WUs |
| 10 | 30 WUs | 40 WUs | 70 WUs |
| 100 | 300 WUs | 400 WUs | 700 WUs |

Specifying a non-existent ID or including the same ID more than once does not increase WU usage.

Deleting an entire namespace using the deleteAll flag always consumes 1 WU.

Imports and storage

The size of an index is defined as the total size of its vectors across all namespaces. The size of a single vector is defined as the sum of three components:

  • ID size
  • Embedding size (equal to 4 times the vector’s dimensions)
  • Total metadata size (equal to the total size of all metadata fields)

The following table demonstrates a typical index size at different vector counts and dimensionality:

| Records per namespace | Dimension=384 | Dimension=768 | Dimension=1536 |
| --- | --- | --- | --- |
| 100,000 | 0.20 GB | 0.35 GB | 0.66 GB |
| 1,000,000 | 2.00 GB | 3.50 GB | 6.60 GB |
| 10,000,000 | 20.00 GB | 35.00 GB | 66.00 GB |
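The size definition above can be sketched as a quick estimator. The avg_id_bytes and metadata_bytes defaults are assumptions (actual IDs and metadata vary), so results are approximate:

```python
def index_size_gb(record_count: int, dimension: int,
                  metadata_bytes: int = 500, avg_id_bytes: int = 8) -> float:
    """Approximate index size in GB: per-record bytes are the ID size
    plus the embedding (4 bytes per dimension) plus total metadata size."""
    record_bytes = avg_id_bytes + 4 * dimension + metadata_bytes
    return record_count * record_bytes / 1e9
```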

The cost of an import is based on the size of the records read, whether the records were imported successfully or not. If the import operation fails (e.g., after encountering a vector of the wrong dimension in an import with on_error="abort"), you will still be charged for the records read.

However, if the import fails because of an internal system error, you will not incur charges. In this case, the import will return the error message "We were unable to process your request. If the problem persists, please contact us at https://support.pinecone.io".

Monitoring usage

  • Index-level monitoring: In the Pinecone console, you can track usage and performance metrics for each index.

  • Operation-level monitoring: The responses to read operations such as query, fetch, and list include the number of read units consumed.

Pod-based indexes

Cost calculation

For each pod-based index, billing is determined by the per-minute price per pod and the number of pods the index uses, regardless of index activity. The per-minute price varies by pod type, pod size, account plan, and cloud region.

Total cost depends on a combination of factors:

  • Pod type. Each pod type has different per-minute pricing.
  • Number of pods. This includes replicas, which duplicate pods.
  • Pod size. Larger pod sizes have proportionally higher costs per minute.
  • Total pod-minutes. This includes the total time each pod is running, starting at pod creation and rounded up to 15-minute increments.
  • Cloud provider. The cost per pod-type and pod-minute varies depending on the cloud provider you choose for your project.
  • Collection storage. Collections incur costs per GB of data per minute in storage, rounded up to 15-minute increments.
  • Plan. The free plan incurs no costs; the Standard or Enterprise plans incur different costs per pod-type, pod-minute, cloud provider, and collection storage.

The following equation calculates the total costs accrued over time:

(Number of pods) * (pod size) * (number of replicas) * (minutes pod exists) * (pod price per minute) 
+ (collection storage in GB) * (collection storage time in minutes) * (collection storage price per GB per minute)
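The equation above translates directly into code (a sketch using the equation's own terms; the rates passed in are placeholders, so see the pricing page for real per-minute prices):

```python
def total_cost(num_pods: int, pod_size: int, num_replicas: int,
               pod_minutes: float, pod_price_per_minute: float,
               collection_gb: float = 0.0, storage_minutes: float = 0.0,
               storage_price_per_gb_minute: float = 0.0) -> float:
    """Total accrued cost: pod charges plus collection storage charges."""
    pod_charge = (num_pods * pod_size * num_replicas
                  * pod_minutes * pod_price_per_minute)
    storage_charge = collection_gb * storage_minutes * storage_price_per_gb_minute
    return pod_charge + storage_charge
```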

To see a calculation of your current usage and costs, go to Settings > Usage in the Pinecone console.

Pricing

For pod-based index pricing rates, see our pricing page.

Example

While our pricing page lists rates on an hourly basis for ease of comparison, this example lists prices per minute, as this is how Pinecone calculates billing.

An example application has the following requirements:

  • 1,000,000 vectors with 1536 dimensions
  • 150 queries per second with top_k = 10
  • Deployment in an EU region
  • Ability to store 1GB of inactive vectors

Based on these requirements, the organization chooses to configure the project to use the Standard billing plan to host one p1.x2 pod with three replicas and a collection containing 1 GB of data. This project runs continuously for the month of January on the Standard plan. The components of the total cost for this example are given in Table 1 below:

Table 1: Example billing components

| Billing component | Value |
| --- | --- |
| Number of pods | 1 |
| Number of replicas | 3 |
| Pod size | x2 |
| Total pod count | 6 |
| Minutes in January | 44,640 |
| Pod-minutes (pods * minutes) | 267,840 |
| Pod price per minute | $0.0012 |
| Collection storage | 1 GB |
| Collection storage minutes | 44,640 |
| Price per storage minute | $0.00000056 |

The invoice for this example is given in Table 2 below:

Table 2: Example invoice

| Product | Quantity | Price per unit | Charge |
| --- | --- | --- | --- |
| Collections | 44,640 | $0.00000056 | $0.025 |
| P2 Pods (AWS) | 0 | | $0.00 |
| P2 Pods (GCP) | 0 | | $0.00 |
| S1 Pods | 0 | | $0.00 |
| P1 Pods | 267,840 | $0.0012 | $514.29 |

Amount due $514.54

Inference API

This feature is in public preview.

Embed

Embedding costs are determined by the number of tokens in the data. In general, the more words your passage or query contains, the more tokens are generated. To learn more about tokenization, see Choosing an embedding model.

For example, if you generate embeddings for the query “What is the maximum diameter of a red pine?”, the Inference API generates 10 tokens and converts them into an embedding. If the price per token for your billing plan is $0.08 per million tokens, this API call costs $0.0000008.

| Model | Price per 1M tokens |
| --- | --- |
| multilingual-e5-large | $0.08 |
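The token-cost arithmetic in the example above looks like this (a trivial illustrative helper; the default rate is the multilingual-e5-large price listed above):

```python
def embed_cost(token_count: int, price_per_million_tokens: float = 0.08) -> float:
    """Embedding cost in dollars at a per-million-token rate."""
    return token_count / 1_000_000 * price_per_million_tokens

print(embed_cost(10))  # a 10-token query at $0.08/1M tokens
```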

Rerank

Reranking costs are determined by the number of requests to the model. For example, if you rerank 1,000 queries, the cost is $2.00.

| Model | Price per 1k requests |
| --- | --- |
| bge-reranker-v2-m3 | $2.00 |
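Similarly, for reranking (an illustrative helper; the default rate is the bge-reranker-v2-m3 price listed above):

```python
def rerank_cost(request_count: int, price_per_1k_requests: float = 2.00) -> float:
    """Reranking cost in dollars at a per-1,000-request rate."""
    return request_count / 1_000 * price_per_1k_requests
```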

Pinecone Assistant

This feature is in public preview.

The cost of using the Assistant API is determined by three factors:

  • The number of input tokens processed by the model
  • The number of tokens output by the model
  • The total size of files stored in the assistant per month

Each assistant also incurs a minimum daily cost.

Cost controls

Pinecone offers tools to help you understand and control your costs.

  • Monitoring usage. You can use the usage dashboard in the Pinecone console to monitor your Pinecone usage and costs as these accrue.

  • Pod limits. For pod-based indexes, project owners can set limits for the total number of pods across all indexes in the project. The default pod limit is 5.

See also