This page describes how costs are incurred in Pinecone. For the latest pricing details, see Pricing.

Starting on July 1, 2025, new Pinecone pricing will make it easier to understand and predict future costs. For details, see Read units and Write units.

Platform fee

The Standard and Enterprise pricing plans include a monthly platform fee and usage credits:

| Plan | Platform fee | Usage credits |
| --- | --- | --- |
| Standard | $25/month | $15/month |
| Enterprise | $500/month | $150/month |

Usage credits do not roll over from month to month. Platform fees do not apply to organizations on the Starter plan or with annual commits.
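
As a rough illustration of how the fee and credits might combine, the sketch below estimates one month's bill. It assumes that usage credits offset usage charges within the same month and that unused credits are lost; the text above does not spell out that mechanism, and the function name `monthly_bill` and the usage amounts are hypothetical.

```python
def monthly_bill(platform_fee: float, usage_credits: float, usage_charges: float) -> float:
    """Estimate one month's bill in dollars.

    Assumption (not stated in the page): usage credits offset usage charges
    in the same month, and unused credits do not carry over.
    """
    billable_usage = max(0.0, usage_charges - usage_credits)
    return platform_fee + billable_usage


# Standard plan figures from the table above; the usage amounts are made up.
print(monthly_bill(25, 15, 10))  # usage stays within the credits
print(monthly_bill(25, 15, 40))  # usage exceeds the credits by $25
```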

Serverless indexes

With serverless indexes, you pay for the amount of data stored and operations performed, based on three usage metrics: read units, write units, and storage.

For the latest serverless pricing rates, see Pricing.

Read units

Read units (RUs) measure the compute, I/O, and network resources consumed by query, fetch, and list requests.

Read requests return the number of RUs used. You can use this information to monitor read costs.

Query

The number of RUs used by a query is proportional to the following factors:

  • Record count: The number of vectors contained in the target index. Only vectors stored in the relevant namespace are used.
  • Record size: Higher dimensionality or larger metadata increases the size of each scanned vector.

Because serverless indexes organize vectors in similarity-based clusters, only a fraction of each index will be read for each query. The number of RUs a query uses therefore increases much more slowly than the index size.

The following table contains the RU cost of a query at different namespace sizes and record dimensionality, assuming an average metadata size around 500 bytes:

| Records per namespace | Dimension=384 | Dimension=768 | Dimension=1536 |
| --- | --- | --- | --- |
| 100,000 | 5 RUs | 5 RUs | 6 RUs |
| 1,000,000 | 6 RUs | 10 RUs | 18 RUs |
| 10,000,000 | 18 RUs | 32 RUs | 59 RUs |

Scanning a namespace has a minimum cost of 5 RUs.

When either include_metadata or include_values is specified, an internal fetch call retrieves the full record values for the IDs returned in the initial scan. This stage consumes the same RUs as a matching fetch call: 1 RU per 10 records in the result set.

| TopK value | Additional RUs used |
| --- | --- |
| TopK=5 | 1 |
| TopK=10 | 1 |
| TopK=50 | 5 |
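
The additional RUs in the table above follow from the 1 RU per 10 records rule, rounded up. A minimal sketch, assuming the result set contains exactly TopK records (the function name is hypothetical):

```python
import math


def query_fetch_stage_read_units(top_k: int) -> int:
    """Extra RUs when include_metadata or include_values is set.

    Applies 1 RU per 10 returned records, rounded up. Assumes the result
    set holds exactly top_k records.
    """
    if top_k <= 0:
        return 0
    return math.ceil(top_k / 10)


for top_k in (5, 10, 50):
    print(top_k, query_fetch_stage_read_units(top_k))
```

These extra RUs are added on top of the scan cost of the query itself.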

Hybrid retrieval searches over more data and incurs additional RUs. The additional cost depends on the size of the namespace you search and the number of non-zero sparse dimensions. Sparse-only searches incur similar costs. For example, for sparse vectors with 100 non-zero sparse dimensions, the approximate RUs consumed by sparse vector retrieval are as follows:

| Records per namespace | RUs for sparse vector retrieval |
| --- | --- |
| 10,000,000 | 17 RUs |
| 100,000,000 | 54 RUs |

Fetch

A fetch request uses 1 RU for every 10 records fetched, rounded up. For example:

| # of fetched records | RU usage |
| --- | --- |
| 10 | 1 |
| 50 | 5 |
| 107 | 11 |

Specifying a non-existent ID or adding the same ID more than once does not increase the number of RUs used. However, a fetch request will always use at least 1 RU.
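
The fetch rules can be sketched as a small estimator. It takes the requested IDs and a set of IDs known to exist in the namespace. Both inputs and the function name are hypothetical and serve only to model duplicates and missing IDs locally.

```python
import math
from typing import Iterable, Set


def fetch_read_units(requested_ids: Iterable[str], existing_ids: Set[str]) -> int:
    """Estimate RUs for one fetch request.

    Duplicate and non-existent IDs are ignored, each group of 10 fetched
    records costs 1 RU (rounded up), and every request costs at least 1 RU.
    """
    found = set(requested_ids) & existing_ids
    return max(1, math.ceil(len(found) / 10))


existing = {f"rec-{i}" for i in range(107)}
print(fetch_read_units(existing, existing))                       # all 107 records
print(fetch_read_units(["rec-1", "rec-1", "missing"], existing))  # one real record
```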

List

List has a fixed cost of 1 RU for every call, with up to 100 records returned per call.
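
To estimate the cost of listing an entire namespace, count the calls needed. This sketch assumes you page through the namespace until every record ID has been returned; the function name and the `page_size` parameter are hypothetical.

```python
import math


def list_read_units(total_records: int, page_size: int = 100) -> int:
    """Estimate RUs to list all record IDs: 1 RU per call, up to 100 records per call."""
    if not 1 <= page_size <= 100:
        raise ValueError("page_size must be between 1 and 100")
    calls = max(1, math.ceil(total_records / page_size))
    return calls  # each call costs a fixed 1 RU


print(list_read_units(250))      # three calls at the maximum page size
print(list_read_units(250, 50))  # smaller pages mean more calls
```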

Write units

Write units (WUs) measure the storage and compute resources used by upsert, update, and delete requests.

Upsert

An upsert request uses 1 WU for each 1 KB of the request, with a minimum of 1 WU per request. When an upsert modifies an existing record, the request uses 1 WU for each 1 KB of the existing record as well.

For example, the following table shows the WUs used by upsert requests at different batch sizes and record sizes, assuming all records are new:

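| Records per batch | Record size | Dimension | Metadata size | WUs |
| --- | --- | --- | --- | --- |
| 1 | 6.24 KB | 1536 | 100 bytes | 6.24 |
| 10 | 19.10 KB | 1024 | 15,000 bytes | 191 |
| 100 | 3.57 KB | 768 | 500 bytes | 357 |
| 1000 | 7.14 KB | 1536 | 1000 bytes | 7140 |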
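
The rows above can be reproduced by multiplying the batch size by the record size in KB. A minimal sketch with a hypothetical function name; the optional `existing_size_kb` argument models the extra charge when an upsert overwrites existing records. Like the table, the sketch keeps fractional WUs instead of rounding them.

```python
def upsert_write_units(records_per_batch: int, record_size_kb: float,
                       existing_size_kb: float = 0.0) -> float:
    """Estimate WUs for one upsert request.

    1 WU per KB of the request, plus 1 WU per KB of any existing records
    that the request overwrites, with a minimum of 1 WU per request.
    """
    total_kb = records_per_batch * record_size_kb + existing_size_kb
    return max(1.0, total_kb)


# (records per batch, record size in KB), taken from the table above.
for batch, size_kb in [(1, 6.24), (10, 19.10), (100, 3.57), (1000, 7.14)]:
    print(batch, f"{upsert_write_units(batch, size_kb):.2f}")
```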

Update

An update request uses 1 WU for each 1 KB of the new and existing record, with a minimum of 1 WU per request.

For example, the following table shows the WUs used by an update at different record sizes:

| New record size | Previous record size | WUs |
| --- | --- | --- |
| 6.24 KB | 6.50 KB | 12.74 |
| 19.10 KB | 15 KB | 34.1 |
| 3.57 KB | 5 KB | 8.57 |
| 7.14 KB | 10 KB | 17.14 |

Delete

A delete request uses 1 WU for each 1 KB of the records deleted, with a minimum of 1 WU per request.

For example, the following table shows the WUs used by delete requests at different batch sizes and record sizes:

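| Records per batch | Record size | Dimension | Metadata size | WUs |
| --- | --- | --- | --- | --- |
| 1 | 6.24 KB | 1536 | 100 bytes | 6.24 |
| 10 | 19.10 KB | 1024 | 15,000 bytes | 191 |
| 100 | 3.57 KB | 768 | 500 bytes | 357 |
| 1000 | 7.14 KB | 1536 | 1000 bytes | 7140 |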

Specifying a non-existent ID or adding the same ID more than once does not increase WU use.

Deleting all records in a namespace uses 1 WU.
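
The delete rules, including the handling of duplicate and non-existent IDs, can be sketched as follows. The mapping of record IDs to sizes is a hypothetical local stand-in for what is stored in the namespace, and the function and constant names are illustrative only.

```python
from typing import Dict, Iterable

# Deleting every record in a namespace costs a flat 1 WU.
DELETE_ALL_IN_NAMESPACE_WUS = 1


def delete_write_units(ids: Iterable[str], record_sizes_kb: Dict[str, float]) -> float:
    """Estimate WUs for one delete-by-ID request.

    Each distinct existing record is charged once at 1 WU per KB; duplicate
    and non-existent IDs add nothing; the request costs at least 1 WU.
    """
    unique_existing = {record_id for record_id in set(ids) if record_id in record_sizes_kb}
    total_kb = sum(record_sizes_kb[record_id] for record_id in unique_existing)
    return max(1.0, total_kb)


sizes = {"a": 3.57, "b": 3.57}
print(f"{delete_write_units(['a', 'a', 'missing', 'b'], sizes):.2f}")
```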

Storage

Storage costs are based on the size of an index and are billed at a monthly rate per gigabyte (GB). For the latest storage pricing rates, see Pricing.

The size of an index is defined as the total size of its records across all namespaces. The size of a single record is defined as the sum of three components:

  • ID size
  • Embedding size (equal to 4 times the vector’s dimensions)
  • Total metadata size (equal to the total size of all metadata fields)

The following table demonstrates a typical index size at different record counts and dimensionality:

| Records per namespace | Dimension=384 | Dimension=768 | Dimension=1536 |
| --- | --- | --- | --- |
| 100,000 | 0.20 GB | 0.35 GB | 0.66 GB |
| 1,000,000 | 2.00 GB | 3.50 GB | 6.60 GB |
| 10,000,000 | 20.00 GB | 35.00 GB | 66.00 GB |
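
The record size definition translates directly into a size estimate. The sketch below assumes 1 GB = 1,000,000,000 bytes and uses a placeholder price per GB; both are assumptions, as are the function names. Because the table above shows typical sizes, an estimate built from your own ID and metadata sizes may differ slightly.

```python
BYTES_PER_DIMENSION = 4
BYTES_PER_GB = 1_000_000_000  # assumption: decimal gigabytes


def record_size_bytes(dimension: int, metadata_bytes: int, id_bytes: int = 0) -> int:
    """Record size = ID size + 4 bytes per dimension + total metadata size."""
    return id_bytes + BYTES_PER_DIMENSION * dimension + metadata_bytes


def index_size_gb(record_count: int, dimension: int, metadata_bytes: int,
                  id_bytes: int = 0) -> float:
    """Total size of all records, assuming every record has the same size."""
    return record_count * record_size_bytes(dimension, metadata_bytes, id_bytes) / BYTES_PER_GB


def monthly_storage_cost(size_gb: float, price_per_gb_month: float) -> float:
    """Storage cost for one month at a given (placeholder) rate per GB."""
    return size_gb * price_per_gb_month


size = index_size_gb(1_000_000, dimension=768, metadata_bytes=500, id_bytes=8)
print(f"{size:.2f} GB")
print(f"${monthly_storage_cost(size, price_per_gb_month=0.5):.2f}")  # hypothetical rate
```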

Pod-based indexes

For each pod-based index, billing is determined by the per-minute price per pod and the number of pods the index uses, regardless of index activity. The per-minute price varies by pod type, pod size, account plan, and cloud region. For the latest pod-based index pricing rates, see Pricing.

Total cost depends on a combination of factors:

  • Pod type. Each pod type has different per-minute pricing.
  • Number of pods. This includes replicas, which duplicate pods.
  • Pod size. Larger pod sizes have proportionally higher costs per minute.
  • Total pod-minutes. This includes the total time each pod is running, starting at pod creation and rounded up to 15-minute increments.
  • Cloud provider. The cost per pod-type and pod-minute varies depending on the cloud provider you choose for your project.
  • Collection storage. Collections incur costs per GB of data per minute in storage, rounded up to 15-minute increments.
  • Plan. The free plan incurs no costs; the Standard or Enterprise plans incur different costs per pod-type, pod-minute, cloud provider, and collection storage.

The following equation calculates the total costs accrued over time:

(Number of pods) * (pod size) * (number of replicas) * (minutes pod exists) * (pod price per minute) 
+ (collection storage in GB) * (collection storage time in minutes) * (collection storage price per GB per minute)
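
The equation above can be written as a short function. This sketch follows the equation term by term, treats pod size as a numeric multiplier, and rounds both time values up to 15-minute increments as described in the list of factors. All names and the rate used in the example are hypothetical.

```python
import math


def round_up_to_increment(minutes: float, increment: int = 15) -> int:
    """Round a duration up to the next billing increment (15 minutes by default)."""
    return math.ceil(minutes / increment) * increment


def pod_index_cost(pods: int, pod_size: int, replicas: int, minutes: float,
                   pod_price_per_minute: float, collection_gb: float = 0.0,
                   collection_minutes: float = 0.0,
                   collection_price_per_gb_minute: float = 0.0) -> float:
    """Total cost = pod term + collection storage term, per the equation above."""
    pod_cost = (pods * pod_size * replicas
                * round_up_to_increment(minutes) * pod_price_per_minute)
    collection_cost = (collection_gb * round_up_to_increment(collection_minutes)
                       * collection_price_per_gb_minute)
    return pod_cost + collection_cost


# One pod of size 2 with one replica, running for 61 minutes (billed as 75).
print(f"{pod_index_cost(1, 2, 1, 61, pod_price_per_minute=0.001):.3f}")
```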

To see a calculation of your current usage and costs, go to Settings > Usage in the Pinecone console.

Imports

Importing from object storage is the most efficient and cost-effective method to load large numbers of records into an index. The cost of an import is based on the size of the records read, regardless of whether the records were imported successfully.

If the import operation fails (e.g., after encountering a vector of the wrong dimension in an import with on_error="abort"), you will still be charged for the records read. However, if the import fails because of an internal system error, you will not incur charges. In this case, the import will return the error message "We were unable to process your request. If the problem persists, please contact us at https://support.pinecone.io".

For the latest import pricing rates, see Pricing.

Backups and restores

A backup is a static copy of a serverless index. The cost of storing a backup and the cost of restoring an index from a backup are both based on the size of the index. For the latest backup and restore pricing rates, see Pricing.

Embedding

Pinecone hosts several embedding models so it’s easy to manage your vector storage and search process on a single platform. You can use a hosted model to embed your data as an integrated part of upserting and querying, or you can use a hosted model to embed your data as a standalone operation.

Embedding costs are determined by how many tokens are in a request. In general, the more words contained in your passage or query, the more tokens you generate.

For example, if you generate embeddings for the query “What is the maximum diameter of a red pine?”, Pinecone Inference generates 10 tokens and then converts them into an embedding. If the price for your billing plan is $0.08 per million tokens, this API call costs $0.0000008 (10 tokens × $0.08 / 1,000,000).

To learn more about tokenization, see Choosing an embedding model. For the latest embed pricing rates, see Pricing.

Embedding requests return the total tokens generated. You can use this information to monitor and manage embedding costs.
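
Given the token totals returned by embedding requests, the cost is a simple proportion of the per-million-token rate. A minimal sketch with a hypothetical function name, using the $0.08 per million tokens figure only as an example rate:

```python
def embedding_cost(total_tokens: int, price_per_million_tokens: float) -> float:
    """Cost in dollars for a given number of tokens at a per-million-token rate."""
    return total_tokens * price_per_million_tokens / 1_000_000


# Token totals summed across requests; the rate is an example, not a quote.
print(f"${embedding_cost(1_000_000, 0.08):.2f}")
print(f"${embedding_cost(250_000, 0.08):.2f}")
```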

Reranking

Pinecone hosts several reranking models so it’s easy to manage two-stage vector retrieval on a single platform. You can use a hosted model to rerank results as an integrated part of a query, or you can use a hosted model to rerank results as a standalone operation.

Reranking costs are determined by the number of requests to the reranking model. For the latest rerank pricing rates, see Pricing.

Assistant

For details on how costs are incurred in Pinecone Assistant, see Assistant pricing.
