This feature is in early access. To apply and get started, contact Pinecone.
Provisioned read capacity is a new feature that lets you reserve dedicated storage and compute resources for an index, ensuring predictable performance and cost efficiency for queries. It is ideal for workloads with millions to billions of records and moderate to high query rates (1000+ queries per second).

How it works

When you create an index with provisioned read capacity, Pinecone allocates dedicated storage and compute resources based on your choice of tier, number of shards, and number of replicas.
  • Dedicated storage ensures that index data is always cached in memory and on disk for warm, low-latency queries. In contrast, caching is best-effort for on-demand indexes; new and infrequently accessed data may need to be fetched from object storage, resulting in cold, higher-latency queries.
  • Dedicated compute ensures that an index always has the capacity to handle high query rates. In contrast, on-demand indexes share compute resources and are subject to rate limits and throttling.
Provisioned capacity affects only read performance. Write performance is the same as for on-demand indexes.

Tier

The tier determines the overall performance characteristics of an index. There are two tiers: r10 and r40. Both tiers are suitable for large-scale and demanding workloads, but r40 provides increased processing power and memory.

Shards

Shards determine the storage capacity of an index. Each shard provides 200 GB of storage, so it’s straightforward to calculate the number of shards you need to support your index size, plus some room for growth. For example:
Index size    Shards    Capacity
100 GB        1         200 GB
500 GB        3         600 GB
1 TB          6         1.2 TB
1.5 TB        9         1.8 TB
As your index size changes, you can increase or decrease the number of shards. In general, once index fullness reaches 80%, consider adding shards, especially if you expect the index to keep growing.
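The shard arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, 1 TB is treated as 1000 GB, and the 20% default headroom is an assumption chosen because it happens to reproduce the example table. It is not a documented Pinecone rule.

```python
SHARD_CAPACITY_GB = 200  # per-shard storage stated above


def shards_needed(index_size_gb: int, headroom_percent: int = 20) -> int:
    """Smallest shard count whose capacity covers the index plus headroom.

    headroom_percent is an assumed planning margin, not a Pinecone setting.
    """
    if index_size_gb < 0 or headroom_percent < 0:
        raise ValueError("size and headroom must be non-negative")
    # Integer arithmetic avoids floating-point rounding at exact boundaries.
    required = index_size_gb * (100 + headroom_percent)
    per_shard = SHARD_CAPACITY_GB * 100
    # Ceiling division, with a floor of one shard.
    return max(1, -(-required // per_shard))


def fullness(index_size_gb: float, shards: int) -> float:
    """Fraction of the provisioned storage capacity currently in use."""
    return index_size_gb / (shards * SHARD_CAPACITY_GB)


for size_gb in (100, 500, 1000, 1500):
    n = shards_needed(size_gb)
    print(f"{size_gb} GB -> {n} shards, {n * SHARD_CAPACITY_GB} GB capacity, "
          f"{fullness(size_gb, n):.0%} full")
```

With the default headroom, the larger examples land slightly above 80% fullness, so an index that is expected to grow may warrant a larger margin, for example `shards_needed(500, headroom_percent=25)`.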
Index size is defined as the total size of the records across all namespaces. The size of a single record is the sum of the following components:
  • ID size
  • Dense vector size (equal to 4 bytes * the number of dense dimensions)
  • Sparse vector size (equal to 9 bytes * the number of non-zero sparse values)
  • Total metadata size (equal to the total size of all metadata fields)
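As a rough sketch, the components above can be summed to estimate record and index size. The helper names are hypothetical, and the sketch assumes that all sizes are in bytes, that the ID size is the UTF-8 byte length of the ID string, and that the caller already knows the metadata size in bytes. It also uses decimal gigabytes.

```python
def record_size_bytes(record_id: str, dense_dims: int = 0,
                      sparse_nonzero: int = 0, metadata_bytes: int = 0) -> int:
    """Estimate the size of one record from the components listed above."""
    id_bytes = len(record_id.encode("utf-8"))  # assumption: UTF-8 byte length
    dense_bytes = 4 * dense_dims               # 4 * the dense dimensions
    sparse_bytes = 9 * sparse_nonzero          # 9 * each non-zero sparse value
    return id_bytes + dense_bytes + sparse_bytes + metadata_bytes


def index_size_gb(record_count: int, avg_record_bytes: int) -> float:
    """Estimate total index size in GB (assumption: 1 GB = 1e9 bytes)."""
    return record_count * avg_record_bytes / 1e9


# Example: a 10-character ID, a 1536-dimension dense vector, 500 bytes of metadata.
per_record = record_size_bytes("doc-000001", dense_dims=1536, metadata_bytes=500)
print(per_record, "bytes per record")
print(index_size_gb(100_000_000, per_record), "GB for 100 million such records")
```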

Replicas

Replicas determine the query throughput of an index and provide high availability.
  • Query throughput: Each replica duplicates the compute resources available to the index, allowing increased parallel processing and higher queries per second. In general, throughput scales linearly with the number of replicas, but performance will vary based on the shape of the workload and the complexity of metadata filters. To determine the right number of replicas, test your query patterns or contact support@pinecone.io.
  • High availability: Replicas ensure your index remains available even if an availability zone experiences an outage. When you add a replica, Pinecone places it in a different zone within the same region, up to a maximum of three zones. If you add more than three replicas, additional replicas are placed in zones that already have a replica. This multizone approach allows your index to continue serving queries even if one zone becomes unavailable. To achieve high availability, provision at least n+1 replicas, where n is the minimum number of replicas required to meet your throughput needs. This ensures that, even if a zone (and its replica) fails, your index will still have enough capacity to handle your workload without interruption.
As your query throughput and availability requirements change, you can increase or decrease the number of replicas. Scaling replicas does not require downtime, but it can take up to 30 minutes.
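The replica guidance can be sketched as follows. The function names are hypothetical. `qps_per_replica` is a figure you would measure yourself for your own query patterns. The linear scaling follows the "in general" statement above and is not a guarantee. The round-robin zone assignment is also an assumption: the text says only that the first three replicas go to different zones and later ones to zones that already have a replica.

```python
import math
from collections import Counter

MAX_ZONES = 3  # replicas are spread over at most three zones, as stated above


def replicas_for_throughput(target_qps: float, qps_per_replica: float) -> int:
    """Minimum replicas (n) for the target load, assuming linear scaling."""
    if target_qps <= 0 or qps_per_replica <= 0:
        raise ValueError("rates must be positive")
    return math.ceil(target_qps / qps_per_replica)


def replicas_for_high_availability(target_qps: float, qps_per_replica: float) -> int:
    """The n+1 rule described above: one replica beyond the throughput minimum."""
    return replicas_for_throughput(target_qps, qps_per_replica) + 1


def zone_placement(replicas: int) -> Counter:
    """Replica count per zone index, assuming round-robin assignment."""
    return Counter(i % MAX_ZONES for i in range(replicas))


n = replicas_for_throughput(target_qps=2500, qps_per_replica=1000)
print("throughput minimum:", n)
print("with high availability:", replicas_for_high_availability(2500, 1000))
print("zone placement:", dict(zone_placement(n + 1)))
```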

Get started

During the early access period, contact support@pinecone.io to do the following:
  • Create an index with provisioned read capacity
  • Change the tier (r10 or r40)
  • Change the number of shards (storage capacity)
    • If reducing shards would reduce your total storage below your current index size, the change is rejected.
    • Writes that would cause your index to exceed its storage capacity are rejected.
  • Change the number of replicas (query throughput)
    • Reducing replicas to 0 suspends the index and blocks all writes and reads.
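The rules on shard and replica changes can be modeled locally to check a planned change before you request it. This is a hypothetical sketch, not a Pinecone SDK type or API; the class and method names are made up for illustration. It uses the 200 GB per-shard capacity from the Shards section and the one-shard minimum from the Limits section.

```python
from dataclasses import dataclass

SHARD_CAPACITY_GB = 200  # per-shard storage from the Shards section


@dataclass
class ProvisionedIndex:
    """Hypothetical local model of an index with provisioned read capacity."""
    shards: int
    replicas: int
    size_gb: float

    @property
    def capacity_gb(self) -> float:
        return self.shards * SHARD_CAPACITY_GB

    @property
    def suspended(self) -> bool:
        # Reducing replicas to 0 suspends the index.
        return self.replicas == 0

    def can_set_shards(self, new_shards: int) -> bool:
        # Rejected if the new capacity would fall below the current index size.
        return new_shards >= 1 and new_shards * SHARD_CAPACITY_GB >= self.size_gb

    def can_write(self, write_gb: float) -> bool:
        # A suspended index blocks writes; so does exceeding storage capacity.
        if self.suspended:
            return False
        return self.size_gb + write_gb <= self.capacity_gb


index = ProvisionedIndex(shards=3, replicas=2, size_gb=500)
print(index.can_set_shards(2))  # 400 GB of capacity cannot hold 500 GB
print(index.can_write(100))     # 600 GB fits exactly within 600 GB
```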

Limits

Metric                          Limit
Min shards per index            1
Max tier changes                1 per 24 hours
Max shard or replica changes    1 per hour
Read operations (query, list, fetch) have no rate limits on provisioned indexes. Write operations (upsert, update, delete) have the same rate limits as on-demand indexes.

Costs

For each index with provisioned read capacity, billing is based on the following:
  • Provisioned read capacity costs
  • Storage costs
  • Write costs
Storage costs and write costs work the same as for on-demand indexes. Provisioned read capacity costs are calculated as follows:
(Index tier monthly rate) * (number of shards) * (number of replicas)
Index tier rates vary based on pricing plan and cloud region. For exact rates, contact Pinecone.
Examples of provisioned read capacity costs
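Actual tier rates are not listed here, so the sketch below applies the formula above to placeholder rates. The rate values and the `PLACEHOLDER_MONTHLY_RATES` name are invented for illustration only and are not Pinecone prices. The result also excludes storage and write costs, which are billed separately.

```python
from decimal import Decimal

# Placeholder monthly rates per shard-replica: NOT real Pinecone prices.
PLACEHOLDER_MONTHLY_RATES = {
    "r10": Decimal("100"),
    "r40": Decimal("400"),
}


def provisioned_read_cost(tier_monthly_rate: Decimal, shards: int,
                          replicas: int) -> Decimal:
    """(Index tier monthly rate) * (number of shards) * (number of replicas)."""
    if shards < 1 or replicas < 0:
        raise ValueError("shards must be >= 1 and replicas must be >= 0")
    return tier_monthly_rate * shards * replicas


for tier, rate in PLACEHOLDER_MONTHLY_RATES.items():
    cost = provisioned_read_cost(rate, shards=3, replicas=2)
    print(f"{tier}: 3 shards x 2 replicas -> {cost} per month (placeholder rate)")
```

Because the cost is a product, doubling either the shards or the replicas doubles the provisioned read capacity cost.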