This feature is in early access. To apply and get started, contact Pinecone.

Provisioned read capacity is a new feature that lets you reserve dedicated storage and compute resources for an index, ensuring predictable performance and cost efficiency for queries. It is ideal for the following use cases:

  • Large-scale datasets: Workloads with billions of records and moderate query rates (10-50 QPS), such as historical data indexing or inventory management.

  • High QPS workloads: Applications needing 1,000+ queries per second over millions of records, such as real-time recommenders or search engines.

How it works

When you create an index with provisioned read capacity, Pinecone allocates dedicated resources based on your choice of tier, number of shards, and number of replicas.

Provisioned capacity affects only read performance. Write performance is the same as for on-demand indexes.

Tier

The tier determines the overall performance characteristics of an index.

  • r10 (storage-optimized) - This tier is ideal for large-scale datasets with moderate query rates (10-50 QPS). It ensures that index data is always cached in memory and on disk for warm, low-latency queries.

    In contrast, caching is best-effort for on-demand indexes; new and infrequently-accessed data may need to be fetched from object storage, resulting in cold, higher-latency queries.

  • r40 (performance-optimized) - This tier is ideal for high-QPS workloads. It ensures that an index always has the compute capacity to handle high query rates.

    In contrast, on-demand indexes share compute resources, which can lead to noisy neighbor issues, and are subject to rate limits and throttling.

Shards

Shards determine the storage capacity of an index.

Each shard provides 200 GB of storage, so it’s straightforward to calculate the number of shards you need to support your index size, plus some room for growth. For example:

Index size    Shards    Capacity
100 GB        1         200 GB
500 GB        3         600 GB
1 TB          6         1.2 TB
1.5 TB        9         1.8 TB

As your index size changes, you can increase or decrease the number of shards. In general, once index fullness reaches 80%, consider adding shards, especially if you expect the index to continue growing.
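The sizing guidance above can be sketched in a few lines of Python. This is an illustrative calculation, not a Pinecone API: the function names are hypothetical, the default 20% headroom is an assumption chosen because it reproduces the example table, index fullness is assumed to mean index size divided by total shard capacity, and sizes use decimal units (1 TB = 1,000 GB).

```python
SHARD_CAPACITY_GB = 200  # from the text: each shard provides 200 GB of storage


def shards_needed(index_size_gb: int, headroom_pct: int = 20) -> int:
    """Smallest shard count that holds the index plus headroom_pct percent growth.

    The 20% default is an assumption that happens to match the example table.
    Integer arithmetic avoids floating-point rounding at exact shard boundaries.
    """
    if index_size_gb < 0 or headroom_pct < 0:
        raise ValueError("index size and headroom must be non-negative")
    required = index_size_gb * (100 + headroom_pct)  # size with headroom, scaled by 100
    per_shard = SHARD_CAPACITY_GB * 100              # shard capacity, scaled by 100
    shards = -(-required // per_shard)               # integer ceiling division
    return max(1, shards)                            # an index needs at least 1 shard


def fullness(index_size_gb: float, shards: int) -> float:
    """Assumed definition: fraction of total shard capacity in use."""
    return index_size_gb / (shards * SHARD_CAPACITY_GB)


def should_add_shards(index_size_gb: float, shards: int, threshold: float = 0.80) -> bool:
    """True once fullness reaches the 80% guideline from the text."""
    return fullness(index_size_gb, shards) >= threshold


if __name__ == "__main__":
    for size_gb in (100, 500, 1000, 1500):
        print(size_gb, "GB ->", shards_needed(size_gb), "shard(s)")
```

With the default headroom, a newly sized index sits at roughly 83% fullness in the worst case. Passing `headroom_pct=25` instead keeps a newly sized index at or below the 80% guideline.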

Index size is defined as the total size of the records across all namespaces. The size of a single record is defined as the sum of the following components:

  • ID size
  • Dense vector size (equal to 4 bytes * the number of dense dimensions)
  • Sparse vector size (equal to 9 bytes * the number of non-zero sparse values)
  • Total metadata size (equal to the total size of all metadata fields)
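As a rough illustration of the record size formula above, the following sketch adds up the four components. It rests on a few assumptions that are not stated in the text: the multipliers 4 and 9 are treated as byte counts, the ID size is taken to be the UTF-8 byte length of the ID string, and the metadata size is supplied by the caller rather than computed. The function names are hypothetical.

```python
def record_size_bytes(
    record_id: str,
    dense_dims: int = 0,
    sparse_nonzero: int = 0,
    metadata_bytes: int = 0,
) -> int:
    """Estimate the size of one record from its four components."""
    id_bytes = len(record_id.encode("utf-8"))  # assumption: ID size is its UTF-8 length
    dense_bytes = 4 * dense_dims               # 4 per dense dimension
    sparse_bytes = 9 * sparse_nonzero          # 9 per non-zero sparse value
    return id_bytes + dense_bytes + sparse_bytes + metadata_bytes


def estimate_index_size_gb(record_count: int, avg_record_bytes: float) -> float:
    """Rough index size in decimal gigabytes (1 GB = 1e9 bytes, an assumption)."""
    return record_count * avg_record_bytes / 1e9


if __name__ == "__main__":
    size = record_size_bytes("doc-1", dense_dims=1536, metadata_bytes=500)
    print(size, "bytes per record")
    print(estimate_index_size_gb(100_000_000, size), "GB for 100 million such records")
```

An estimate like this is only a starting point for choosing a shard count; the actual index size should be confirmed against what the index reports.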

Replicas

Replicas determine the query throughput of an index and provide high availability.

  • Query throughput: Each replica duplicates the compute resources available to the index, allowing increased parallel processing and higher queries per second.

    In general, throughput scales linearly with the number of replicas, but performance will vary based on the shape of the workload and the complexity of metadata filters. To determine the right number of replicas, test your query patterns or contact support@pinecone.io.

  • High availability: Replicas ensure your index remains available even if an availability zone experiences an outage. When you add a replica, Pinecone places it in a different zone within the same region, up to a maximum of three zones. If you add more than three replicas, additional replicas are placed in zones that already have a replica. This multizone approach allows your index to continue serving queries even if one zone becomes unavailable.

    To achieve high availability, provision at least n+1 replicas, where n is the minimum number of replicas required to meet your throughput needs. This ensures that, even if a zone (and its replica) fails, your index will still have enough capacity to handle your workload without interruption.

As your query throughput and availability requirements change, you can increase or decrease the number of replicas. Scaling replicas does not require downtime, but it can take up to 30 minutes.
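The n+1 guidance above can be expressed as a small calculation. This sketch assumes that throughput scales linearly with replicas, which the text describes only as the general case, and that you have measured the queries per second a single replica sustains for your own workload. The function names and the example figures are hypothetical.

```python
import math

MAX_ZONES = 3  # from the text: replicas are spread across up to three zones


def replicas_for_throughput(target_qps: float, qps_per_replica: float) -> int:
    """Minimum replicas (n) for the target QPS, assuming linear scaling."""
    if target_qps <= 0 or qps_per_replica <= 0:
        raise ValueError("QPS values must be positive")
    return max(1, math.ceil(target_qps / qps_per_replica))


def replicas_for_high_availability(target_qps: float, qps_per_replica: float) -> int:
    """n + 1 replicas, so losing one replica still leaves enough capacity."""
    return replicas_for_throughput(target_qps, qps_per_replica) + 1


def zones_used(replicas: int) -> int:
    """Number of distinct zones that hold at least one replica."""
    return min(replicas, MAX_ZONES)


if __name__ == "__main__":
    # Hypothetical workload: 1,000 QPS target, 300 QPS measured per replica.
    n = replicas_for_throughput(1000, 300)
    print("throughput only:", n)
    print("with high availability:", replicas_for_high_availability(1000, 300))
    print("zones used:", zones_used(n + 1))
```

Because real throughput depends on the shape of the workload and the complexity of metadata filters, treat the result as an initial estimate to validate with load testing.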

Get started

During the early access period, contact support@pinecone.io to do the following:

  • Create an index with provisioned capacity

    • Choose the tier, number of shards, and number of replicas that best meet your needs.
    • Your index must have enough shards to accommodate its current index size, with a minimum of 1 shard.
    • Your index must have at least 1 replica.
  • Change the tier (r10 or r40) or mode (on-demand or provisioned capacity)

    • You can change the tier or mode of your index at most once every 24 hours.
  • Change the number of shards (storage capacity) or replicas (query throughput)

    • You can change the number of shards or replicas at most once per hour.
    • If reducing shards would reduce your total storage below your current index size, the change is rejected.
    • Writes that would cause your index to exceed its storage capacity are rejected.
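Before requesting a change, it can help to check it against the constraints listed above. The following is a minimal local pre-check, not part of any Pinecone client: all names are hypothetical, and "at most once per hour" and "at most once every 24 hours" are interpreted as a minimum interval since the previous change, which is an assumption.

```python
from datetime import datetime, timedelta
from typing import List, Optional

SHARD_CAPACITY_GB = 200                   # from the text: 200 GB per shard
SCALE_COOLDOWN = timedelta(hours=1)       # shards or replicas: at most once per hour
TIER_MODE_COOLDOWN = timedelta(hours=24)  # tier or mode: at most once every 24 hours


def check_scale_change(
    new_shards: int,
    new_replicas: int,
    index_size_gb: float,
    last_scale_change: Optional[datetime],
    now: datetime,
) -> List[str]:
    """Return the reasons a shard or replica change would not be allowed."""
    problems = []
    if new_shards < 1:
        problems.append("an index must have at least 1 shard")
    if new_replicas < 1:
        problems.append("an index must have at least 1 replica")
    if new_shards * SHARD_CAPACITY_GB < index_size_gb:
        problems.append("storage capacity would fall below the current index size")
    if last_scale_change is not None and now - last_scale_change < SCALE_COOLDOWN:
        problems.append("shards or replicas were changed less than an hour ago")
    return problems


def check_tier_or_mode_change(last_change: Optional[datetime], now: datetime) -> List[str]:
    """Return the reasons a tier or mode change would not be allowed."""
    if last_change is not None and now - last_change < TIER_MODE_COOLDOWN:
        return ["tier or mode was changed less than 24 hours ago"]
    return []


if __name__ == "__main__":
    now = datetime(2025, 1, 1, 12, 0)
    # Two shards provide 400 GB, which is too small for a 500 GB index.
    print(check_scale_change(2, 1, 500, None, now))
```

An empty list means the requested change is consistent with the rules above. It does not guarantee that the change will be accepted.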