Overview
Pinecone indexes built on dedicated read nodes use provisioned read hardware to provide predictable, consistent performance at sustained, high query volumes. They're designed for large-scale vector workloads such as semantic search, recommendation engines, and mission-critical services.

Dedicated read nodes differ from on-demand indexes in how they handle read operations. While on-demand indexes use shared, multi-tenant capacity for reads, dedicated read nodes provision exclusive hardware for reads: memory, local SSDs, and compute. Both index types use Pinecone's serverless infrastructure for writes and storage.

When you create a dedicated read nodes index, Pinecone provisions resources based on your choice of node type, number of shards, and number of replicas. These resources include local SSDs and memory that cache all your index data, and provide dedicated query executors to handle read operations (query, fetch, list). This architecture eliminates cold starts and ensures consistent low-latency performance, even under heavy load.

Dedicated read nodes support dense, sparse, or hybrid indexes, giving you flexibility in your search and retrieval strategy. Because storage (shards) and compute (replicas) scale independently, you can optimize for your specific workload characteristics.

(Diagram: read path for dedicated read nodes.)
On-demand vs dedicated
On-demand indexes and dedicated read nodes are both built on Pinecone's serverless infrastructure. They use the same write path, storage layer, and data operations API. However, every dedicated read nodes index has isolated hardware for read operations (query, fetch, list), allowing these operations to run on dedicated query executors. This affects performance, cost, and how you scale:

| Feature | On-demand | Dedicated read nodes |
|---|---|---|
| Read infrastructure | Multi-tenant compute resources shared across customers | Isolated, provisioned query executors dedicated to your index |
| Read costs | Pay per read unit (1 RU per 1 GB of namespace size per query, minimum 0.25 RU) | Fixed hourly rate for read capacity based on node type, shards, and replicas |
| Other costs | Storage and write costs based on usage | Storage and write costs based on usage (same as on-demand) |
| Caching | Best-effort; frequently accessed data is cached, but cold queries fetch from object storage | Guaranteed; all index data always warm in memory and on local SSDs |
| Read rate limits | 2,000 RUs/second per index (adjustable) | No read rate limits (only bounded by CPU capacity) |
| Scaling | Automatic; Pinecone handles capacity | Manual; add shards for storage, add replicas for throughput |
| Best for | Variable workloads, multi-tenant applications with many namespaces, low to moderate query rates | Sustained high query rates, large single-namespace workloads, predictable performance requirements |
When to use dedicated read nodes
Dedicated read nodes are ideal for workloads with millions to billions of records and predictable query rates. They provide performance and cost benefits compared to on-demand for high-throughput workloads, and may be required when your workload exceeds on-demand rate limits. There's no universal formula for choosing between on-demand and dedicated read nodes; performance and cost vary by workload (vector dimensionality, metadata filtering, and query patterns). Consider the following factors when making your decision:
Predictable, consistent performance and cost
- Consistent low latency under heavy load.
- No cold starts (fetching data from object storage).
- Performance isolation from other workloads.
- Linear scaling by adding replicas.
- Predictable costs based on fixed hourly rates for provisioned hardware.
High throughput without rate limits or throttling
On-demand indexes are rate-limited to 2,000 read units (RUs) per second per index by default. For example, querying a 15 GB namespace at 150 queries per second (QPS) consumes 2,250 RUs per second (15 RUs per query × 150 QPS), which exceeds the default rate limit.

Dedicated read nodes have no read rate limits and provide dedicated capacity for predictable QPS without throttling (bounded only by CPU capacity), making them better suited for high-throughput workloads. A quick check of this arithmetic is shown below.
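A minimal sketch of the read-unit math above, using the on-demand rule of 1 RU per 1 GB of namespace size per query (minimum 0.25 RU); the namespace size and QPS are the hypothetical values from the example, and the exact rounding of RUs per query is an assumption:

```python
# On-demand read-unit estimate for the example above.
namespace_gb = 15      # hypothetical namespace size
qps = 150              # hypothetical query rate
DEFAULT_LIMIT = 2_000  # default on-demand limit, RUs/second (adjustable)

rus_per_query = max(0.25, namespace_gb)  # 1 RU per GB per query, 0.25 RU minimum
rus_per_second = rus_per_query * qps
print(rus_per_second)                  # 2250.0
print(rus_per_second > DEFAULT_LIMIT)  # True: exceeds the default rate limit
```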
Recommendation engines and real-time use cases
- Consistent performance for thousands of queries per second
- Low latency for real-time recommendations
- Scalability to billion-vector datasets
- No performance degradation during traffic spikes
Single namespace workload (multi-namespace coming soon)
Dedicated read nodes currently support only one namespace per index, so they're best suited to workloads that store all records in a single namespace. Multi-namespace support is coming soon.
When NOT to use dedicated read nodes
Because you pay a fixed hourly rate for provisioned capacity whether or not you use it, dedicated read nodes are generally not cost-effective for intermittent or low-volume workloads, such as:
- RAG systems with variable query volumes
- Agentic applications with sporadic usage
- Prototypes and development environments with intermittent activity
- Scheduled jobs with infrequent, batch-style queries
Cost considerations
On-demand reads are billed per read unit, while dedicated read nodes are billed at a fixed hourly rate for provisioned hardware. For a detailed breakdown, see Cost.
Test results for your workload
Performance and cost depend heavily on your data and query patterns, so benchmark with your actual workload before committing. See Test your workload.
Key concepts
Before creating a dedicated read nodes index, understand the configuration options that determine capacity and performance.

Node types
A node is the basic unit of compute and cache storage capacity for a dedicated read nodes index. Each shard runs on one node, so the node type you choose determines the performance characteristics and cost of your index. The total number of nodes in your index is calculated as shards × replicas. For example, an index with two shards and two replicas uses four nodes.
There are two node types: b1 and t1. Both are suitable for large-scale and demanding workloads, but they differ in processing power and memory capacity, and they cache different data.
| | b1 (Balanced) | t1 (Performance) |
|---|---|---|
| Memory caching | Vector index stored in memory | Vector index + vector projections cached in memory |
| Use case | Predictable performance for sustained query rates with balanced cost efficiency | Highest performance for the most demanding workloads with extreme query volumes and strict latency requirements |
| Storage | 250 GB per shard | 250 GB per shard |
| Compute & memory | Base-level compute and memory resources | ~4x more compute and memory than b1 |
| Cost | Lower-cost option | ~3x the cost of b1 |
- Use t1 nodes if your performance requirements are not met by b1 nodes, or if t1 nodes are more cost-effective than b1 nodes for your workload.
- Both types of nodes provide 250 GB of storage per shard. The difference is in compute and memory, which affects query performance.
- Because t1 nodes cache more data in memory than b1 nodes, an index may require more shards on t1 than on b1 (for the same data).
- You can change node types after creating your index.
Shards
Shards determine the storage capacity of an index. Each shard provides 250 GB of storage, and data is split across all the shards in an index. To respond to a query, the index gathers data from all shards as needed. To determine how many shards you need, calculate your index size and then calculate the number of shards.

Replicas
Replicas multiply the compute resources and data of an index, allowing for higher query throughput and availability. Each replica is a complete copy of your index data and has its own dedicated compute resources.

- Throughput scales approximately linearly with replicas. For example, if one replica handles 50 QPS at your target latency, two replicas should handle approximately 100 QPS.
- You can scale replicas up or down with no downtime using the API. See Add or remove replicas.
- Minimum: 0 replicas (pauses the index).
- For high availability, use at least two replicas. The recommended approach is to allocate n+1 replicas, where n is your minimum for throughput. Pinecone distributes replicas across availability zones (up to three per region), so if one zone fails, remaining replicas continue serving queries.
- Throughput per replica depends on your workload and node type (b1 vs t1). Always test with your specific workload.

Index fullness
Index fullness measures how much of your index's allocated capacity is being used. To ensure predictable performance, dedicated read nodes cache your data in memory and on local SSD.

- You can use Pinecone's API to check index fullness. There are three metrics to monitor: `memoryFullness`, `storageFullness`, and `indexFullness`. `indexFullness` is the maximum of `memoryFullness` and `storageFullness`.
- Usually, storage fills up first. However, memory can be the limiting factor when you have b1 nodes with many low-dimension vectors, or when you have t1 nodes with high-dimension vectors and lots of metadata.
- Monitor fullness regularly and add shards before your index reaches capacity. When `indexFullness` reaches 1.0 (100%), write operations (upsert, update, delete) are blocked, but read operations continue to work normally.
Test your workload
To choose between on-demand and dedicated read nodes, or to optimize your dedicated read nodes configuration, test with your actual workload. Performance varies based on factors such as the size of your index, vector dimensionality, metadata characteristics, and query patterns.

1. Calculate the size of your index.
2. Create and populate a test index.
3. Migrate your test index to dedicated read nodes (if necessary), starting with a single b1 replica.
4. Run a load test (see the sketch after this list).
5. Calculate replicas.
6. Adjust and re-test:
   - Add or remove shards for storage capacity
   - Add or remove replicas for throughput
   - Change node types for different performance characteristics
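As a starting point for step 4, here is a minimal load-test sketch using the Pinecone Python SDK. It measures achieved QPS and per-query latency for a fixed number of concurrent workers; the index host, vector dimension, and worker count are placeholder assumptions, and you should substitute representative query vectors from your workload for the random ones used here:

```python
import random
import statistics
import time
from concurrent.futures import ThreadPoolExecutor

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(host="YOUR_INDEX_HOST")  # host from describe_index

DIMENSION = 1536         # must match your index
WORKERS = 16             # concurrent clients; raise until latency degrades
QUERIES_PER_WORKER = 100

def run_queries(_: int) -> list[float]:
    """Issue queries and record per-query latency (random vectors as placeholders)."""
    latencies = []
    for _ in range(QUERIES_PER_WORKER):
        vector = [random.uniform(-1, 1) for _ in range(DIMENSION)]
        start = time.perf_counter()
        index.query(vector=vector, top_k=10)
        latencies.append(time.perf_counter() - start)
    return latencies

start = time.perf_counter()
with ThreadPoolExecutor(max_workers=WORKERS) as pool:
    all_latencies = [l for ls in pool.map(run_queries, range(WORKERS)) for l in ls]
elapsed = time.perf_counter() - start

print(f"QPS: {len(all_latencies) / elapsed:.1f}")
print(f"p50 latency: {statistics.median(all_latencies) * 1000:.1f} ms")
print(f"p95 latency: {statistics.quantiles(all_latencies, n=20)[18] * 1000:.1f} ms")
```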
Calculate the size of your index
To determine how many shards your index requires, calculate your index size and then use the formula in the section below.

Index size
A record can include a dense vector, a sparse vector, or both. Use the formula for your index type to calculate total size. For a dense index:

Index size = Records × (4 bytes × Dense vector dimensions + ID size + Metadata size)

- `ID size` and `Metadata size` are measured in bytes, averaged across all records.
- Each dense vector dimension uses 4 bytes.

Sparse indexes store index/value pairs rather than fixed-width dense values and use the dotproduct metric, which can be useful for hybrid search. To learn how to calculate the size of a hybrid index, see Hybrid index.

The following examples show dense index sizes for common configurations:
| Records | Dense vector dimensions | Avg metadata size | Index size |
|---|---|---|---|
| 500,000 | 768 | 500 bytes | 1.79 GB |
| 1,000,000 | 1536 | 1,000 bytes | 7.15 GB |
| 5,000,000 | 1024 | 15,000 bytes | 95.5 GB |
| 10,000,000 | 1536 | 1,000 bytes | 71.5 GB |
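A quick sanity check of the dense-index formula in Python; the helper name is ours, and ID size is set to zero to match the table above (the table rounds results up slightly):

```python
def dense_index_size_gb(records: int, dims: int,
                        metadata_bytes: int, id_bytes: int = 0) -> float:
    """Dense index size = Records × (4 bytes × dimensions + ID size + metadata size)."""
    return records * (4 * dims + id_bytes + metadata_bytes) / 1e9

print(dense_index_size_gb(500_000, 768, 500))        # ≈1.79 GB
print(dense_index_size_gb(1_000_000, 1536, 1_000))   # ≈7.14 GB
print(dense_index_size_gb(5_000_000, 1024, 15_000))  # ≈95.5 GB
print(dense_index_size_gb(10_000_000, 1536, 1_000))  # ≈71.4 GB
```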
Number of shards
To calculate the number of shards your index requires, divide the size of your index by 250 GB and round up:

| Index size | Minimum shards | Recommended shards |
|---|---|---|
| ~71 GB | 1 (250 GB; 28% full) | 1 (250 GB; 28% full) |
| ~300 GB | 2 (500 GB; 60% full) | 2 (500 GB; 60% full) |
| ~400 GB | 2 (500 GB; 80% full) | 3 (750 GB; 53% full) |
- Every index must have at least one shard. However, you can pause an index by reducing its replicas to 0.
- After you’ve created your index, monitor its fullness. When your index approaches capacity, you can add shards.
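A one-line version of the shard calculation, with an optional headroom check to reflect the recommended-shards column; the 80% fullness threshold is our assumption, chosen to match the ~80%-full row in the table above:

```python
import math

def shards_needed(index_size_gb: float, shard_capacity_gb: float = 250,
                  max_fullness: float = 0.8) -> int:
    """Minimum shards, plus one if the index would be at or above max_fullness."""
    minimum = max(1, math.ceil(index_size_gb / shard_capacity_gb))
    if index_size_gb / (minimum * shard_capacity_gb) >= max_fullness:
        minimum += 1
    return minimum

print(shards_needed(71))   # 1
print(shards_needed(300))  # 2
print(shards_needed(400))  # 3 (two shards would be 80% full)
```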
Number of replicas
To calculate the number of replicas your index requires, first test your workload to find the QPS a single replica can handle at your target latency. Then, use this formula, and round up:

Replicas = Target QPS ÷ Single-replica QPS (rounded up)

- Throughput scales approximately linearly with replicas, but performance can vary based on metadata filter selectivity.
- For high availability, allocate n+1 replicas, where n is your minimum for throughput. Pinecone distributes replicas across availability zones.
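The same formula in code, with the n+1 high-availability recommendation applied; the function name and example numbers are ours:

```python
import math

def replicas_needed(target_qps: float, qps_per_replica: float,
                    high_availability: bool = True) -> int:
    """Replicas = ceil(target QPS / single-replica QPS), plus one for HA."""
    n = math.ceil(target_qps / qps_per_replica)
    return n + 1 if high_availability else n

# If one replica handles 50 QPS at your target latency and you need 120 QPS:
print(replicas_needed(120, 50))  # 4 (3 for throughput + 1 for availability)
```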
Create a dedicated read nodes index
You can create a dedicated read nodes index from scratch or from a backup of an existing index.

From scratch
To create a new dedicated read nodes index from scratch, call Create an index. In the request body, in the `spec.serverless.read_capacity` object, set the following fields:
| Field | Value | Notes |
|---|---|---|
| `mode` | `Dedicated` | |
| `dedicated.node_type` | `b1` or `t1` | See node types |
| `dedicated.scaling` | `Manual` | Currently the only option |
| `dedicated.manual.shards` | Number of shards needed | Minimum 1 shard; each shard provides 250 GB of storage |
| `dedicated.manual.replicas` | Number of replicas needed | Minimum 0 (this pauses the index) |
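The following is a minimal sketch of such a request using Python's requests library against the control plane; the index name, dimension, metric, cloud, and region are placeholder assumptions, and the `read_capacity` layout follows the table above:

```python
import requests

resp = requests.post(
    "https://api.pinecone.io/indexes",
    headers={
        "Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json",
        "X-Pinecone-API-Version": "2025-10",
    },
    json={
        "name": "example-dedicated-index",  # placeholder name
        "dimension": 1536,
        "metric": "cosine",
        "spec": {
            "serverless": {
                "cloud": "aws",
                "region": "us-east-1",
                "read_capacity": {
                    "mode": "Dedicated",
                    "dedicated": {
                        "node_type": "b1",
                        "scaling": "Manual",
                        "manual": {"shards": 2, "replicas": 2},
                    },
                },
            }
        },
    },
)
resp.raise_for_status()
print(resp.json()["status"]["state"])  # e.g. "Initializing"
```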
To track provisioning progress, check the following fields in the index description:

| Field | Description |
|---|---|
| `status.state` | Overall index status (for example, `Initializing`, `Ready`, `Terminating`) |
| `spec.serverless.read_capacity.status.state` | Read capacity status (`Migrating`, `Scaling`, `Ready`, `Error`) |
`status.state` transitions to `Ready` as soon as the index is ready for reads and writes. However, `spec.serverless.read_capacity.status.state` remains `Migrating` until the index scales to its full read capacity, at which point it transitions to `Ready`.

From a backup
To create a dedicated read nodes index from a backup:

1. Restore the backup. This creates a new on-demand index with the same data as the original.
2. If the restored index has multiple namespaces, delete all of them except the one you want to keep. Dedicated read nodes currently only support one namespace.
3. Migrate the index to dedicated read nodes.
Migrate to dedicated read nodes
To migrate an existing on-demand index to dedicated read nodes, follow these steps:

1. Create a backup of your index.
2. Delete extra namespaces.
3. Calculate your index size.
4. Migrate the index.
5. Monitor the migration.
To migrate the index, call Configure an index. In the request body, in the `spec.serverless.read_capacity` object, set the following fields:

| Field | Value | Notes |
|---|---|---|
| `mode` | `Dedicated` | |
| `dedicated.node_type` | `b1` or `t1` | See node types |
| `dedicated.scaling` | `Manual` | Currently the only option |
| `dedicated.manual.shards` | Number of shards needed | Minimum 1 shard; each shard provides 250 GB of storage |
| `dedicated.manual.replicas` | Number of replicas needed | Minimum 0 (this pauses the index) |
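For example, the following sketch (again using Python's requests library, with a placeholder index name) migrates an on-demand index to dedicated read nodes with b1 nodes, one shard, and one replica:

```python
import requests

resp = requests.patch(
    "https://api.pinecone.io/indexes/example-index",  # placeholder index name
    headers={
        "Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json",
        "X-Pinecone-API-Version": "2025-10",
    },
    json={
        "spec": {
            "serverless": {
                "read_capacity": {
                    "mode": "Dedicated",
                    "dedicated": {
                        "node_type": "b1",
                        "scaling": "Manual",
                        "manual": {"shards": 1, "replicas": 1},
                    },
                }
            }
        }
    },
)
resp.raise_for_status()
```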
While the migration is in progress, you can track it via the following fields in the index description:

| Field | Description |
|---|---|
| `status.state` | Overall index status (for example, `Initializing`, `Ready`, `Terminating`) |
| `spec.serverless.read_capacity.status.state` | Read capacity status (`Migrating`, `Scaling`, `Ready`, `Error`) |
Monitor the migration
The migration is complete when `spec.serverless.read_capacity.status.state` is `Ready`.

Monitor index performance
After migrating, test your workload against the new configuration and monitor latency, throughput, and index fullness, adjusting shards, replicas, or node type as needed.
Manage your index
The following sections describe how to manage a dedicated read nodes index using version 2025-10 of the Pinecone API.
Add a hosted embedding model
To add a hosted embedding model, create the index with integrated embedding and include the `embed` object in the request body. In this object:

- For the `text` field, specify the name of the field in your data that contains the text to be embedded.
- Specify a model whose dimension requirements match the dimensions of your index.

Also include the `read_capacity` object to configure node type, shards, and replicas for dedicated read nodes.
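As a sketch, this creates an index with integrated embedding via the create-for-model endpoint; the model name, the field mapping, and the top-level placement of `read_capacity` in this request are assumptions to verify against the API reference:

```python
import requests

resp = requests.post(
    "https://api.pinecone.io/indexes/create-for-model",
    headers={
        "Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json",
        "X-Pinecone-API-Version": "2025-10",
    },
    json={
        "name": "example-embedded-index",  # placeholder name
        "cloud": "aws",
        "region": "us-east-1",
        "embed": {
            "model": "llama-text-embed-v2",       # assumed hosted model
            "field_map": {"text": "chunk_text"},  # "chunk_text" holds the text to embed
        },
        "read_capacity": {  # assumed top-level placement in this request
            "mode": "Dedicated",
            "dedicated": {
                "node_type": "b1",
                "scaling": "Manual",
                "manual": {"shards": 1, "replicas": 1},
            },
        },
    },
)
resp.raise_for_status()
```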
Monitor index fullness
To monitor index fullness, call Describe an index. In the response, `indexFullness` describes how full the index is, on a scale of 0 to 1. It's set to the greater of `memoryFullness` and `storageFullness`.
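A minimal sketch of checking fullness; because the exact location of the fullness fields in the describe response is an assumption, the code searches the response for them rather than hard-coding a path:

```python
import requests

resp = requests.get(
    "https://api.pinecone.io/indexes/example-index",  # placeholder index name
    headers={"Api-Key": "YOUR_API_KEY", "X-Pinecone-API-Version": "2025-10"},
)
resp.raise_for_status()

def find_keys(obj, keys, found=None):
    """Recursively collect the given keys anywhere in a nested JSON object."""
    found = {} if found is None else found
    if isinstance(obj, dict):
        for k, v in obj.items():
            if k in keys:
                found[k] = v
            find_keys(v, keys, found)
    elif isinstance(obj, list):
        for item in obj:
            find_keys(item, keys, found)
    return found

fullness = find_keys(resp.json(), {"memoryFullness", "storageFullness", "indexFullness"})
print(fullness)  # e.g. {'indexFullness': 0.42, ...}; add shards as this approaches 1.0
```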
Add or remove shards
To add or remove shards, call Configure an index with the following fields:

| Field | Value | Notes |
|---|---|---|
| `spec.serverless.read_capacity.mode` | `Dedicated` | |
| `spec.serverless.read_capacity.dedicated.scaling` | `Manual` | |
| `spec.serverless.read_capacity.dedicated.manual.shards` | Desired number of shards | Each shard provides 250 GB of storage |
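For example, a sketch that scales a placeholder index to three shards; whether `replicas` can be omitted in a partial update is an assumption, so it is included here with the current value:

```python
import requests

resp = requests.patch(
    "https://api.pinecone.io/indexes/example-index",  # placeholder index name
    headers={
        "Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json",
        "X-Pinecone-API-Version": "2025-10",
    },
    json={
        "spec": {
            "serverless": {
                "read_capacity": {
                    "mode": "Dedicated",
                    "dedicated": {
                        "scaling": "Manual",
                        "manual": {"shards": 3, "replicas": 2},  # keep your current replica count
                    },
                }
            }
        }
    },
)
resp.raise_for_status()
```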
- You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
- A new configuration change can only be initiated after the previous configuration change has completed.
- Each configuration change can take up to 30 minutes to complete.
- Read and write operations continue normally during configuration changes.
Add or remove replicas
To add or remove replicas, call Configure an index with the following fields:

| Field | Value | Notes |
|---|---|---|
| `spec.serverless.read_capacity.mode` | `Dedicated` | |
| `spec.serverless.read_capacity.dedicated.scaling` | `Manual` | |
| `spec.serverless.read_capacity.dedicated.manual.replicas` | Desired number of replicas | Add replicas to increase query throughput |
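For example, a sketch that scales a placeholder index to three replicas; as above, the shard count is included in case partial updates require it:

```python
import requests

resp = requests.patch(
    "https://api.pinecone.io/indexes/example-index",  # placeholder index name
    headers={
        "Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json",
        "X-Pinecone-API-Version": "2025-10",
    },
    json={
        "spec": {
            "serverless": {
                "read_capacity": {
                    "mode": "Dedicated",
                    "dedicated": {
                        "scaling": "Manual",
                        "manual": {"shards": 2, "replicas": 3},  # keep your current shard count
                    },
                }
            }
        }
    },
)
resp.raise_for_status()
```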
- You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
- A new configuration change can only be initiated after the previous configuration change has completed.
- Each configuration change can take up to 30 minutes to complete.
- Read and write operations continue normally during configuration changes.
Change node types
You can change the node type of an existing index (b1 → t1 or t1 → b1). This operation does not require downtime, but can take up to 30 minutes to complete. To change node types, call Configure an index with the following fields:

| Field | Value | Notes |
|---|---|---|
| `spec.serverless.read_capacity.mode` | `Dedicated` | |
| `spec.serverless.read_capacity.dedicated.node_type` | `b1` or `t1` | See node types |
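For example, the following sketch (placeholder index name) changes the node type from b1 to t1:

```python
import requests

resp = requests.patch(
    "https://api.pinecone.io/indexes/example-index",  # placeholder index name
    headers={
        "Api-Key": "YOUR_API_KEY",
        "Content-Type": "application/json",
        "X-Pinecone-API-Version": "2025-10",
    },
    json={
        "spec": {
            "serverless": {
                "read_capacity": {
                    "mode": "Dedicated",
                    "dedicated": {"node_type": "t1"},
                }
            }
        }
    },
)
resp.raise_for_status()
```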
- You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
- A new configuration change can only be initiated after the previous configuration change has completed.
- Each configuration change can take up to 30 minutes to complete.
- Read and write operations continue normally during configuration changes.
Pause an index
To pause an index, reduce its replicas to 0, as described in Add or remove replicas. To resume serving queries, scale replicas back up.
Check the status of a configuration change
To check the status of a configuration change, call Describe an index and inspect the following fields:
| Field | Description |
|---|---|
| `status.state` | Overall index status (for example, `Initializing`, `Ready`, `Terminating`) |
| `spec.serverless.read_capacity.status.state` | Read capacity status (`Migrating`, `Scaling`, `Ready`, `Error`) |
During a configuration change, monitor the read capacity status (`spec.serverless.read_capacity.status.state`). Possible values:

| State | Description |
|---|---|
| `Ready` | The change is complete and the index is ready to serve queries at full capacity. |
| `Scaling` | A change to the number of shards or replicas is in progress. |
| `Migrating` | A change to the node type or read capacity is in progress. |
| `Error` | The operation failed. For migrations to dedicated, this typically means you didn't allocate enough shards for your index size. Check `error_message` for details, and retry with more shards. |
During a configuration change, the overall index status (`status.state`) remains `Ready`. This is because the index can handle reads and writes while its dedicated read capacity scales.

- You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
- A new configuration change can only be initiated after the previous configuration change has completed.
- Each configuration change can take up to 30 minutes to complete.
- Read and write operations continue normally during configuration changes.
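A simple polling loop for tracking a change to completion; this is a sketch with a placeholder index name and poll interval, and the location of `error_message` in the response is assumed:

```python
import time

import requests

HEADERS = {"Api-Key": "YOUR_API_KEY", "X-Pinecone-API-Version": "2025-10"}
URL = "https://api.pinecone.io/indexes/example-index"  # placeholder index name

while True:
    status = requests.get(URL, headers=HEADERS).json()["spec"]["serverless"]["read_capacity"]["status"]
    print(f"read capacity state: {status['state']}")
    if status["state"] == "Ready":
        break
    if status["state"] == "Error":
        raise RuntimeError(status.get("error_message"))  # assumed field location
    time.sleep(60)  # configuration changes can take up to 30 minutes
```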
Migrate from dedicated to on-demand
To migrate a dedicated read nodes index back to on-demand, call Configure an index and set `spec.serverless.read_capacity.mode` to `OnDemand`.
Limits
The following limits apply to dedicated read nodes:

Read limits
There are no read rate limits for dedicated read nodes; read throughput is bounded only by the CPU capacity of your nodes.
Write limits
Namespace limits
Dedicated read nodes currently support only one namespace per index. Multi-namespace support is coming soon.
Shard, replica, and node limits
memoryFullness
`memoryFullness` is an approximation and doesn't yet account for metadata. For more information, see Index fullness.

Migrating from dedicated to on-demand
Cost
The cost of an index has three components: read costs, write costs, and storage costs. On-demand and dedicated read nodes share infrastructure for writes and storage, so these costs are the same. However, dedicated read nodes provision dedicated hardware for read operations (query, fetch, list), which changes how read costs are calculated.

| Cost component | On-demand | Dedicated read nodes |
|---|---|---|
| Read costs | Usage-based: 1 RU per 1 GB namespace size per query | Fixed hourly rate: Based on node type, shards, and replicas |
| Write costs | Usage-based | Usage-based (same as on-demand) |
| Storage costs | Usage-based | Usage-based (same as on-demand) |
Calculate dedicated read nodes costs
Read costs for dedicated read nodes are a fixed rate per node, and the total number of nodes is shards × replicas:

Read cost = Node rate × Shards × Replicas

The terms are defined as follows:
| Term | Description |
|---|---|
| Node rate | Monthly rate for the node type (b1 or t1), which varies by cloud region. See Pinecone pricing. |
| Shards | Number of shards allocated |
| Replicas | Number of replicas allocated |
| Storage costs | Usage-based, same as on-demand |
| Write costs | Usage-based, same as on-demand |
For example, if the rate for b1 nodes on aws-us-east-1 is $168.21 per node per month ($0.23/hour), an index with two shards and two replicas (four nodes) would cost $168.21 × 2 × 2 = $672.84 per month for read capacity, plus usage-based storage and write costs.
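The same calculation in code; the rate is the example figure above, so check Pinecone pricing for current rates in your region:

```python
def monthly_read_cost(node_rate: float, shards: int, replicas: int) -> float:
    """Fixed read cost = node rate × total nodes (shards × replicas)."""
    return node_rate * shards * replicas

print(monthly_read_cost(168.21, shards=2, replicas=2))  # 672.84
```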