This feature is in public preview.

Overview

Pinecone indexes built on dedicated read nodes use provisioned read hardware to provide predictable, consistent performance at sustained, high query volumes. They’re designed for large-scale vector workloads such as semantic search, recommendation engines, and mission-critical services.
Dedicated read nodes differ from on-demand indexes in how they handle read operations. While on-demand indexes use shared, multi-tenant capacity for reads, dedicated read nodes provision exclusive hardware for reads: memory, local SSDs, and compute. Both index types use Pinecone’s serverless infrastructure for writes and storage.
When you create a dedicated read nodes index, Pinecone provisions resources based on your choice of node type, number of shards, and number of replicas. These resources include local SSDs and memory that cache all your index data, and dedicated query executors that handle read operations (query, fetch, list). This architecture eliminates cold starts and ensures consistent low-latency performance, even under heavy load.
Dedicated read nodes support dense, sparse, or hybrid indexes, giving you flexibility in your search and retrieval strategy. Because storage (shards) and compute (replicas) scale independently, you can optimize for your specific workload characteristics.

Read path for dedicated read nodes

On-demand vs dedicated

On-demand indexes and dedicated read nodes are both built on Pinecone’s serverless infrastructure. They use the same write path, storage layer, and data operations API. However, every dedicated read nodes index has isolated hardware for read operations (query, fetch, list), allowing these operations to run on dedicated query executors. This affects performance, cost, and how you scale:
| Feature | On-demand | Dedicated read nodes |
| --- | --- | --- |
| Read infrastructure | Multi-tenant compute resources shared across customers | Isolated, provisioned query executors dedicated to your index |
| Read costs | Pay per read unit (1 RU per 1 GB of namespace size per query, minimum 0.25 RU) | Fixed hourly rate for read capacity based on node type, shards, and replicas |
| Other costs | Storage and write costs based on usage | Storage and write costs based on usage (same as on-demand) |
| Caching | Best-effort; frequently accessed data is cached, but cold queries fetch from object storage | Guaranteed; all index data always warm in memory and on local SSDs |
| Read rate limits | 2,000 RUs/second per index (adjustable) | No read rate limits (only bounded by CPU capacity) |
| Scaling | Automatic; Pinecone handles capacity | Manual; add shards for storage, add replicas for throughput |
| Best for | Variable workloads, multi-tenant applications with many namespaces, low to moderate query rates | Sustained high query rates, large single-namespace workloads, predictable performance requirements |

When to use dedicated read nodes

Dedicated read nodes are ideal for workloads with millions to billions of records and predictable query rates. They provide performance and cost benefits compared to on-demand for high-throughput workloads, and may be required when your workload exceeds on-demand rate limits. There’s no universal formula for choosing between on-demand and dedicated read nodes—performance and cost vary by workload (vector dimensionality, metadata filtering, and query patterns). Consider the following factors when making your decision:
With dedicated read nodes, you allocate dedicated read hardware for your index, and your data is cached in memory and on local SSDs. This provides:
  • Consistent low latency under heavy load.
  • No cold starts (fetching data from object storage).
  • Performance isolation from other workloads.
  • Linear scaling by adding replicas.
  • Predictable costs based on fixed hourly rates for provisioned hardware.
If predictable performance and cost are critical for your application, dedicated read nodes may be a better fit than on-demand.
On-demand indexes are subject to read unit rate limits (default: 2,000 RUs/second per index).
A high query volume on a large index can exceed these limits. For example, a 15 GB namespace at 150 QPS requires approximately 2,250 RUs/second (15 RUs per query × 150 QPS), which exceeds the default rate limit.
Dedicated read nodes have no read rate limits and provide dedicated capacity for predictable QPS without throttling (bounded only by CPU capacity), making them better suited for high-throughput workloads.
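The arithmetic in the rate-limit example above can be sketched as a small helper. This is only an estimate under stated assumptions: it takes the "1 RU per 1 GB of namespace size per query, minimum 0.25 RU" rule from the comparison table at face value and assumes read units scale exactly linearly with namespace size. The function names are illustrative and not part of any Pinecone SDK.

```python
DEFAULT_RU_LIMIT_PER_SECOND = 2000.0  # default on-demand limit quoted in this guide


def read_units_per_query(namespace_gb: float) -> float:
    """Estimate RUs for one query: 1 RU per GB of namespace, minimum 0.25 RU."""
    return max(0.25, namespace_gb)


def read_units_per_second(namespace_gb: float, qps: float) -> float:
    """Estimate sustained RU consumption at a given query rate."""
    return read_units_per_query(namespace_gb) * qps


def exceeds_rate_limit(namespace_gb: float, qps: float,
                       limit: float = DEFAULT_RU_LIMIT_PER_SECOND) -> bool:
    """True when the estimated RU rate is above the on-demand limit."""
    return read_units_per_second(namespace_gb, qps) > limit


if __name__ == "__main__":
    # The example from the text: a 15 GB namespace at 150 QPS.
    print(read_units_per_second(15, 150), exceeds_rate_limit(15, 150))
```

Because the on-demand limit is adjustable, pass your own `limit` if your account has a different value.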
Recommendation engines for use cases such as e-commerce and media require very high throughput and low latency to maintain positive user experiences. Dedicated read nodes are purpose-built for these use cases, providing:
  • Consistent performance for thousands of queries per second
  • Low latency for real-time recommendations
  • Scalability to billion-vector datasets
  • No performance degradation during traffic spikes
Similar requirements apply to other real-time use cases like semantic search at scale, personalization engines, and mission-critical services with strict performance SLOs.
Dedicated read nodes indexes support only a single namespace. If your application requires multiple namespaces, on-demand is a better fit.
Multi-namespace support is coming soon. For early access, contact us.
On-demand indexes are better suited for workloads with unpredictable or highly variable traffic patterns. For example:
  • RAG systems with variable query volumes
  • Agentic applications with sporadic usage
  • Prototypes and development environments with intermittent activity
  • Scheduled jobs with infrequent, batch-style queries
Additionally, on-demand is better for indexes with many namespaces, even if you have high query volumes. Dedicated read nodes currently only support single-namespace indexes, so multi-tenant applications requiring namespace-based isolation should use on-demand until multi-namespace support is available.
For these scenarios, on-demand’s elasticity and usage-based pricing provide better cost efficiency than provisioning dedicated capacity.
Dedicated read nodes can handle predictable traffic spikes efficiently if you scale replicas proactively via the API. For example, you can provision extra replicas before a scheduled email campaign and scale back down afterward. Auto-scaling will be available in a future release.
On-demand and dedicated read nodes have different cost structures. The key difference is read costs: on-demand uses usage-based pricing, while dedicated read nodes use a fixed hourly rate based on provisioned hardware. Write and storage costs are usage-based for both modes.
Dedicated read nodes become cost-effective when you have predictable, sustained query volumes that make full use of your provisioned capacity. With unpredictable or low query volumes, you pay hourly rates even when your machines sit idle, making on-demand’s usage-based pricing more economical.
For detailed cost information, comparison tables, and estimation tools, see the Cost section of this guide.
Performance depends on your specific workload: index size, vector dimensionality, metadata filtering, query patterns, throughput requirements, and latency requirements. Testing is the only way to know for sure whether dedicated read nodes are right for your scenario.
For a step-by-step guide to testing, see Test your workload.
If you need guidance choosing a capacity mode (on-demand or dedicated read nodes) or sizing your index configuration, contact us.

Key concepts

Before creating a dedicated read nodes index, understand the configuration options that determine capacity and performance.

Node types

A node is the basic unit of compute and cache storage capacity for a dedicated read nodes index. Each shard runs on one node, so the node type you choose determines the performance characteristics and cost of your index. The total number of nodes in your index is calculated as shards × replicas. For example, an index with two shards and two replicas uses four nodes. There are two node types: b1 and t1. Both are suitable for large-scale and demanding workloads, but they differ in processing power and memory capacity, and they cache different data.
| | b1 (Balanced) | t1 (Performance) |
| --- | --- | --- |
| Memory caching | Vector index stored in memory | Vector index + vector projections cached in memory |
| Use case | Predictable performance for sustained query rates with balanced cost efficiency | Highest performance for the most demanding workloads with extreme query volumes and strict latency requirements |
| Storage | 250 GB per shard | 250 GB per shard |
| Compute & memory | Base-level compute and memory resources | ~4x more compute and memory than b1 |
| Cost | Lower-cost option | ~3x the cost of b1 |
Consider using t1 nodes if your performance requirements are not met by b1 nodes, or if t1 nodes are more cost-effective than b1 nodes for your workload.
When choosing a node type, remember that:
  • Both types of nodes provide 250 GB of storage per shard. The difference is in compute and memory, which affects query performance.
  • Because t1 nodes cache more data in memory than b1 nodes, an index may require more shards on t1 than on b1 (for the same data).
  • You can change node types after creating your index.
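As a rough illustration of how node type, shards, and replicas combine, the sketch below computes the node count (shards × replicas) and a relative read cost. The cost weights are an assumption for comparison only: b1 is treated as 1 unit per node and t1 as 3 units, following the "~3x the cost of b1" figure above. They are not actual prices, and the function names are hypothetical.

```python
# Relative per-node cost weights (assumption: derived from "~3x the cost of b1").
RELATIVE_NODE_COST = {"b1": 1.0, "t1": 3.0}


def total_nodes(shards: int, replicas: int) -> int:
    """Each replica holds every shard, so the node count is shards x replicas."""
    if shards < 1:
        raise ValueError("an index needs at least one shard")
    if replicas < 0:
        raise ValueError("replicas cannot be negative (0 pauses the index)")
    return shards * replicas


def relative_read_cost(node_type: str, shards: int, replicas: int) -> float:
    """Approximate read cost in 'b1 node' units, for comparing configurations."""
    return RELATIVE_NODE_COST[node_type] * total_nodes(shards, replicas)


if __name__ == "__main__":
    print(total_nodes(2, 2))               # two shards, two replicas
    print(relative_read_cost("t1", 2, 2))  # same layout on t1, in b1 units
```

Keep in mind that a t1 index may need more shards than a b1 index for the same data, so compare configurations only after sizing each one separately.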

Shards

Shards determine the storage capacity of an index. Each shard provides 250 GB of storage, and data is split across all the shards in an index. To respond to a query, the index gathers data from all shards as needed. To determine how many shards you need, calculate your index size and then calculate the number of shards.
It’s your responsibility to allocate enough shards to accommodate the size of your index. If your index exceeds the capacity of its shards, write operations (upsert, update, delete) are blocked, but read operations continue to work normally.

Replicas

Replicas multiply the compute resources and data of an index, allowing for higher query throughput and availability. Each replica is a complete copy of your index data and has its own dedicated compute resources.
  • Throughput scales approximately linearly with replicas. For example, if one replica handles 50 QPS at your target latency, two replicas should handle approximately 100 QPS.
  • You can scale replicas up or down with no downtime using the API. See Add or remove replicas.
  • Minimum: 0 replicas (pauses the index).
  • For high availability, use at least two replicas. The recommended approach is to allocate n+1 replicas where n is your minimum for throughput. Pinecone distributes replicas across availability zones (up to three per region), so if one zone fails, remaining replicas continue serving queries.
To determine how many replicas you need, test your workload and then calculate the number of replicas.
Actual performance varies based on workload characteristics (query complexity, vector dimensions, metadata characteristics), metadata filter selectivity, and node type (b1 vs t1). Always test with your specific workload.

Index fullness

Index fullness measures how much of your index’s allocated capacity is being used. To ensure predictable performance, dedicated read nodes cache your data in memory and on local SSD.
  • You can use Pinecone’s API to check index fullness. There are three metrics to monitor: memoryFullness, storageFullness, and indexFullness.
    indexFullness is the maximum of memoryFullness and storageFullness.
  • Usually, storage fills up first. However, memory can be the limiting factor when you have b1 nodes with many low-dimension vectors, or when you have t1 nodes with high-dimension vectors and lots of metadata.
  • Monitor fullness regularly and add shards before your index reaches capacity. When indexFullness reaches 1.0 (100%), write operations (upsert, update, delete) are blocked, but read operations continue to work normally.
Add shards when index fullness reaches 70-80%, especially if you expect continued growth. Adding shards reduces storage fullness (index data is spread across shards, so each stores less) and memory fullness (with less data per shard, there’s less to cache in memory), helping you avoid write failures.
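The fullness guidance above can be expressed as a small check over a parsed index stats response. This is a minimal sketch: the field names (`memoryFullness`, `storageFullness`) are the ones listed above, while the 0.7 warning threshold is an assumption taken from the low end of the 70-80% recommendation, and the returned labels are purely illustrative.

```python
def index_fullness(stats: dict) -> float:
    """indexFullness is the greater of memoryFullness and storageFullness."""
    return max(stats.get("memoryFullness", 0.0), stats.get("storageFullness", 0.0))


def fullness_action(stats: dict, warn_at: float = 0.7) -> str:
    """Classify an index by fullness (the labels are illustrative)."""
    fullness = index_fullness(stats)
    if fullness >= 1.0:
        return "writes-blocked"  # upsert, update, and delete are rejected
    if fullness >= warn_at:
        return "add-shards"      # approaching capacity; scale before writes fail
    return "ok"


if __name__ == "__main__":
    sample = {"memoryFullness": 0.01, "storageFullness": 0.01}
    print(index_fullness(sample), fullness_action(sample))
```

Taking the maximum means that either memory or storage can trigger the warning, which matches the note that memory is sometimes the limiting factor.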

Test your workload

To choose between on-demand and dedicated read nodes, or to optimize your dedicated read nodes configuration, test with your actual workload. Performance varies based on factors such as the size of your index, vector dimensionality, metadata characteristics, and query patterns.
1. Calculate the size of your index

Calculate the size of your index to determine how many shards it requires.
2. Create and populate a test index

Create a dedicated read nodes index with representative data for your workload. You’ll use this index for testing.
If you don’t restore your test index from a backup, you can upsert or import your data.
3. Migrate your test index to dedicated read nodes (if necessary)

If your test index is on-demand, migrate it to dedicated read nodes. To start, use a single b1 replica.
Don’t migrate your production index yet. At this point, you’re just testing your workload.
4. Run a load test

Run realistic query patterns against your test index, gradually increasing QPS. For example, start at 10 QPS for about 30 minutes, then increase by 10 QPS increments while monitoring latency. Identify the QPS where latency exceeds your target threshold.
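One way to turn load-test results into a per-replica capacity figure is sketched below. It assumes you have already recorded a latency value (for example, p95 in milliseconds) at each QPS step using your own load-testing tool. The helper names and the sample measurements are hypothetical.

```python
from typing import List, Optional, Tuple


def ramp_schedule(start_qps: int = 10, step_qps: int = 10, max_qps: int = 100) -> List[int]:
    """QPS levels to test, e.g. 10, 20, ..., 100."""
    return list(range(start_qps, max_qps + 1, step_qps))


def max_qps_within_target(measurements: List[Tuple[int, float]],
                          target_latency_ms: float) -> Optional[int]:
    """Highest tested QPS reached before latency first exceeds the target.

    `measurements` holds (qps, observed_latency_ms) pairs. Returns None if
    even the lowest step misses the target.
    """
    best = None
    for qps, latency_ms in sorted(measurements):
        if latency_ms > target_latency_ms:
            break  # stop at the first step that violates the target
        best = qps
    return best


if __name__ == "__main__":
    # Made-up numbers for illustration only.
    observed = [(10, 20.0), (20, 22.0), (30, 35.0), (40, 80.0)]
    print(ramp_schedule(10, 10, 40), max_qps_within_target(observed, 50.0))
```

The value this returns is the "QPS per replica" input used when calculating the number of replicas.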
5. Calculate replicas

Throughput scales approximately linearly with replicas. For example, if one replica handles 50 QPS at your target latency, two replicas should handle approximately 100 QPS. However, performance can vary based on metadata filter selectivity.
To calculate the number of replicas required for your target QPS, use this formula, rounding up:
Minimum replicas = (Required QPS) / (QPS per replica)
For more information, see Number of replicas.
6. Adjust and re-test

To meet your performance and cost goals, adjust your configuration as needed and re-test. Continue iterating until you meet your requirements with room for growth.

Calculate the size of your index

To determine how many shards your index requires, calculate your index size and then use the formula in the section below.

Index size

A record can include a dense vector, a sparse vector, or both. Use the formula for your index type to calculate total size:
A dense index contains records with one dense vector each.
Dense index records can also contain sparse vectors (when the index metric is set to dotproduct), which can be useful for hybrid search. To learn how to calculate the size of a hybrid index, see Hybrid index.
Calculate dense index size (assuming no sparse vectors)
Index size = Number of records × (
               ID size + 
               Metadata size +
               Dense vector dimensions × 4 bytes
             )
Where:
  • ID size and Metadata size are measured in bytes, averaged across all records.
  • Each Dense vector dimension uses 4 bytes.
Example dense index calculations
These examples assume 8-byte IDs:
| Records | Dense vector dimensions | Avg metadata size | Index size |
| --- | --- | --- | --- |
| 500,000 | 768 | 500 bytes | 1.79 GB |
| 1,000,000 | 1536 | 1,000 bytes | 7.15 GB |
| 5,000,000 | 1024 | 15,000 bytes | 95.5 GB |
| 10,000,000 | 1536 | 1,000 bytes | 71.5 GB |
Example: 500,000 records × (8-byte ID + (768 dense vector dimensions × 4 bytes) + 500 bytes of metadata) = 1.79 GB
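The dense index formula translates directly into code. The sketch below assumes decimal gigabytes (1 GB = 10^9 bytes), which reproduces the example table above. The function names are illustrative.

```python
BYTES_PER_DIMENSION = 4  # each dense vector dimension uses 4 bytes


def dense_index_size_bytes(records: int, dimensions: int,
                           avg_metadata_bytes: int, avg_id_bytes: int = 8) -> int:
    """Index size = records x (ID size + metadata size + dimensions x 4 bytes)."""
    per_record = avg_id_bytes + avg_metadata_bytes + dimensions * BYTES_PER_DIMENSION
    return records * per_record


def bytes_to_gb(size_bytes: int) -> float:
    """Convert bytes to decimal gigabytes (assumption: 1 GB = 10**9 bytes)."""
    return size_bytes / 1e9


if __name__ == "__main__":
    size = dense_index_size_bytes(records=500_000, dimensions=768, avg_metadata_bytes=500)
    print(f"{bytes_to_gb(size):.2f} GB")
```

ID and metadata sizes are averages across all records, so sample a representative subset of your data when estimating them.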

Number of shards

To calculate the number of shards your index requires, divide the size of your index by 250 GB and round up:
Minimum shards = (Index size) / (250 GB per shard)
To maintain optimal performance, provision additional shards to keep your index below 70-80% of its capacity. For example, a 500 GB index should have three shards (750 GB capacity = 67% full), not two shards (500 GB capacity = 100% full).
Example shard calculations
| Index size | Minimum shards | Recommended shards |
| --- | --- | --- |
| ~71 GB | 1 (250 GB; 28% full) | 1 (250 GB; 28% full) |
| ~300 GB | 2 (500 GB; 60% full) | 2 (500 GB; 60% full) |
| ~400 GB | 2 (500 GB; 80% full) | 3 (750 GB; 53% full) |
Other considerations
  • Every index must have at least one shard. However, you can pause an index by reducing its replicas to 0.
  • After you’ve created your index, monitor its fullness. When your index approaches capacity, you can add shards.
Add shards when index fullness reaches 70-80%, especially if you expect continued growth. Adding shards reduces storage fullness (index data is spread across shards, so each stores less) and memory fullness (with less data per shard, there’s less to cache in memory), helping you avoid write failures.
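The shard calculations in this section can be sketched as follows. The minimum follows the formula above. The recommended count uses a target fullness of 0.7, an assumption chosen because it reproduces the example table. The sketch covers storage only; memory fullness can still be the limiting factor for some workloads.

```python
import math

SHARD_CAPACITY_GB = 250  # storage per shard for both b1 and t1


def minimum_shards(index_size_gb: float) -> int:
    """Smallest shard count whose total storage holds the index."""
    return max(1, math.ceil(index_size_gb / SHARD_CAPACITY_GB))


def recommended_shards(index_size_gb: float, target_fullness: float = 0.7) -> int:
    """Shard count that keeps storage fullness at or below the target (assumed 0.7)."""
    return max(1, math.ceil(index_size_gb / (SHARD_CAPACITY_GB * target_fullness)))


def storage_fullness(index_size_gb: float, shards: int) -> float:
    """Fraction of provisioned storage that the index would occupy."""
    return index_size_gb / (shards * SHARD_CAPACITY_GB)


if __name__ == "__main__":
    for size_gb in (71, 300, 400, 500):
        rec = recommended_shards(size_gb)
        print(size_gb, minimum_shards(size_gb), rec, f"{storage_fullness(size_gb, rec):.0%}")
```

Raising `target_fullness` toward 0.8 lowers the recommendation for borderline sizes, at the cost of less headroom for growth.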

Number of replicas

To calculate the number of replicas your index requires, first test your workload to find the QPS a single replica can handle at your target latency. Then, use this formula, and round up:
Minimum replicas = (Required QPS) / (QPS per replica)
For example, if one replica handles 50 QPS at your target latency and you need 150 QPS, you need three replicas.
Other considerations
  • Throughput scales approximately linearly with replicas, but performance can vary based on metadata filter selectivity.
  • For high availability, allocate n+1 replicas where n is your minimum for throughput. Pinecone distributes replicas across availability zones.
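The replica formula and the n+1 guidance can be combined in a short helper. This is a sketch under the linear-scaling assumption stated above; real throughput may deviate, especially with selective metadata filters. The function names are illustrative.

```python
import math


def minimum_replicas(required_qps: float, qps_per_replica: float) -> int:
    """Minimum replicas = required QPS / QPS per replica, rounded up."""
    if qps_per_replica <= 0:
        raise ValueError("qps_per_replica must be positive")
    return math.ceil(required_qps / qps_per_replica)


def replicas_for_high_availability(required_qps: float, qps_per_replica: float) -> int:
    """n+1 replicas, where n is the minimum needed for throughput (never fewer than two)."""
    return max(2, minimum_replicas(required_qps, qps_per_replica) + 1)


if __name__ == "__main__":
    # The example from the text: 50 QPS per replica, 150 QPS required.
    print(minimum_replicas(150, 50), replicas_for_high_availability(150, 50))
```

The extra replica means the index can lose one replica, for example during a zone failure, and still meet the throughput target.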

Create a dedicated read nodes index

You can create a dedicated read nodes index from scratch or from a backup of an existing index.

From scratch

To create a new dedicated read nodes index from scratch, call Create an index. In the request body, in the spec.serverless.read_capacity object, set the following fields:
| Field | Value | Notes |
| --- | --- | --- |
| mode | Dedicated | |
| dedicated.node_type | b1 or t1 | See node types |
| dedicated.scaling | Manual | Currently the only option |
| dedicated.manual.shards | Number of shards needed | Minimum 1 shard; each shard provides 250 GB of storage |
| dedicated.manual.replicas | Number of replicas needed | Minimum 0 (this pauses the index) |
To learn how to determine the number of shards and replicas your index requires, see Calculate the size of your index.
Example
PINECONE_API_KEY="YOUR_API_KEY"

curl -X POST "https://api.pinecone.io/indexes" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "name": "example-dedicated-index",
           "dimension": 1024,
           "metric": "cosine",
           "deletion_protection": "enabled",
           "tags": {
             "environment": "production"
           },
           "vector_type": "dense",
           "spec": {
             "serverless": {
               "cloud": "aws",
               "region": "us-east-1",
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "node_type": "b1",
                   "scaling": "Manual",
                   "manual": {
                     "shards": 2,
                     "replicas": 1
                   }
                 }
               }
             }
           }
         }'
Example response:
{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": false,
    "state": "Initializing"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 2, // <---- desired state
            "replicas": 1
          }
        },
        "status": {
          "state": "Migrating",
          "current_shards": null, // <---- current state
          "current_replicas": null
        }
      }
    }
  },
  "deletion_protection": "enabled",
  "tags": {
    "environment": "production"
  }
}
The response includes two status fields:
| Field | Description |
| --- | --- |
| status.state | Overall index status (for example, Initializing, Ready, Terminating) |
| spec.serverless.read_capacity.status.state | Read capacity status (Migrating, Scaling, Ready, Error) |
When creating a dedicated read nodes index, status.state transitions to Ready as soon as the index is ready for reads and writes.
However, spec.serverless.read_capacity.status.state remains Migrating until the index scales to its full read capacity, at which point it transitions to Ready.
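To make the distinction between the two status fields concrete, the sketch below reads both from a parsed index description. It assumes the response has the shape shown in the example response above. The helper names are hypothetical and not part of any Pinecone SDK.

```python
def read_capacity_state(index: dict) -> str:
    """Return spec.serverless.read_capacity.status.state from an index description."""
    return index["spec"]["serverless"]["read_capacity"]["status"]["state"]


def accepts_reads_and_writes(index: dict) -> bool:
    """True once the overall index status is Ready."""
    return index["status"]["state"] == "Ready"


def at_full_read_capacity(index: dict) -> bool:
    """True once the dedicated read nodes have finished provisioning."""
    return read_capacity_state(index) == "Ready"


if __name__ == "__main__":
    # Trimmed-down description: usable, but read capacity is still provisioning.
    description = {
        "status": {"ready": True, "state": "Ready"},
        "spec": {"serverless": {"read_capacity": {"status": {"state": "Migrating"}}}},
    }
    print(accepts_reads_and_writes(description), at_full_read_capacity(description))
```

Checking both fields separately lets you start writing data as soon as the index is Ready while holding production query traffic until read capacity is also Ready.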
After creating the index, upsert or import your data.
To upsert and search with text instead of vectors, you can configure your index to use a hosted embedding model. Call Configure an index and specify the embed object in the request body.

From a backup

To create a dedicated read nodes index from a backup:
  1. Restore the backup. This creates a new on-demand index with the same data as the original.
  2. If the restored index has multiple namespaces, delete all of them except the one you want to keep. Dedicated read nodes currently only support one namespace.
  3. Migrate the index to dedicated read nodes.

Migrate to dedicated read nodes

To migrate an existing on-demand index to dedicated read nodes, follow these steps:
1. Create a backup of your index

Create a backup of your index. If you later find that on-demand is preferable, you can restore the backup to a new on-demand index or contact support to migrate back.
2. Delete extra namespaces

If your index has multiple namespaces, delete all of them except the one you want to keep. Dedicated read nodes currently only support a single namespace.
If this is a production index, be sure to make a backup before deleting namespaces. Or, if you need multiple namespaces, contact support to discuss early access to multi-namespace support for dedicated read nodes.
3. Calculate your index size

Calculate your index size to determine how many shards you need.
4. Migrate the index

To migrate the index, call Configure an index. In the request body, in the spec.serverless.read_capacity object, set the following fields:
| Field | Value | Notes |
| --- | --- | --- |
| mode | Dedicated | |
| dedicated.node_type | b1 or t1 | See node types |
| dedicated.scaling | Manual | Currently the only option |
| dedicated.manual.shards | Number of shards needed | Minimum 1 shard; each shard provides 250 GB of storage |
| dedicated.manual.replicas | Number of replicas needed | Minimum 0 (this pauses the index) |
Example
This example migrates an index to dedicated read nodes using b1 nodes, one shard, and one replica:
PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "spec": {
             "serverless": {
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "node_type": "b1",
                   "scaling": "Manual",
                   "manual": {
                     "shards": 1,
                     "replicas": 1
                   }
                 }
               }
             }
           }
         }'
Example response:
{
  "name": "example-index-to-migrate",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-index-to-migrate-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 1, // <---- desired state
            "replicas": 1
          }
        },
        "status": {
          "state": "Migrating",
          "current_shards": null, //<---- current state
          "current_replicas": null
        }
      }
    }
  },
  "deletion_protection": "disabled",
  "tags": null,
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "END"
    },
    "vector_type": "dense"
  }
}
The response includes two status fields:
| Field | Description |
| --- | --- |
| status.state | Overall index status (for example, Initializing, Ready, Terminating) |
| spec.serverless.read_capacity.status.state | Read capacity status (Migrating, Scaling, Ready, Error) |
If spec.serverless.read_capacity.status.state is set to Error, the allocated number of shards was insufficient for the size of the index. Try again, adding more shards as needed.
5. Monitor the migration

Monitor the status of the migration. When the migration is complete, spec.serverless.read_capacity.status.state is Ready.
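Monitoring can be automated with a simple polling loop. The sketch below assumes a caller-supplied `describe` function (hypothetical) that returns the parsed JSON description of the index, for example from a GET request to the same /indexes/$INDEX_NAME URL used above. The 30-minute timeout and 30-second poll interval are assumptions, not documented limits.

```python
import time
from typing import Callable


def wait_for_read_capacity(
    describe: Callable[[], dict],
    timeout_s: float = 1800.0,  # assumption: give up after 30 minutes
    poll_s: float = 30.0,       # assumption: poll every 30 seconds
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until spec.serverless.read_capacity.status.state is Ready."""
    deadline = time.monotonic() + timeout_s
    while True:
        index = describe()
        state = index["spec"]["serverless"]["read_capacity"]["status"]["state"]
        if state == "Ready":
            return index
        if state == "Error":
            # Per this guide, an error during migration suggests too few shards.
            raise RuntimeError("read capacity reported Error; try again with more shards")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"read capacity still {state} after {timeout_s} seconds")
        sleep(poll_s)
```

Passing `describe` and `sleep` as parameters keeps the loop independent of any particular HTTP client and makes it easy to exercise with canned responses.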
6. Monitor index performance

After migrating, monitor your index performance to verify that it meets expectations.

Manage your index

The following sections describe how to manage a dedicated read nodes index using version 2025-10 of the Pinecone API.
To upsert and search with text instead of vectors, you can configure your index to use a hosted embedding model. To do this, call Configure an index and provide an embed object in the request body. In this object:
  • For the text field, specify the name of the field in your data that contains the text to be embedded.
  • Specify a model whose dimension requirements match the dimensions of your index.
Example request:
PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "embed": {
             "field_map": {
               "text": "chunk_text"
             },
             "model": "llama-text-embed-v2",
             "read_parameters": {
               "input_type": "query",
               "truncate": "NONE"
             },
             "write_parameters": {
               "input_type": "passage"
             }
           }
         }'
Example response:
{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 2,
            "replicas": 1
          }
        },
        "status": {
          "state": "Ready",
          "current_shards": 2,
          "current_replicas": 1
        }
      }
    }
  },
  "deletion_protection": "enabled",
  "tags": {
    "environment": "testing"
  },
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "chunk_text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "NONE"
    },
    "vector_type": "dense"
  }
}
You can also create a dedicated read nodes index when calling Create an index with integrated embedding. In the request body, use the read_capacity object to configure node type, shards, and replicas for dedicated read nodes.
To check index fullness, call Get index stats.
Example request:
# To get the unique host for an index,
# see https://docs.pinecone.io/guides/manage-data/target-an-index
PINECONE_API_KEY="YOUR_API_KEY"
INDEX_HOST="YOUR_INDEX_HOST"

curl -X GET "https://$INDEX_HOST/describe_index_stats" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10"
Example response:
{
  "namespaces": {
    "__default__": {
      "vectorCount": 705000
    }
  },
  "indexFullness": 0.01,
  "totalVectorCount": 705000,
  "dimension": 1536,
  "metric": "cosine",
  "vectorType": "dense",
  "memoryFullness": 0.01,
  "storageFullness": 0.01
}
In the response, indexFullness describes how full the index is, on a scale of 0 to 1. It’s set to the greater of memoryFullness and storageFullness.
To add or remove shards, call Configure an index. This operation does not require downtime, but can take up to 30 minutes to complete. In the request body, set the following fields:
| Field | Value | Notes |
| --- | --- | --- |
| spec.serverless.read_capacity.mode | Dedicated | |
| spec.serverless.read_capacity.dedicated.scaling | Manual | |
| spec.serverless.read_capacity.dedicated.manual.shards | Desired number of shards | Each shard provides 250 GB of storage |
Example request:
PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "spec": {
             "serverless": {
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "scaling": "Manual",
                   "manual": {
                     "shards": 3
                   }
                 }
               }
             }
           }
         }'
Example response:
{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 3, // <---- desired state
            "replicas": 1
          }
        },
        "status": {
          "state": "Scaling",
          "current_shards": 2, // <---- current state
          "current_replicas": 1
        }
      }
    }
  },
  "deletion_protection": "disabled",
  "tags": null,
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "END"
    },
    "vector_type": "dense"
  }
}
Configuration change limits:
  • You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
  • A new configuration change can only be initiated after the previous configuration change has completed.
  • Each configuration change can take up to 30 minutes to complete.
  • Read and write operations continue normally during configuration changes.
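Because only one configuration change is allowed every ten minutes, it can help to batch node type, shard, and replica changes into a single request body. The sketch below builds such a body for the Configure an index call. It mirrors the field names used in the examples in this guide; the helper itself is hypothetical, and its validation only reflects the limits stated here (b1 or t1, at least one shard, zero or more replicas).

```python
import json
from typing import Optional


def build_configure_body(node_type: Optional[str] = None,
                         shards: Optional[int] = None,
                         replicas: Optional[int] = None) -> dict:
    """Build a PATCH body that changes any combination of the three settings."""
    if node_type is not None and node_type not in ("b1", "t1"):
        raise ValueError("node_type must be 'b1' or 't1'")
    if shards is not None and shards < 1:
        raise ValueError("shards must be at least 1")
    if replicas is not None and replicas < 0:
        raise ValueError("replicas must be 0 or more")

    manual = {}
    if shards is not None:
        manual["shards"] = shards
    if replicas is not None:
        manual["replicas"] = replicas

    dedicated = {"scaling": "Manual"}
    if node_type is not None:
        dedicated["node_type"] = node_type
    if manual:
        dedicated["manual"] = manual

    return {"spec": {"serverless": {"read_capacity": {"mode": "Dedicated",
                                                      "dedicated": dedicated}}}}


if __name__ == "__main__":
    # One request that switches to t1, grows to three shards, and adds a replica.
    print(json.dumps(build_configure_body("t1", shards=3, replicas=2), indent=2))
```

The printed JSON can be passed as the `-d` payload of the curl PATCH request shown above.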
To add or remove replicas, call Configure an index. This operation does not require downtime, but can take up to 30 minutes to complete. In the request body, set the following fields:
| Field | Value | Notes |
| --- | --- | --- |
| spec.serverless.read_capacity.mode | Dedicated | |
| spec.serverless.read_capacity.dedicated.scaling | Manual | |
| spec.serverless.read_capacity.dedicated.manual.replicas | Desired number of replicas | Add replicas to increase query throughput |
Example request:
PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "spec": {
             "serverless": {
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "scaling": "Manual",
                   "manual": {
                     "replicas": 2
                   }
                 }
               }
             }
           }
         }'
Example response:
{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 1,
            "replicas": 2 // <---- desired state
          }
        },
        "status": {
          "state": "Scaling",
          "current_shards": 1,
          "current_replicas": 1 // <---- current state
        }
      }
    }
  },
  "deletion_protection": "disabled",
  "tags": null,
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "END"
    },
    "vector_type": "dense"
  }
}
You can change node types in either direction (b1 → t1 or t1 → b1). This operation does not require downtime, but can take up to 30 minutes to complete.
The most predictable way to increase throughput is by increasing replicas.
t1 nodes cache more data in memory than b1 nodes. Because of this, switching from b1 to t1 may require more shards. If your new configuration doesn’t have enough shards, the configuration change will fail with an error telling you how many shards are required. Update the request and retry. In the meantime, your index will continue to function normally in its original configuration.
To change node types, call Configure an index. In the request body, set the following fields:
Field | Value | Notes
spec.serverless.read_capacity.mode | Dedicated |
spec.serverless.read_capacity.dedicated.node_type | b1 or t1 | See node types
Example request to change from b1 to t1:
PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X PATCH "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Accept: application/json" \
     -H "Content-Type: application/json" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" \
     -d '{
           "spec": {
             "serverless": {
               "read_capacity": {
                 "mode": "Dedicated",
                 "dedicated": {
                   "node_type": "t1"
                 }
               }
             }
           }
         }'
Example response:
{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1024,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "t1",
          "scaling": "Manual",
          "manual": {
            "shards": 1,
            "replicas": 1
          }
        },
        "status": {
          "state": "Scaling",
          "current_shards": 1,
          "current_replicas": 1
        }
      }
    }
  },
  "deletion_protection": "disabled",
  "tags": null,
  "embed": {
    "model": "llama-text-embed-v2",
    "field_map": {
      "text": "text"
    },
    "dimension": 1024,
    "metric": "cosine",
    "write_parameters": {
      "dimension": 1024,
      "input_type": "passage",
      "truncate": "END"
    },
    "read_parameters": {
      "dimension": 1024,
      "input_type": "query",
      "truncate": "END"
    },
    "vector_type": "dense"
  }
}
To pause an index, set the number of replicas to 0. This operation can take up to 30 minutes to complete.
While an index is paused, you cannot write to it or read from it. For a paused index, you’re billed for storage, but not for node costs, reads, or writes.
After making a configuration change to a dedicated read nodes index (changing shards, replicas, or node type), check the status of the change by calling Describe an index.
Example request:
PINECONE_API_KEY="YOUR_API_KEY"
INDEX_NAME="YOUR_INDEX_NAME"

curl -X GET "https://api.pinecone.io/indexes/$INDEX_NAME" \
     -H "Api-Key: $PINECONE_API_KEY" \
     -H "X-Pinecone-Api-Version: 2025-10" 
Example response (index scaling from one to two replicas):
{
  "name": "example-dedicated-index",
  "vector_type": "dense",
  "metric": "cosine",
  "dimension": 1536,
  "status": {
    "ready": true,
    "state": "Ready"
  },
  "host": "example-dedicated-index-1c6ab6aa.svc.aped-4627-b74a.pinecone.io",
  "spec": {
    "serverless": {
      "region": "us-east-1",
      "cloud": "aws",
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 1,
            "replicas": 2 // <---- desired state
          }
        },
        "status": {
          "state": "Scaling", 
          "current_shards": 1,
          "current_replicas": 1 // <---- current state
        }
      }
    }
  },
  "deletion_protection": "enabled",
  "tags": {
    "tag0": "value0"
  }
}
The response includes two status fields:
Field | Description
status.state | Overall index status (for example, Initializing, Ready, Terminating)
spec.serverless.read_capacity.status.state | Read capacity status (Migrating, Scaling, Ready, Error)
When changing node types, shards, or replicas, monitor the read capacity status (spec.serverless.read_capacity.status.state). Possible values:
State | Description
Ready | The change is complete and the index is ready to serve queries at full capacity.
Scaling | A change to the number of shards or replicas is in progress.
Migrating | A change to the node type or read capacity is in progress.
Error | The operation failed. For migrations to dedicated, this typically means you didn’t allocate enough shards for your index size. Check error_message for details, and retry with more shards.
During changes to shards, replicas, and node type, the index-level status (status.state) remains Ready. This is because the index can handle reads and writes while its dedicated read capacity scales.
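Because the index-level status stays Ready, a script that waits for a change to finish needs to watch the read capacity state instead. The following Python sketch shows one way to do that. `describe` is a hypothetical caller-supplied function that returns the parsed JSON of a Describe an index call, and the polling interval and timeout are assumptions (the timeout mirrors the 30-minute upper bound stated in this guide).

```python
import time


def read_capacity_state(index_description):
    """Return spec.serverless.read_capacity.status.state from a parsed response."""
    return index_description["spec"]["serverless"]["read_capacity"]["status"]["state"]


def wait_for_read_capacity(describe, poll_seconds=30, timeout_seconds=1800,
                           sleep=time.sleep):
    """Poll until the read capacity state is Ready.

    Raises RuntimeError if the state becomes Error, and TimeoutError if the
    state is still Scaling or Migrating after `timeout_seconds`.
    `describe` is a hypothetical callable supplied by the caller.
    """
    waited = 0
    while True:
        state = read_capacity_state(describe())
        if state == "Ready":
            return state
        if state == "Error":
            raise RuntimeError("read capacity change failed; check error_message")
        if waited >= timeout_seconds:
            raise TimeoutError(f"still {state} after {waited} seconds")
        sleep(poll_seconds)
        waited += poll_seconds
```

Passing the describe call in as a function keeps the polling logic independent of any particular HTTP client or SDK.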
To change a dedicated read nodes index to on-demand, contact support. This can’t be done with the API.

Limits

The following limits apply to dedicated read nodes:
Unlike on-demand indexes, dedicated read nodes indexes are not subject to read-operation rate limits. However, if your query rate exceeds the compute capacity of your index, you may observe decreased query throughput. In such cases, consider adding replicas to increase the compute resources of the index.
On dedicated read nodes indexes, write operations (upsert, update, delete) have the same rate limits as on-demand indexes. Writes that would cause your index to exceed its storage capacity are blocked. In such cases, consider adding shards to increase available storage. To determine how close you are to the write limit, check index fullness.
Currently, dedicated read nodes indexes support only a single namespace. Multi-namespace support is coming soon. For early access, contact support.
  • Shards: The minimum number of shards per index is 1.
  • Replicas: The minimum number of replicas per index is 0, which pauses the index.
  • Nodes: The maximum number of nodes per project is 20. This is a project limit, not an index limit. To calculate your total node count, multiply shards × replicas for each of your project’s indexes, and then sum the results. This total must not exceed 20. For example, if you have two indexes that each have two shards and three replicas, your total node count is (2 × 3) + (2 × 3) = 12 nodes. To increase your project’s node limit, contact support.
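The node count rule can be checked with a few lines of Python. This is a minimal sketch: the function names are illustrative, and the limit of 20 is the default stated in this guide, which may differ if support has raised it for your project.

```python
NODE_LIMIT = 20  # default per-project node limit stated in this guide


def total_nodes(indexes):
    """Sum shards x replicas over (shards, replicas) pairs, one pair per index."""
    return sum(shards * replicas for shards, replicas in indexes)


def within_node_limit(indexes, limit=NODE_LIMIT):
    """Return True if the project's total node count does not exceed the limit."""
    return total_nodes(indexes) <= limit


if __name__ == "__main__":
    # Two indexes, each with two shards and three replicas.
    project = [(2, 3), (2, 3)]
    print(total_nodes(project))        # 12
    print(within_node_limit(project))  # True
```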
Configuration change limits:
  • You can make one configuration change every ten minutes, but you can batch multiple changes (node type, shards, and replicas) in a single request.
  • A new configuration change can only be initiated after the previous configuration change has completed.
  • Each configuration change can take up to 30 minutes to complete.
  • Read and write operations continue normally during configuration changes.
memoryFullness is an approximation and doesn’t yet account for metadata. For more information, see Index fullness.
To migrate an index from dedicated to on-demand, contact support. This cannot be done with the API.

Cost

The cost of an index has three components: read costs, write costs, and storage costs. On-demand and dedicated read nodes share infrastructure for writes and storage, so these costs are the same. However, dedicated read nodes provision dedicated hardware for read operations (query, fetch, list), which changes how read costs are calculated.
Cost component | On-demand | Dedicated read nodes
Read costs | Usage-based: 1 RU per 1 GB namespace size per query | Fixed hourly rate: Based on node type, shards, and replicas
Write costs | Usage-based | Usage-based (same as on-demand)
Storage costs | Usage-based | Usage-based (same as on-demand)
If you use a hosted model for search or reranking, there are additional inference costs.
To calculate the total cost of a dedicated read nodes index, use this formula:
(Node rate × shards × replicas) + storage costs + write costs
Term | Description
Node rate | Monthly rate for the node type (b1 or t1), which varies by cloud region. See Pinecone pricing.
Shards | Number of shards allocated
Replicas | Number of replicas allocated
Storage costs | Usage-based, same as on-demand
Write costs | Usage-based, same as on-demand
For help estimating costs, use the Pinecone pricing calculator or contact us.
Example: If the rate for b1 nodes on aws-us-east-1 is $168.21/month ($0.23/hour), the node cost of an index with two shards and two replicas would be:
$168.21 × 2 × 2 = $672.84/month, plus storage and write costs
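The formula can also be expressed as a short Python sketch. The function names are illustrative, and the rate is the one from the example in this section; actual rates vary by cloud region, and storage and write costs are usage-based values you would supply yourself.

```python
def monthly_node_cost(node_rate, shards, replicas):
    """Node portion of the formula: node rate x shards x replicas."""
    return node_rate * shards * replicas


def monthly_total_cost(node_rate, shards, replicas, storage_cost=0.0, write_cost=0.0):
    """Full formula: (node rate x shards x replicas) + storage costs + write costs."""
    return monthly_node_cost(node_rate, shards, replicas) + storage_cost + write_cost


if __name__ == "__main__":
    # b1 rate from the example in this section, two shards, two replicas.
    print(round(monthly_node_cost(168.21, 2, 2), 2))  # 672.84
```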