Customers who sign up for a Standard or Enterprise plan on or after August 18, 2025 cannot create pod-based indexes. Instead, create serverless indexes, and consider using dedicated read nodes for large workloads (millions of records or more, and moderate or high query rates).
Users on Standard and Enterprise plans can contact Support for further help with sizing and testing.
Overview
There are five main considerations when deciding how to configure your Pinecone index:
- Number of vectors
- Dimensionality of your vectors
- Size of metadata on each vector
- Queries per second (QPS) throughput
- Cardinality of indexed metadata
Number of vectors
The most important consideration in sizing is the number of vectors you plan on working with. As a rule of thumb, a single p1 pod can store approximately 1M vectors, while an s1 pod can store 5M vectors. However, this can be affected by other factors, such as dimensionality and metadata, which are explained below.
Dimensionality of vectors
The rules of thumb above for how many vectors can be stored in a given pod assume a typical configuration of 768 dimensions per vector. Because your individual use case dictates the dimensionality of your vectors, the amount of space required to store them may be larger or smaller. Each dimension of a vector consumes 4 bytes of memory and storage, so if you expect to have 1M vectors with 768 dimensions each, that's about 3GB of storage before factoring in metadata or other overhead. Using that reference, we can estimate the typical pod size and number needed for a given index. Table 1 below gives some examples, and a sizing sketch follows the table.

Table 1: Estimated max vectors per pod by pod type and dimensionality

Pod type | Dimensions | Estimated max vectors per pod
---|---|---
p1 | 512 | 1,250,000
p1 | 768 | 1,000,000
p1 | 1024 | 675,000
p1 | 1536 | 500,000
p2 | 512 | 1,250,000
p2 | 768 | 1,100,000
p2 | 1024 | 1,000,000
p2 | 1536 | 550,000
s1 | 512 | 8,000,000
s1 | 768 | 5,000,000
s1 | 1024 | 4,000,000
s1 | 1536 | 2,500,000
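To make the arithmetic concrete, here is a minimal Python sketch that estimates raw vector storage (4 bytes per dimension) and the number of pods needed, using the per-pod capacities from Table 1. The lookup table and helper names are illustrative assumptions rather than an official sizing tool, and real requirements also depend on metadata and other overhead.

```python
# Rough pod-sizing sketch based on the 4-bytes-per-dimension rule and Table 1.
# The capacity numbers mirror Table 1; helper names are illustrative only.
import math

# Estimated max vectors per pod, keyed by (pod_type, dimensions), from Table 1.
MAX_VECTORS_PER_POD = {
    ("p1", 512): 1_250_000, ("p1", 768): 1_000_000, ("p1", 1024): 675_000, ("p1", 1536): 500_000,
    ("p2", 512): 1_250_000, ("p2", 768): 1_100_000, ("p2", 1024): 1_000_000, ("p2", 1536): 550_000,
    ("s1", 512): 8_000_000, ("s1", 768): 5_000_000, ("s1", 1024): 4_000_000, ("s1", 1536): 2_500_000,
}

def raw_vector_storage_gb(num_vectors: int, dimensions: int) -> float:
    """Raw vector data only: 4 bytes per dimension, excluding metadata and overhead."""
    return num_vectors * dimensions * 4 / 1e9

def estimate_pods(num_vectors: int, dimensions: int, pod_type: str) -> int:
    """Minimum pod count for the workload, per the Table 1 capacities."""
    capacity = MAX_VECTORS_PER_POD[(pod_type, dimensions)]
    return math.ceil(num_vectors / capacity)

# Example: 5M vectors at 768 dimensions.
print(raw_vector_storage_gb(5_000_000, 768))   # ~15.4 GB of raw vector data
print(estimate_pods(5_000_000, 768, "p1"))     # 5 p1 pods
print(estimate_pods(5_000_000, 768, "s1"))     # 1 s1 pod
```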
Queries per second (QPS)
QPS is governed by a combination of the index's pod type, the number of replicas, and the top_k value of queries. The pod type is the primary factor driving QPS, as the different pod types are optimized for different approaches.
The p1 pods are performance-optimized pods which provide very low query latencies, but hold fewer vectors per pod than s1 pods. They are ideal for applications with low latency requirements (<100ms). The s1 pods are optimized for storage and provide large storage capacity and lower overall costs with slightly higher query latencies than p1 pods. They are ideal for very large indexes with moderate or relaxed latency requirements.
The p2 pod type provides greater query throughput with lower latency. p2 pods support 200 QPS per replica and return queries in less than 10ms. This means that query throughput and latency are better than with s1 and p1 pods, especially for low-dimension vectors (<512D).
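Pod type and replica count are set when the index is created. The following is a minimal sketch using the Pinecone Python client's pod-based `PodSpec`; the API key, index name, and environment are placeholders, and the exact signature may vary by client version.

```python
# Minimal sketch: creating a pod-based index with a chosen pod type and replica count.
# Assumes the Pinecone Python client (pinecone package, v3+); names, the API key,
# and the environment value are placeholders.
from pinecone import Pinecone, PodSpec

pc = Pinecone(api_key="YOUR_API_KEY")  # placeholder credential

pc.create_index(
    name="example-index",              # placeholder index name
    dimension=768,                     # must match your embedding model
    metric="cosine",
    spec=PodSpec(
        environment="us-east-1-aws",   # placeholder pod environment
        pod_type="p1.x1",              # performance-optimized pod, smallest size
        pods=1,
        replicas=1,                    # add replicas to scale QPS
    ),
)
```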
As a rule, a single p1 pod with 1M vectors of 768 dimensions each and no replicas can handle about 20 QPS. It's possible to see higher or lower throughput, depending on the size of your metadata, the number of vectors, the dimensionality of your vectors, and the top_k value of your search. See Table 2 below for more examples, and the sketch after the table for how replicas scale these numbers.
Table 2: QPS by pod type and top_k value

Pod type | top_k 10 | top_k 250 | top_k 1000
---|---|---|---
p1 | 30 | 25 | 20
p2 | 150 | 50 | 20
s1 | 10 | 10 | 10
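As a rough illustration of how replicas scale throughput, the sketch below multiplies a per-pod baseline from Table 2 by the replica count. The lookup table and helper name are illustrative assumptions; the baselines assume roughly 1M vectors of 768 dimensions per pod, and real throughput varies with metadata size, vector count, and dimensionality.

```python
# Rough QPS estimate: per-pod baseline from Table 2, scaled by replica count.
# Baselines assume ~1M vectors at 768 dimensions per pod; helper name is illustrative.

# Approximate single-pod QPS, keyed by (pod_type, top_k), from Table 2.
BASELINE_QPS = {
    ("p1", 10): 30, ("p1", 250): 25, ("p1", 1000): 20,
    ("p2", 10): 150, ("p2", 250): 50, ("p2", 1000): 20,
    ("s1", 10): 10, ("s1", 250): 10, ("s1", 1000): 10,
}

def estimate_qps(pod_type: str, top_k: int, replicas: int = 1) -> int:
    """Estimated aggregate QPS: each replica adds roughly one pod's worth of throughput."""
    return BASELINE_QPS[(pod_type, top_k)] * replicas

# Example: a p2 index queried with top_k=10 and 3 replicas.
print(estimate_qps("p2", top_k=10, replicas=3))  # ~450 QPS
```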