Understanding indexes
An index is the highest-level organizational unit of vector data in Pinecone. It accepts and stores vectors, serves queries over the vectors it contains, and performs other vector operations over its contents.
Organizations on the Standard and Enterprise plans can create serverless indexes and pod-based indexes. Organizations on the free Starter plan can create serverless indexes in the us-east-1 region of AWS and one starter pod-based index.
Once an index is created, it cannot be moved to a different project.
Serverless indexes
With serverless indexes, you don’t configure or manage any compute or storage resources. Instead, serverless indexes scale automatically based on usage, and you pay only for the amount of data stored and operations performed, with no minimums. This means that there is no extra cost for having additional indexes.
You can create an integrated or standalone serverless index:
- An integrated index is configured to use one of Pinecone’s hosted embedding models. With this type of index, you provide text for upsert and search, and Pinecone uses the specified embedding model to convert the text to vectors automatically.
- A standalone index is not configured for any specific embedding model. With this type of index, you use an external embedding model to convert your data to vectors, and then you upsert and search with those vectors directly.
For details about how costs are calculated for a serverless index, see Understanding cost.
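As a rough sketch of the integrated case, the following assumes a recent version of the Pinecone Python SDK that supports integrated inference; the index name, embedding model, and field mapping are illustrative, not prescriptive:

```python
from pinecone import Pinecone

# Assumes PINECONE_API_KEY is a valid key for your project.
pc = Pinecone(api_key="YOUR_API_KEY")

# Integrated index: Pinecone converts text to vectors using the hosted model.
pc.create_index_for_model(
    name="example-integrated-index",          # illustrative name
    cloud="aws",
    region="us-east-1",
    embed={
        "model": "multilingual-e5-large",      # one of Pinecone's hosted models
        "field_map": {"text": "chunk_text"},   # maps your record field to the model input
    },
)
```

A standalone index is created with an explicit dimension and metric instead; see the sketch in the Cloud regions section below.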
Cloud regions
When creating a serverless index, you must choose the cloud and region where you want the index to be hosted. The following table lists the available public clouds and regions and the plans that support them:
Cloud | Region | Supported plans | Availability phase |
---|---|---|---|
aws | us-east-1 (Virginia) | Starter, Standard, Enterprise | General availability |
aws | us-west-2 (Oregon) | Standard, Enterprise | General availability |
aws | eu-west-1 (Ireland) | Standard, Enterprise | General availability |
gcp | us-central1 (Iowa) | Standard, Enterprise | General availability |
gcp | europe-west4 (Netherlands) | Standard, Enterprise | General availability |
azure | eastus2 (Virginia) | Standard, Enterprise | General availability |
The cloud and region cannot be changed after a serverless index is created.
On the free Starter plan, you can create serverless indexes in the us-east-1 region of AWS only. To create indexes in other regions, upgrade your plan.
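For example, here is a minimal sketch of creating a standalone serverless index in a specific cloud and region, assuming the Pinecone Python SDK; the index name, dimension, and metric are illustrative:

```python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# Standalone serverless index: you supply vectors produced by your own embedding model.
pc.create_index(
    name="example-serverless-index",   # illustrative name
    dimension=768,                      # must match your embedding model's output
    metric="cosine",                    # or "euclidean" / "dotproduct"
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),
)
```

The cloud and region values must be one of the supported combinations in the table above.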
Pod-based indexes
With pod-based indexes, you choose one or more pre-configured units of hardware (pods). Depending on the pod type, pod size, and number of pods used, you get different amounts of storage and higher or lower latency and throughput. Be sure to choose an appropriate pod type and size for your dataset and workload.
Pod types
Different pod types are priced differently. See Understanding cost for more details.
Once a pod-based index is created, you cannot change its pod type. However, you can create a collection from an index and then create a new index with a different pod type from the collection.
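As a sketch of that workflow with the Pinecone Python SDK (the index names, collection name, environment, dimension, and pod types are illustrative):

```python
from pinecone import Pinecone, PodSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# 1. Snapshot the existing pod-based index into a collection.
pc.create_collection(name="my-collection", source="old-p1-index")

# 2. Create a new index with a different pod type from that collection.
#    Wait for the collection to be ready before this step.
pc.create_index(
    name="new-s1-index",
    dimension=768,                      # must match the source index
    metric="cosine",
    spec=PodSpec(
        environment="us-west1-gcp",
        pod_type="s1.x1",               # the new pod type
        source_collection="my-collection",
    ),
)
```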
s1 pods
These storage-optimized pods provide large storage capacity and lower overall costs with slightly higher query latencies than p1 pods. They are ideal for very large indexes with moderate or relaxed latency requirements.
Each s1 pod has enough capacity for around 5M vectors of 768 dimensions.
p1 pods
These performance-optimized pods provide very low query latencies, but hold fewer vectors per pod than s1 pods. They are ideal for applications with low latency requirements (<100ms).
Each p1 pod has enough capacity for around 1M vectors of 768 dimensions.
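As a rough illustration of capacity planning based on the approximate figures above (these are estimates only; actual capacity varies with dimensionality and metadata):

```python
import math

# Approximate capacity per pod for 768-dimensional vectors, per this page.
CAPACITY_PER_POD = {"s1": 5_000_000, "p1": 1_000_000}

def estimate_pods(num_vectors: int, pod_type: str) -> int:
    """Rough estimate of how many pods of a given type are needed."""
    return math.ceil(num_vectors / CAPACITY_PER_POD[pod_type])

print(estimate_pods(10_000_000, "s1"))  # -> 2
print(estimate_pods(10_000_000, "p1"))  # -> 10
```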
p2 pods
The p2 pod type provides greater query throughput with lower latency. For vectors with fewer than 128 dimensions and queries where `topK` is less than 50, p2 pods support up to 200 QPS per replica and return queries in less than 10ms. This means that query throughput and latency are better than for s1 and p1 pods.
Each p2 pod has enough capacity for around 1M vectors of 768 dimensions. However, capacity may vary with dimensionality.
The data ingestion rate for p2 pods is significantly slower than for p1 pods; this rate decreases as the number of dimensions increases. For example, a p2 pod containing vectors with 128 dimensions can upsert up to 300 updates per second; a p2 pod containing vectors with 768 dimensions or more supports upsert of 50 updates per second. Because query latency and throughput for p2 pods vary from p1 pods, test p2 pod performance with your dataset.
The p2 pod type does not support sparse vector values.
Pod size and performance
Each pod type supports four pod sizes: `x1`, `x2`, `x4`, and `x8`. Your index storage and compute capacity doubles for each size step. The default pod size is `x1`. You can increase the size of a pod after index creation.
To learn about changing the pod size of an index, see Configure an index.
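For example, a minimal sketch of increasing pod size with the Pinecone Python SDK (the index name and target size are illustrative):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Move an existing p1.x1 index to the next pod size up.
# Pod size can only be increased, and the pod type itself cannot change.
pc.configure_index("example-pod-index", pod_type="p1.x2")
```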
Pod environments
When creating a pod-based index, you must choose the cloud environment where you want the index to be hosted. The project environment can affect your pricing. The following table lists the available cloud regions and the corresponding values of the `environment` parameter for the `create_index` endpoint:
Cloud | Region | Environment |
---|---|---|
GCP | us-west-1 (N. California) | us-west1-gcp |
GCP | us-central-1 (Iowa) | us-central1-gcp |
GCP | us-west-4 (Las Vegas) | us-west4-gcp |
GCP | us-east-4 (Virginia) | us-east4-gcp |
GCP | northamerica-northeast-1 (Montréal) | northamerica-northeast1-gcp |
GCP | asia-northeast-1 (Japan) | asia-northeast1-gcp |
GCP | asia-southeast-1 (Singapore) | asia-southeast1-gcp |
GCP | us-east-1 (South Carolina) | us-east1-gcp |
GCP | eu-west-1 (Belgium) | eu-west1-gcp |
GCP | eu-west-4 (Netherlands) | eu-west4-gcp |
AWS | us-east-1 (Virginia) | us-east-1-aws |
Azure | eastus (Virginia) | eastus-azure |
Contact us if you need a dedicated deployment in other regions.
The environment cannot be changed after the index is created.
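For example, a minimal sketch of creating a pod-based index in a specific environment, assuming the Pinecone Python SDK (the index name, dimension, metric, and pod configuration are illustrative):

```python
from pinecone import Pinecone, PodSpec

pc = Pinecone(api_key="YOUR_API_KEY")

pc.create_index(
    name="example-pod-index",          # illustrative name
    dimension=768,                      # must match your embedding model's output
    metric="cosine",
    spec=PodSpec(
        environment="us-west1-gcp",     # one of the environment values above
        pod_type="p1.x1",
        pods=1,
    ),
)
```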
Distance metrics
When creating an index, you can choose from the following similarity metrics. For the most accurate results, choose the similarity metric used to train the embedding model for your vectors. For more information, see Vector Similarity Explained.
euclidean
Querying indexes with this metric returns a similarity score equal to the squared Euclidean distance between the result and query vectors.
This metric calculates the square of the distance between two data points in a plane. It is one of the most commonly used distance metrics. For an example, see our IT threat detection example.
When you use `metric='euclidean'`, the most similar results are those with the lowest similarity score.
cosine
This metric is often used to find similarities between different documents. The advantage is that the scores are normalized to the [-1, 1] range. For an example, see our generative question answering example.
dotproduct
This metric multiplies two vectors and reflects how similar they are: the more positive the result, the more closely the two vectors point in the same direction. For an example, see our semantic search example.
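To make the three metrics concrete, here is a small illustrative calculation (using NumPy, which Pinecone does not require) showing how each score relates similarity for two example vectors:

```python
import numpy as np

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])

# Squared Euclidean distance: lower means more similar for metric='euclidean'.
euclidean_sq = float(np.sum((a - b) ** 2))           # 14.0

# Cosine similarity: normalized to [-1, 1]; higher means more similar.
cosine = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))  # 1.0

# Dot product: the more positive, the closer the two vectors' directions.
dot = float(np.dot(a, b))                             # 28.0

print(euclidean_sq, cosine, dot)
```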