This section provides some tips for getting the best performance out of Pinecone.

Basic performance checklist

  • Switch to a cloud environment. For example: EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook. If you experience slow uploads or high query latencies, it might be because you are accessing Pinecone from your home network.
  • Deploy your application and your Pinecone service in the same region. Contact us if you need a dedicated deployment.
  • Reuse connections. We recommend that you reuse the same pc.Index() instance when you
    upsert vectors into, and query, the same index.
  • Make sure you stay within quotas and limits and are aware of known limitations.
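
The "Reuse connections" item in the checklist above can be sketched as follows. This is a minimal illustration, not part of the Pinecone client: the `get_index` helper and its module-level cache are hypothetical names, and `pc` is assumed to be a `Pinecone` client instance that exposes `Index()`.

```python
from typing import Any, Dict

# Hypothetical cache: one index handle per index name, shared by upserts and queries.
_index_cache: Dict[str, Any] = {}


def get_index(pc: Any, name: str) -> Any:
    """Return a cached index handle, calling pc.Index(name) only on first use."""
    if name not in _index_cache:
        _index_cache[name] = pc.Index(name)
    return _index_cache[name]


# Usage (requires the pinecone package and a real API key):
# from pinecone import Pinecone
# pc = Pinecone(api_key="YOUR_API_KEY")
# index = get_index(pc, "my-index")   # creates the handle
# index = get_index(pc, "my-index")   # returns the same handle
```

The cache is keyed by index name only, so this sketch assumes a single client per process.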

Increasing throughput

Batch upserts

When upserting large amounts of data, upsert records in batches. Make each batch as large as possible (up to 1000 records) without exceeding the maximum request size of 2 MB. To work out how many records fit into one batch, refer to the Upsert limits section.
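
A minimal sketch of batching, assuming `index` is an object returned by `pc.Index()` and `records` is any iterable of records in a form that `index.upsert(vectors=...)` accepts. The helper names `chunks` and `upsert_in_batches` are illustrative. The sketch caps the record count per batch only; it does not measure request size, so the batch size must be chosen to stay under the 2 MB limit.

```python
import itertools
from typing import Any, Iterable, Iterator, List


def chunks(records: Iterable[Any], batch_size: int = 1000) -> Iterator[List[Any]]:
    """Yield successive lists of at most batch_size records."""
    it = iter(records)
    batch = list(itertools.islice(it, batch_size))
    while batch:
        yield batch
        batch = list(itertools.islice(it, batch_size))


def upsert_in_batches(index: Any, records: Iterable[Any], batch_size: int = 1000) -> int:
    """Upsert records one batch per request; return the number of requests sent."""
    requests_sent = 0
    for batch in chunks(records, batch_size):
        index.upsert(vectors=batch)
        requests_sent += 1
    return requests_sent
```

For high-dimensional vectors or large metadata payloads, a smaller `batch_size` may be needed to stay within the request size limit.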

Send upserts in parallel

Pinecone is thread-safe, so you can send multiple read and write requests in parallel to help increase throughput. You can read more about high-throughput optimizations on our blog.

For serverless indexes, reads and writes follow independent paths, so you can send multiple read and write requests in parallel to improve throughput.

For pod-based indexes, multiple reads can be performed in parallel, and multiple writes can be performed in parallel, but multiple reads and writes cannot be performed in parallel. Therefore, write batches may affect query latency, and read batches may affect write throughput.
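
One way to send upsert batches in parallel is a thread pool from the Python standard library. The sketch below is an illustration under assumptions, not the client's built-in mechanism: `parallel_upsert` is a hypothetical helper, `index` is assumed to be a shared `pc.Index()` instance, and `batches` is a sequence of record lists that each fit within the upsert limits.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Any, List, Sequence


def parallel_upsert(index: Any, batches: Sequence[List[Any]], max_workers: int = 8) -> List[Any]:
    """Send each batch as its own upsert request from a pool of worker threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() returns results in batch order and re-raises a worker's exception
        # when its result is consumed.
        return list(pool.map(lambda batch: index.upsert(vectors=batch), batches))
```

For pod-based indexes, running such a write pool alongside query traffic may raise query latency, as described above.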

Upsert a dataset from a dataframe

To quickly ingest data when using the Python client, use the upsert_from_dataframe method. The method includes retry logic and a batch_size parameter, and performs especially well with Parquet file datasets.

The following example upserts the quora_all-MiniLM-L6-bm25 dataset as a dataframe.

Python
from pinecone import Pinecone, ServerlessSpec
from pinecone_datasets import load_dataset

pc = Pinecone(api_key="API_KEY")

dataset = load_dataset("quora_all-MiniLM-L6-bm25")

pc.create_index(
  name="my-index",
  dimension=384,
  metric="cosine",
  spec=ServerlessSpec(
    cloud="aws",
    region="us-east-1"
  )
)

index = pc.Index("my-index")

index.upsert_from_dataframe(dataset.drop(columns=["blob"]))

Scale pod-based indexes

This guidance applies to pod-based indexes only. With serverless indexes, you don’t configure any compute or storage resources, and you don’t manually manage those resources to meet demand, save on cost, or ensure high availability. Instead, serverless indexes scale automatically based on usage.

To increase throughput (QPS) for pod-based indexes, increase the number of replicas for your index. See the configure_index API reference for more details.

Example

The following example increases the number of replicas for example-index to 4.
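
A minimal sketch of that call with the Python client, assuming `pc` is a `Pinecone` client instance and that example-index is an existing pod-based index. The `set_replicas` wrapper is a hypothetical name used only for illustration; the underlying call is `configure_index` with the `replicas` argument.

```python
from typing import Any


def set_replicas(pc: Any, index_name: str, replicas: int) -> None:
    """Request that the named pod-based index run with the given number of replicas."""
    pc.configure_index(index_name, replicas=replicas)


# Usage (requires the pinecone package, a real API key, and a pod-based index):
# from pinecone import Pinecone
# pc = Pinecone(api_key="YOUR_API_KEY")
# set_replicas(pc, "example-index", 4)
```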


Decreasing latency

Use namespaces

When you use namespaces to partition records within a single index, you can limit queries to specific namespaces to reduce the number of records scanned. For more details, see Namespaces.
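
As a sketch of a namespace-scoped query, assuming `index` is a `pc.Index()` instance and `vector` has the same dimension as the index. The `query_namespace` helper and the namespace name in the usage comment are illustrative.

```python
from typing import Any, List


def query_namespace(index: Any, vector: List[float], namespace: str, top_k: int = 3) -> Any:
    """Query only the records stored in the given namespace."""
    return index.query(namespace=namespace, vector=vector, top_k=top_k)


# Usage (hypothetical namespace name):
# results = query_namespace(index, [0.1] * 384, namespace="example-namespace")
```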

Use metadata filtering

When you attach metadata key-value pairs to records, you can filter queries to retrieve only records that match the metadata filter. For more details, see Metadata filtering.

For p2 pod-based indexes, metadata filters can increase query latency.
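
As an illustration of a filtered query, the sketch below passes an equality filter on one metadata field. It assumes `index` is a `pc.Index()` instance; the `query_with_filter` helper and the `genre` field in the usage comment are hypothetical and depend on the metadata you attached at upsert time.

```python
from typing import Any, List


def query_with_filter(index: Any, vector: List[float], field: str, value: Any, top_k: int = 3) -> Any:
    """Query for the nearest records whose metadata field equals the given value."""
    return index.query(
        vector=vector,
        top_k=top_k,
        filter={field: {"$eq": value}},  # match only records where field == value
        include_metadata=True,
    )


# Usage (hypothetical metadata field and value):
# results = query_with_filter(index, [0.1] * 384, "genre", "documentary")
```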

Avoid network calls to fetch index hosts

When you target an index by name, the Python and Node.js clients make a network call to fetch the host where the index is deployed. In production, you can avoid this additional round trip by specifying the host of the index directly, as shown in the Python and JavaScript examples in this section.

For each index, there is a unique URL for performing data plane operations in the form of https://{index_host}/{operation}. The Pinecone clients construct these URLs for you. However, when using the API directly, you must explicitly specify them.

You can get index URLs in the Pinecone console or using the describe_index operation. For more details, see Get an index endpoint.

Python
from pinecone.grpc import PineconeGRPC as Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

index = pc.Index(host="INDEX_HOST")

JavaScript
import { Pinecone } from '@pinecone-database/pinecone';

const pc = new Pinecone({ apiKey: 'YOUR_API_KEY' });

// For the Node.js client, you must specify both the index name and host.
const index = pc.index("INDEX_NAME", "INDEX_HOST");

You can get the host of an index using the Pinecone console or using the describe_index operation. For more details, see Get an index endpoint.
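
One possible pattern is to look up the host once and store it in configuration, so that later runs skip the lookup. The sketch below assumes `pc` is a `Pinecone` client instance whose `describe_index` result exposes a `host` attribute. The `resolve_index_host` helper and the `PINECONE_INDEX_HOST` environment variable name are hypothetical choices for illustration.

```python
import os
from typing import Any


def resolve_index_host(pc: Any, index_name: str, env_var: str = "PINECONE_INDEX_HOST") -> str:
    """Return the index host from the environment, falling back to describe_index."""
    host = os.environ.get(env_var)
    if host:
        # Host already configured: no network call needed.
        return host
    # One-time lookup; store the result in your configuration for later runs.
    return pc.describe_index(index_name).host


# Usage:
# host = resolve_index_host(pc, "my-index")
# index = pc.Index(host=host)
```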