This section provides some tips for getting the best performance out of Pinecone.

Basic performance checklist

  • Switch to a cloud environment, such as EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook. Slow uploads or high query latencies can be a sign that you are accessing Pinecone from a home network.
  • Deploy your application and your Pinecone service in the same region. Contact us if you need a dedicated deployment.
  • Reuse connections. We recommend that you reuse the same pc.Index() instance for all
    upserts and queries against the same index.
  • Stay within the documented limits and check the known limitations.

Increasing throughput

Batch upserts

When upserting large amounts of data, upsert records in batches. Make each batch as large as possible (up to 1000 records) without exceeding the maximum request size of 2MB. To work out how many records fit into one batch, see the Upsert limits section.
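
As a rough sketch of batching, the helper below splits an iterable of records into lists of at most 1000 items. The name chunks and the placeholder records are illustrative assumptions, not part of the SDK. The helper does not check the 2MB request size, so records with large metadata or many dimensions may need a smaller batch_size.

```python
from itertools import islice


def chunks(records, batch_size=1000):
    """Yield successive lists of at most batch_size records."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    it = iter(records)
    # islice pulls up to batch_size items per pass; an empty list ends the loop.
    while batch := list(islice(it, batch_size)):
        yield batch


# Placeholder records: (id, values) tuples with dummy 3-dimensional vectors.
records = [(f"id-{i}", [0.1, 0.2, 0.3]) for i in range(2500)]
batch_sizes = [len(batch) for batch in chunks(records)]
```

With a real index handle, each batch would be passed to index.upsert(vectors=batch).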

Send upserts in parallel

Pinecone is thread-safe, so you can send multiple read and write requests in parallel to help increase throughput. You can read more about high-throughput optimizations on our blog.

For serverless indexes, reads and writes follow independent paths, so you can send multiple read and write requests in parallel to improve throughput.

For pod-based indexes, multiple reads can be performed in parallel, and multiple writes can be performed in parallel, but multiple reads and writes cannot be performed in parallel. Therefore, write batches may affect query latency, and read batches may affect write throughput.
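
One way to send write batches in parallel is a thread pool from the Python standard library. The sketch below is a minimal illustration, not the SDK's own mechanism: upsert_in_parallel, fake_upsert, and max_workers=8 are assumptions chosen for the example, and fake_upsert stands in for a real call such as index.upsert(vectors=batch).

```python
from concurrent.futures import ThreadPoolExecutor


def upsert_in_parallel(upsert_fn, batches, max_workers=8):
    """Run upsert_fn on each batch in a thread pool; return results in batch order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises a worker's exception
        # when its result is read.
        return list(pool.map(upsert_fn, batches))


# Stand-in for a real call such as: lambda batch: index.upsert(vectors=batch)
def fake_upsert(batch):
    return len(batch)


batches = [[("a", [0.1]), ("b", [0.2])], [("c", [0.3])]]
upserted_counts = upsert_in_parallel(fake_upsert, batches)
```

For pod-based indexes, keep in mind that heavy parallel writes can still affect query latency, as described above.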

Upsert a dataset from a dataframe

To quickly ingest data when using the Python SDK, use the upsert_from_dataframe method. The method includes retry logic and a batch_size parameter, and performs especially well with Parquet file data sets.

The following example upserts the quora_all-MiniLM-L6-bm25 dataset as a dataframe.

Python
from pinecone import Pinecone, ServerlessSpec
from pinecone_datasets import load_dataset

pc = Pinecone(api_key="API_KEY")

dataset = load_dataset("quora_all-MiniLM-L6-bm25")

pc.create_index(
  name="example-index",
  dimension=384,
  metric="cosine",
  spec=ServerlessSpec(
    cloud="aws",
    region="us-east-1"
  )
)

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/data/target-an-index
index = pc.Index(host="INDEX_HOST")

index.upsert_from_dataframe(dataset.drop(columns=["blob"]))

Scale pod-based indexes

This guidance applies to pod-based indexes only. With serverless indexes, you don’t configure any compute or storage resources, and you don’t manually manage those resources to meet demand, save on cost, or ensure high availability. Instead, serverless indexes scale automatically based on usage.

To increase throughput (QPS) for pod-based indexes, increase the number of replicas for your index. See the configure_index API reference for more details.

Example

The following example increases the number of replicas for example-index to 4.
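
A minimal sketch of that change, assuming the Python SDK's pc.configure_index accepts a replicas argument as described in the configure_index API reference. The set_replicas wrapper is a hypothetical helper added here for illustration.

```python
def set_replicas(pc, index_name, replicas):
    """Request that index_name run with the given number of replicas."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    # Assumed SDK call: configure_index(name, replicas=...).
    pc.configure_index(index_name, replicas=replicas)


# With a real client this would be called as:
#   from pinecone import Pinecone
#   pc = Pinecone(api_key="API_KEY")
#   set_replicas(pc, "example-index", 4)
```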

Decreasing latency

Use namespaces

When you use namespaces to partition records within a single index, you can limit queries to specific namespaces to reduce the number of records scanned. For more details, see Namespaces.
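
A namespace-scoped query might look like the sketch below. It assumes an index handle created with pc.Index(...) and the SDK's query parameters namespace, vector, and top_k. The wrapper name query_namespace and the namespace "example-namespace" are hypothetical.

```python
def query_namespace(index, vector, namespace, top_k=3):
    """Query only the records stored in the given namespace."""
    return index.query(namespace=namespace, vector=vector, top_k=top_k)


# With a real index handle this might be called as:
#   results = query_namespace(index, [0.1] * 384, "example-namespace")
```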

Use metadata filtering

When you attach metadata key-value pairs to records, you can filter queries to retrieve only records that match the metadata filter. For more details, see Metadata filtering.
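
As an illustration, the sketch below passes a metadata filter to a query. It assumes the SDK's filter and include_metadata query parameters and the $eq filter operator. The helper name query_with_filter and the "genre" metadata key are hypothetical.

```python
def query_with_filter(index, vector, metadata_filter, top_k=3):
    """Query with a metadata filter and ask for metadata on each match."""
    return index.query(
        vector=vector,
        top_k=top_k,
        filter=metadata_filter,
        include_metadata=True,
    )


# Hypothetical filter: only records whose "genre" metadata equals "documentary".
genre_filter = {"genre": {"$eq": "documentary"}}
```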

For p2 pod-based indexes, metadata filters can increase query latency.

Avoid network calls to fetch index hosts

When you target an index by name for data operations such as upsert and query, the SDK gets the unique DNS host for the index using the describe_index operation. This is convenient for testing but should be avoided in production, because describe_index uses a different API than data operations and therefore adds a network call and a point of failure. Instead, get the index host once and cache it for reuse, or specify the host directly.

You can get index hosts in the Pinecone console or using the describe_index operation. For more details, see Target an index host.

The following example shows how to target an index by host directly:
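
A minimal sketch, reusing the pc.Index(host=...) call shown earlier in this section. The caching helper get_index_host is a hypothetical addition, and it assumes that the object returned by pc.describe_index exposes the host as a host attribute.

```python
_host_cache = {}


def get_index_host(pc, index_name):
    """Return the host for index_name, calling describe_index only on the first lookup."""
    if index_name not in _host_cache:
        # Assumption: the describe_index result has a .host attribute.
        _host_cache[index_name] = pc.describe_index(index_name).host
    return _host_cache[index_name]


# If the host is already known (for example, copied from the console):
#   from pinecone import Pinecone
#   pc = Pinecone(api_key="API_KEY")
#   index = pc.Index(host="INDEX_HOST")
# Otherwise, look it up once and reuse the cached value:
#   index = pc.Index(host=get_index_host(pc, "example-index"))
```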
