Performance tuning
This section provides some tips for getting the best performance out of Pinecone.
Basic performance checklist
- Switch to a cloud environment. For example: EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook. If you experience slow uploads or high query latencies, it might be because you are accessing Pinecone from your home network.
- Deploy your application and your Pinecone service in the same region. Contact us if you need a dedicated deployment.
- Reuse connections. We recommend you reuse the same pc.Index() instance when you are upserting vectors into, and querying, the same index, as shown in the sketch after this list.
- Avoid limits and known limitations.
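A minimal sketch of connection reuse with the Python SDK; the index name, dimensionality, and values are placeholders:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Create the Index client once and reuse it for every request,
# rather than constructing a new instance per upsert or query.
index = pc.Index("example-index")

index.upsert(vectors=[("id-1", [0.1, 0.2, 0.3, 0.4])])
index.query(vector=[0.1, 0.2, 0.3, 0.4], top_k=5)
```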
Increasing throughput
Batch upserts
When upserting large amounts of data, send records in batches. Each batch should be as large as possible (up to 1000 records) without exceeding the maximum request size of 2MB. To understand the number of records you can fit into one batch, see the Upsert limits section.
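As a rough sketch, assuming a 4-dimensional index named example-index and placeholder data, batching might look like this:

```python
import itertools
from pinecone import Pinecone

def chunks(iterable, batch_size=200):
    """Yield successive batches from an iterable."""
    it = iter(iterable)
    batch = tuple(itertools.islice(it, batch_size))
    while batch:
        yield batch
        batch = tuple(itertools.islice(it, batch_size))

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")

# Placeholder records; keep each batch at or under 1000 records and 2MB.
vectors = [(f"id-{i}", [0.1, 0.2, 0.3, 0.4]) for i in range(10_000)]

for batch in chunks(vectors, batch_size=200):
    index.upsert(vectors=batch)
```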
Send upserts in parallel
Pinecone is thread-safe, so you can send multiple read and write requests in parallel to help increase throughput. You can read more about high-throughput optimizations on our blog.
For serverless indexes, reads and writes follow independent paths, so you can send multiple read and write requests in parallel to improve throughput.
For pod-based indexes, multiple reads can be performed in parallel, and multiple writes can be performed in parallel, but multiple reads and writes cannot be performed in parallel. Therefore, write batches may affect query latency, and read batches may affect write throughput.
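One way to parallelize writes with the Python SDK is to combine a client-side thread pool with asynchronous requests. The pool size and batch shape below are illustrative, not tuned recommendations:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# pool_threads sets the number of threads available for async requests.
index = pc.Index("example-index", pool_threads=30)

vectors = [(f"id-{i}", [0.1, 0.2, 0.3, 0.4]) for i in range(2_000)]
batches = [vectors[i : i + 200] for i in range(0, len(vectors), 200)]

# Send each batch without blocking, then wait for all responses.
async_results = [index.upsert(vectors=batch, async_req=True) for batch in batches]
[result.get() for result in async_results]
```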
Upsert a dataset from a dataframe
To quickly ingest data when using the Python SDK, use the upsert_from_dataframe method. The method includes retry logic and a batch_size parameter, and is especially performant with Parquet file datasets.
The following example upserts the quora_all-MiniLM-L6-bm25 dataset as a dataframe.
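A minimal sketch, assuming the pinecone-datasets package is installed, the target index matches the dataset's embedding dimensionality, and the dataset's documents dataframe carries an extra blob column that is dropped before upserting:

```python
from pinecone import Pinecone
from pinecone_datasets import load_dataset

# Load the public dataset; its records live in a pandas dataframe.
dataset = load_dataset("quora_all-MiniLM-L6-bm25")

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")

# upsert_from_dataframe batches the rows and retries failed requests.
index.upsert_from_dataframe(
    dataset.documents.drop(columns=["blob"]),
    batch_size=500,
)
```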
Scale pod-based indexes
This guidance applies to pod-based indexes only. With serverless indexes, you don’t configure any compute or storage resources, and you don’t manually manage those resources to meet demand, save on cost, or ensure high availability. Instead, serverless indexes scale automatically based on usage.
To increase throughput (QPS) for pod-based indexes, increase the number of replicas for your index. See the configure_index API reference for more details.
Example
The following example increases the number of replicas for example-index to 4.
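A minimal sketch using the Python SDK:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Raise the replica count for a pod-based index to increase QPS.
pc.configure_index("example-index", replicas=4)
```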
Decreasing latency
Use namespaces
When you use namespaces to partition records within a single index, you can limit queries to specific namespaces to reduce the number of records scanned. For more details, see Namespaces.
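A minimal sketch, with placeholder names and a hypothetical customer-a namespace:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")

# Records written to a namespace are scanned only by queries
# that target the same namespace.
index.upsert(
    vectors=[("vec-1", [0.1, 0.2, 0.3, 0.4])],
    namespace="customer-a",
)
results = index.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    top_k=3,
    namespace="customer-a",
)
```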
Use metadata filtering
When you attach metadata key-value pairs to records, you can filter queries to retrieve only records that match the metadata filter. For more details, see Metadata filtering.
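As a sketch, with placeholder metadata fields (genre and year are illustrative, not part of any fixed schema):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")

# Only records whose metadata matches the filter are considered.
results = index.query(
    vector=[0.1, 0.2, 0.3, 0.4],
    top_k=10,
    filter={"genre": {"$eq": "documentary"}, "year": {"$gte": 2019}},
)
```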
For p2 pod-based indexes, metadata filters can increase query latency.
Avoid network calls to fetch index hosts
When you target an index by name for data operations such as upsert and query, the SDK gets the unique DNS host for the index using the describe_index operation. This is convenient for testing but should be avoided in production, because describe_index uses a different API than data operations and therefore adds an extra network call and point of failure. Instead, get the index host once and cache it for reuse, or specify the host directly.
You can get index hosts in the Pinecone console or using the describe_index operation. For more details, see Target an index host.
The following example shows how to target an index by host directly:
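The host string below is a placeholder; substitute the host shown for your index in the console or returned by describe_index:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Pass the index's DNS host directly instead of its name,
# which skips the describe_index lookup.
index = pc.Index(host="example-index-abc1234.svc.us-east-1-aws.pinecone.io")

index.query(vector=[0.1, 0.2, 0.3, 0.4], top_k=5)
```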