This section provides some tips for getting the best performance out of Pinecone.

Basic performance checklist

  • Switch to a cloud environment such as EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook. If you experience slow uploads or high query latencies, the cause might be that you are accessing Pinecone from your home network.
  • Deploy your application and your Pinecone service in the same region. Contact us if you need a dedicated deployment.
  • Reuse connections. We recommend you reuse the same pinecone.Index() instance when you are upserting and querying the same index.
  • Operate within known limits.
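
One way to follow the connection-reuse advice is to hand out a single cached handle per index name instead of constructing a new one on every call. The sketch below is only an illustration: get_index and its factory parameter are hypothetical names, and in real use the factory would be pinecone.Index (or pinecone.GRPCIndex) after the client has been initialized.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def get_index(name, factory):
    """Return one shared index handle per (name, factory) pair.

    `factory` is a hypothetical parameter: any callable that builds an
    index handle from its name, for example pinecone.Index.
    """
    return factory(name)


# Assumed usage, once the Pinecone client is initialized:
#   index = get_index("example-index", pinecone.Index)
#   index.upsert(...)   # later calls to get_index return the same instance
```

Every caller that asks for the same index name receives the same object, so upserts and queries share one connection.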

How to increase throughput

To increase throughput (queries per second, QPS), increase the number of replicas for your index.

The following example increases the number of replicas for example-index to 4.

pinecone.configure_index("example-index", replicas=4)

See the configure_index API reference for more details.

Using the gRPC client to get higher upsert speeds

Pinecone provides a gRPC version of the standard client (see the installation instructions) that can provide higher upsert speeds for multi-pod indexes.

To connect to an index via the gRPC client:

index = pinecone.GRPCIndex("index-name")

The syntax for upsert, query, fetch, and delete with the gRPC client remains the same as with the standard client.

We recommend you use parallel upserts to get the best performance.

import pinecone

index = pinecone.GRPCIndex('example-index')

def chunker(seq, batch_size):
    # Split seq into consecutive slices of at most batch_size items.
    return (seq[pos:pos + batch_size] for pos in range(0, len(seq), batch_size))

# Send every batch without waiting for the previous one; data is your list of vectors.
async_results = [
    index.upsert(vectors=chunk, async_req=True)
    for chunk in chunker(data, batch_size=100)
]
# Wait for and retrieve responses (in case of error)
[async_result.result() for async_result in async_results]

We recommend you use the gRPC client for multi-pod indexes only. The performance of the standard and gRPC clients is similar for a single-pod index.

You may get write-throttled sooner when upserting with the gRPC client. If you see this often, we recommend you use a backoff algorithm while upserting.
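
A backoff algorithm can take many forms. The following is a minimal sketch of exponential backoff with random jitter, not an official Pinecone helper. The name upsert_with_backoff and all of its parameters are hypothetical. Because the exact exception raised on throttling is not specified here, the sketch retries on any exception; in practice you would narrow that to the error your client raises.

```python
import random
import time


def upsert_with_backoff(upsert_fn, batch, max_retries=5,
                        base_delay=0.5, max_delay=30.0, sleep=time.sleep):
    """Call upsert_fn(batch), retrying with exponential backoff and jitter.

    All names here are illustrative. `upsert_fn` is any callable that
    performs one upsert, e.g. lambda b: index.upsert(vectors=b).
    """
    for attempt in range(max_retries + 1):
        try:
            return upsert_fn(batch)
        except Exception:  # assumption: narrow this to the throttling error
            if attempt == max_retries:
                raise  # out of retries: surface the last error
            # The delay ceiling doubles on each attempt, capped at max_delay.
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            # Jitter: sleep a random time up to the ceiling so that
            # parallel workers do not all retry at the same moment.
            sleep(random.uniform(0, ceiling))


# Assumed usage with an existing index handle:
#   upsert_with_backoff(lambda b: index.upsert(vectors=b), chunk)
```

The sleep parameter exists only so the waiting behavior can be replaced in tests; the default is time.sleep.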

Pinecone is thread-safe, so you can launch multiple read requests and multiple write requests in parallel. Launching multiple requests can help improve your throughput. However, reads and writes cannot be performed in parallel with each other, so writing in large batches might affect query latency, and vice versa.
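
As a rough illustration of launching several read requests in parallel, the sketch below fans queries out over a thread pool from the Python standard library. The helper name query_in_parallel and its parameters are hypothetical, and the query call shown in the usage comment is an assumption about your client version.

```python
from concurrent.futures import ThreadPoolExecutor


def query_in_parallel(query_fn, queries, max_workers=8):
    """Run query_fn on every item of queries using a pool of threads.

    Results are returned in the same order as the input queries.
    `query_fn` is any callable that issues a single query.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order and re-raises any worker error.
        return list(pool.map(query_fn, queries))


# Assumed usage with an existing index handle and a list of query vectors:
#   results = query_in_parallel(
#       lambda v: index.query(vector=v, top_k=10), query_vectors)
```

Keeping query traffic and large upsert batches in separate phases, where possible, avoids the read/write contention described above.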