Several factors affect latency when working with Pinecone:

Slow uploads or high latencies

To minimize latency when accessing Pinecone:
  • Switch to a cloud environment, for example EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook. Slow uploads or high query latencies are often caused by accessing Pinecone from a home network.
  • Consider deploying your application in the same environment as your Pinecone service.
  • See Decrease latency for more tips.

High query latencies with batching

If you’re batching queries, try sending a single query vector per call instead. You can make these calls in parallel and expect roughly the same performance as with batching.
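
As a rough illustration of issuing one query vector per call in parallel, the sketch below uses a thread pool from the Python standard library. The helper name `query_in_parallel` and the `max_workers` default are hypothetical; the only assumption about the `index` object is that it exposes `query(vector=..., top_k=...)`, as the index object in the Pinecone Python client does.

```python
from concurrent.futures import ThreadPoolExecutor


def query_in_parallel(index, vectors, top_k=10, max_workers=8):
    """Send one query per vector concurrently.

    `index` is assumed to expose query(vector=..., top_k=...).
    Results are returned in the same order as `vectors`.
    """
    def query_one(vector):
        # One query vector per call, rather than a batch of vectors.
        return index.query(vector=vector, top_k=top_k)

    # Threads suit this workload because each call waits on network I/O.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(query_one, vectors))
```

The value of `max_workers` is a tuning knob rather than a recommendation; a reasonable setting depends on your client machine and request volume.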

High latencies with fetch or include_values

For on-demand indexes, since vector values are retrieved from object storage, operations that return vector values (fetch operations or queries with include_values=true) may have increased latency. If you don’t need the vector values, set include_values=false when querying, or use the query operation instead of fetch if you only need metadata or IDs. See Decrease latency for more details.
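
A minimal sketch of a query that skips vector values follows. The helper name `query_without_values` is hypothetical, and `index` is assumed to be an index object from the Pinecone Python client, where the flags are spelled `include_values=False` and `include_metadata=True`.

```python
def query_without_values(index, vector, top_k=10):
    """Return matches with IDs, scores, and metadata, but no vector values.

    `index` is assumed to expose query(vector=..., top_k=...,
    include_values=..., include_metadata=...).
    """
    return index.query(
        vector=vector,
        top_k=top_k,
        include_values=False,   # do not return vector values with each match
        include_metadata=True,  # still return metadata alongside IDs and scores
    )
```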