Several factors can contribute to high latencies:

Slow uploads or high latencies

To minimize latency when accessing Pinecone:

  • Switch to a cloud environment, for example EC2, GCE, Google Colab, GCP AI Platform Notebook, or SageMaker Notebook. Slow uploads or high query latencies may be caused by accessing Pinecone from your home network.
  • Consider deploying your application in the same environment as your Pinecone service.
  • See performance tuning for more tips.
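
To check whether moving to a cloud environment actually helps, it can be useful to time the same request from each location. The following is a minimal sketch, not part of any Pinecone client: `measure_latency` is a hypothetical helper, and `call` stands in for whatever zero-argument function issues your upload or query.

```python
import statistics
import time
from typing import Callable, Dict


def measure_latency(call: Callable[[], object], repeats: int = 20) -> Dict[str, float]:
    """Run `call` `repeats` times and return latency statistics in milliseconds.

    `call` is an assumption: any zero-argument function that performs one
    request (for example, a wrapper around a single query or upsert).
    """
    if repeats < 1:
        raise ValueError("repeats must be at least 1")

    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000.0)

    samples.sort()
    return {
        "min_ms": samples[0],
        "median_ms": statistics.median(samples),
        "max_ms": samples[-1],
    }
```

Running the same measurement from your home network and from a cloud machine gives a rough comparison of round-trip latency; the median is usually a steadier figure than a single timing.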

High query latencies with batching

If you’re batching queries, try reducing each call to a single query vector. You can issue these single-vector calls in parallel and expect roughly the same performance as with batching.
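
One way to issue single-vector calls in parallel is with a thread pool. This is a sketch under stated assumptions: `query_in_parallel` is a hypothetical helper, and `query_one` stands in for your own function that sends exactly one query vector to your index and returns its result. It does not name any specific Pinecone client method.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, List, Sequence, TypeVar

R = TypeVar("R")
Vector = Sequence[float]


def query_in_parallel(
    query_one: Callable[[Vector], R],
    vectors: Sequence[Vector],
    max_workers: int = 8,
) -> List[R]:
    """Send one query per vector concurrently and return results in input order.

    `query_one` is an assumption: a function you provide that issues a
    single-vector query and returns the response.
    """
    # Threads suit this workload because each call spends most of its time
    # waiting on the network rather than using the CPU.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as `vectors`.
        return list(pool.map(query_one, vectors))
```

The `max_workers` value of 8 is only a starting point; tune it against the latency you observe.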