This page describes helpful techniques for decreasing latency for upserts, searches, and other data operations.

Use namespaces

When you divide records into namespaces in a logical way, you speed up queries by ensuring only relevant records are scanned. The same applies to fetching records, listing record IDs, and other data operations.
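
As a minimal sketch of scoping a query to one namespace: the helper below assumes `index` is an index object whose `query` method accepts `namespace`, `vector`, and `top_k` keyword arguments. The helper name and the example namespace are hypothetical.

```python
def query_in_namespace(index, namespace, vector, top_k=3):
    """Query only the records stored in a single namespace."""
    # Passing a namespace restricts the search to that partition of the index,
    # so records in other namespaces are never scanned.
    return index.query(namespace=namespace, vector=vector, top_k=top_k)


# Example call (requires a real index object):
# results = query_in_namespace(index, "tenant-a", [0.1, 0.2, 0.3])
```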

Filter by metadata

In addition to increasing search accuracy and relevance, searching with metadata filters can also help decrease latency by retrieving only records that match the filter.
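
A filtered query might look like the following sketch. The metadata fields `genre` and `year` and both helper names are illustrative assumptions. The `$and`, `$eq`, and `$gte` operators are part of Pinecone's metadata filter syntax, and `index` is assumed to expose a `query` method that accepts a `filter` argument.

```python
def build_filter(genre, min_year):
    """Build a metadata filter; the field names are illustrative only."""
    return {
        "$and": [
            {"genre": {"$eq": genre}},
            {"year": {"$gte": min_year}},
        ]
    }


def filtered_query(index, vector, metadata_filter, top_k=3):
    """Search only among records whose metadata matches the filter."""
    return index.query(
        vector=vector,
        filter=metadata_filter,
        top_k=top_k,
        include_metadata=True,
    )


# Example call (requires a real index object):
# results = filtered_query(index, [0.1, 0.2, 0.3], build_filter("documentary", 2020))
```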

Target indexes by host

When you target an index by name for data operations such as upsert and query, the SDK gets the unique DNS host for the index using the describe_index operation. This is convenient for testing but should be avoided in production: describe_index uses a different API than data operations, so each lookup adds a network call and another point of failure. Instead, get the index host once and cache it for reuse, or specify the host directly.

You can get index hosts in the Pinecone console or using the describe_index operation.
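
One way to look up a host once and cache it is sketched below. It assumes `pc` is a client whose `describe_index(name)` call returns an object with a `host` attribute. The `make_host_resolver` helper is hypothetical, and the cache lives only for the lifetime of the process.

```python
from functools import lru_cache


def make_host_resolver(pc):
    """Return a function that resolves each index name to its host only once."""

    @lru_cache(maxsize=None)
    def resolve_host(index_name):
        # describe_index is a separate network call, so run it once per
        # index name and serve later lookups from the in-process cache.
        return pc.describe_index(index_name).host

    return resolve_host


# Example usage (requires a real client; "INDEX_NAME" is a placeholder):
# resolve_host = make_host_resolver(pc)
# index = pc.Index(host=resolve_host("INDEX_NAME"))
```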

When using Private Endpoints for private connectivity between your application and Pinecone, you must target the index using the Private Endpoint URL for the host.

The following example shows how to target an index by host directly:

from pinecone.grpc import PineconeGRPC as Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Target the index by its host so the SDK does not need a describe_index lookup.
index = pc.Index(host="INDEX_HOST")

Reuse connections

When you target an index for upserting or querying, the client establishes a TCP connection, which requires a three-way handshake. To avoid repeating this handshake on every request and to reduce average request latency, cache and reuse the index connection object whenever possible.
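
Reusing the connection object can be as simple as creating it lazily and handing back the same instance afterwards. The sketch below is one possible shape, not a prescribed pattern. `IndexHandle` is a hypothetical helper, `connect` is any zero-argument callable that builds the index object, and the class is not thread-safe as written.

```python
class IndexHandle:
    """Create the index connection object on first use, then reuse it."""

    def __init__(self, connect):
        # connect is a zero-argument callable,
        # for example: lambda: pc.Index(host="INDEX_HOST")
        self._connect = connect
        self._index = None

    def get(self):
        # Build the index object only once; later calls return the same one.
        if self._index is None:
            self._index = self._connect()
        return self._index


# Example usage (requires a real client):
# handle = IndexHandle(lambda: pc.Index(host="INDEX_HOST"))
# handle.get().query(vector=[0.1, 0.2, 0.3], top_k=3)
```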

Use a cloud environment

If you experience slow uploads or high query latencies, it might be because you are accessing Pinecone from your home network. To decrease latency, deploy your application in a cloud environment and access Pinecone from there, ideally in the same cloud and region as your index.

Avoid database limits

Pinecone has rate limits that restrict the frequency of requests within a specified period of time. Rate limits vary based on pricing plan and apply to serverless indexes only.
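
The text above does not prescribe how to react when a request is rate limited. One common, generic approach is to retry with exponential backoff, sketched below. Everything here is an assumption for illustration: `call_with_backoff` is a hypothetical helper, and `is_rate_limited` is a caller-supplied predicate that decides whether an exception represents a rate-limit response.

```python
import random
import time


def call_with_backoff(operation, is_rate_limited, max_attempts=5,
                      base_delay=0.5, sleep=time.sleep):
    """Run operation(), retrying with exponential backoff while rate limited."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception as exc:
            # Re-raise errors that are not rate limits, and the final failure.
            if not is_rate_limited(exc) or attempt == max_attempts - 1:
                raise
            # Wait base_delay * 2^attempt seconds, plus random jitter so that
            # many clients do not retry at the same instant.
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))


# Example usage (requires a real index object and a suitable predicate):
# call_with_backoff(lambda: index.upsert(vectors=batch), is_rate_limited)
```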