Query data
After your data is indexed, you can start sending queries to Pinecone.
The query endpoint searches the index using a query vector. It retrieves the IDs of the most similar records in the index, along with their similarity scores. This endpoint can optionally return the result’s vector values and metadata, too. You specify the number of vectors to retrieve each time you send a query. Matches are always ordered by similarity from most similar to least similar.
The similarity score for a vector represents its distance to the query vector, calculated according to the distance metric for the index. How to interpret the score depends on that metric. For example, for indexes using the euclidean distance metric, lower scores are more similar, while for indexes using the dotproduct metric, higher scores are more similar.
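To make the direction of the score concrete, here is a minimal local sketch in plain Python. It does not call Pinecone, and the helper name rank_matches and the sample vectors are made up for illustration. It only shows that the same records rank differently depending on whether lower or higher scores mean "more similar"; the exact score values Pinecone returns may be computed differently.

```python
import math


def rank_matches(query, records, metric):
    """Order (record_id, score) pairs from most to least similar for a metric."""
    if metric == "euclidean":
        scored = [(rid, math.dist(query, vec)) for rid, vec in records.items()]
        # Lower distance means more similar, so sort ascending.
        return sorted(scored, key=lambda pair: pair[1])
    if metric == "dotproduct":
        scored = [
            (rid, sum(q * v for q, v in zip(query, vec)))
            for rid, vec in records.items()
        ]
        # Higher dot product means more similar, so sort descending.
        return sorted(scored, key=lambda pair: pair[1], reverse=True)
    raise ValueError(f"unsupported metric: {metric}")


# Hypothetical records and query vector.
records = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [2.0, 0.0]}
query = [1.0, 0.1]

euclidean_order = [rid for rid, _ in rank_matches(query, records, "euclidean")]
dotproduct_order = [rid for rid, _ in rank_matches(query, records, "dotproduct")]
print(euclidean_order)   # ['A', 'C', 'B']
print(dotproduct_order)  # ['C', 'A', 'B']
```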
Pinecone is eventually consistent, so there can be a slight delay before new or changed records are visible to queries. See Understanding data freshness to learn about data freshness in Pinecone and how to check the freshness of your data.
Query limits
Metric | Limit |
---|---|
Max top_k value | 10,000 |
Max result size | 4MB |
The query result size is affected by the dimension of the dense vectors and whether or not dense vector values and metadata are included in the result.
If a query fails due to exceeding the 4MB result size limit, choose a lower top_k value, or use include_metadata=False or include_values=False to exclude metadata or values from the result.
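As a rough back-of-the-envelope sketch of why returning values can hit the limit, the snippet below estimates the raw size of the vector values alone. It assumes 4 bytes per vector component and a 4MB limit of 4 * 1024 * 1024 bytes, and it ignores IDs, scores, metadata, and serialization overhead. None of these assumptions come from this page, so treat the numbers as an optimistic lower bound, not as how Pinecone measures result size.

```python
MAX_TOP_K = 10_000                   # from the limits table
MAX_RESULT_BYTES = 4 * 1024 * 1024   # assumption: "4MB" read as binary megabytes


def estimate_values_bytes(top_k, dimension, bytes_per_value=4):
    """Raw size of the returned vector values only (assumed 4-byte floats)."""
    return top_k * dimension * bytes_per_value


def largest_top_k_with_values(dimension, bytes_per_value=4):
    """Largest top_k whose raw values alone would stay under the assumed limit."""
    return min(MAX_TOP_K, MAX_RESULT_BYTES // (dimension * bytes_per_value))


# Example: a hypothetical 1536-dimensional index.
print(estimate_values_bytes(10_000, 1536))   # 61440000 bytes, far above 4MB
print(largest_top_k_with_values(1536))       # 682
```

Because real responses carry more than raw floats, the practical ceiling is lower than this estimate; excluding values and metadata remains the more reliable fix.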
Send a query
Each query must include a query vector, specified by either a vector or id, and the number of results to return, specified by the top_k parameter. Each query is also limited to a single namespace within an index. To target a namespace, pass the namespace parameter. To query the default namespace, pass "" or omit the namespace parameter.
Depending on your data and your query, you may get fewer than top_k results. This happens when top_k is larger than the number of possible matching vectors for your query.
For optimal performance when querying with top_k over 1000, avoid returning vector data (include_values=True) or metadata (include_metadata=True).
Query by vector
To query by dense vector, provide the vector values representing your query embedding and the topK parameter (top_k in the Python SDK).
The following example sends a query vector with vector values and retrieves three matching vectors:
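The original code sample is not present in this copy of the page, so what follows is a minimal sketch using the Pinecone Python SDK rather than the official example. The index name, namespace, API key, and vector values are placeholders, and the helper names query_by_vector and connect are made up for illustration.

```python
def query_by_vector(index, vector, top_k=3, namespace="example-namespace"):
    """Return up to top_k records nearest to `vector` within one namespace."""
    return index.query(
        namespace=namespace,   # pass "" (or omit) for the default namespace
        vector=vector,         # length must match the index dimension
        top_k=top_k,
        include_values=True,   # also return each match's vector values
    )


def connect(index_name="example-index", api_key="YOUR_API_KEY"):
    """Open an index handle; the SDK is imported lazily, only when called."""
    from pinecone import Pinecone  # requires the `pinecone` package

    return Pinecone(api_key=api_key).Index(index_name)


# Usage (needs a real API key and an existing index):
# index = connect()
# results = query_by_vector(index, [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3])
# for match in results.matches:
#     print(match.id, match.score)
```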
The response looks like this:
Query by record ID
To query by record ID, provide the unique record ID and the topK parameter (top_k in the Python SDK).
The following example sends a query with an id value and retrieves three matching vectors:
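The original sample is missing here as well. A minimal sketch with the Python SDK could look like the following, where the record ID, namespace, and the helper name query_by_id are placeholders chosen for illustration. The stored vector of the referenced record serves as the query vector, so no vector argument is passed.

```python
def query_by_id(index, record_id, top_k=3, namespace="example-namespace"):
    """Use the stored vector of `record_id` as the query vector."""
    return index.query(
        namespace=namespace,
        id=record_id,          # unique ID of an existing record
        top_k=top_k,
        include_values=True,
    )


# Usage, given an `index` handle from the Pinecone SDK (placeholder ID):
# results = query_by_id(index, "B")
```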
For more information, see Limitations of querying by ID.
Query with metadata filters
Metadata filter expressions can be included with queries to limit the search to only vectors matching the filter expression.
Use the filter parameter to specify the metadata filter expression. For example, to search for a movie in the “documentary” genre:
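As a hedged sketch of such a filtered query with the Python SDK: the metadata field name genre is inferred from the sentence above, and the namespace, vector, and helper name query_documentaries are placeholders. The $eq operator restricts matches to records whose field equals the given value.

```python
def query_documentaries(index, vector, top_k=1, namespace="example-namespace"):
    """Search only records whose `genre` metadata equals "documentary"."""
    return index.query(
        namespace=namespace,
        vector=vector,
        filter={"genre": {"$eq": "documentary"}},  # assumed field name
        top_k=top_k,
        include_metadata=True,   # return metadata so the genre is visible
    )


# Usage, given an `index` handle from the Pinecone SDK (placeholder vector):
# results = query_documentaries(index, [0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1, 0.1])
```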
For more information about filtering with metadata, see Understanding metadata.
Query with sparse and dense vectors
When querying an index containing sparse and dense vectors, include a sparse_vector in your query parameters.
Only indexes using the dotproduct metric support querying sparse vectors.
This feature is in public preview.
Examples
The following example shows how to query with a sparse-dense vector.
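The sample itself did not survive in this copy, so here is a minimal sketch with the Python SDK, assuming an index that uses the dotproduct metric. The sparse vector is passed as a dictionary of parallel indices and values lists. The helper name query_sparse_dense, the namespace, and the numbers in the usage comment are placeholders.

```python
def query_sparse_dense(index, dense, sparse_indices, sparse_values,
                       top_k=10, namespace="example-namespace"):
    """Query with a dense vector plus a sparse vector (dotproduct indexes only)."""
    if len(sparse_indices) != len(sparse_values):
        raise ValueError("sparse indices and values must have the same length")
    return index.query(
        namespace=namespace,
        top_k=top_k,
        vector=dense,
        sparse_vector={
            "indices": list(sparse_indices),  # positions of non-zero entries
            "values": list(sparse_values),    # weights at those positions
        },
    )


# Usage, given an `index` handle from the Pinecone SDK (placeholder numbers):
# results = query_sparse_dense(index, [0.1, 0.2, 0.3], [10, 45, 16], [0.5, 0.5, 0.2])
```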
To learn more, see Querying sparse-dense vectors.
Query across multiple namespaces
Each query is limited to a single namespace. However, the Pinecone Python SDK provides a query_namespaces utility method to run a query in parallel across multiple namespaces in an index and then merge the result sets into a single ranked result set with the top_k most relevant results.
The query_namespaces method accepts most of the same arguments as query with the addition of a required namespaces parameter.
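A minimal sketch of calling query_namespaces follows. The namespace names and the helper name query_many_namespaces are placeholders. The metric argument is an assumption: to my knowledge recent SDK releases require it so that merged scores are sorted in the right direction, but this page does not mention it, so check the signature in your installed version.

```python
def query_many_namespaces(index, vector, namespaces, top_k=10, metric="cosine"):
    """Run one query across several namespaces and return the merged results."""
    return index.query_namespaces(
        vector=vector,
        namespaces=list(namespaces),  # required: namespaces to search in parallel
        metric=metric,                # assumption: should match the index's metric
        top_k=top_k,
        include_values=False,
        include_metadata=True,
    )


# Usage, given an `index` handle from the Pinecone SDK:
# combined = query_many_namespaces(index, [0.1, 0.2, 0.3], ["ns1", "ns2", "ns3"])
# for match in combined.matches:
#     print(match.id, match.score)
```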
Python SDK without gRPC
When using the Python SDK without gRPC extras, set the pool_threads and connection_pool_maxsize properties on the index client to get good performance. The pool_threads setting is the number of threads available to execute requests, while connection_pool_maxsize is the number of cached HTTP connections that will be held. Because these tasks are mainly I/O bound rather than computationally heavy, a high ratio of threads to CPUs is usually fine.
The combined results include the total read unit usage of the underlying queries across all namespaces.
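As a sketch of setting these two properties, the helper below passes them when opening the index handle. The value 50 is purely illustrative, not a recommendation from this page, and open_rest_index is a made-up helper name. The pc argument is expected to be a Pinecone client instance.

```python
def open_rest_index(pc, name="example-index",
                    pool_threads=50, connection_pool_maxsize=50):
    """Open an index handle tuned for many parallel, I/O-bound requests."""
    return pc.Index(
        name=name,
        pool_threads=pool_threads,                        # worker threads for requests
        connection_pool_maxsize=connection_pool_maxsize,  # cached HTTP connections
    )


# Usage (needs the `pinecone` package and a real API key):
# from pinecone import Pinecone
# index = open_rest_index(Pinecone(api_key="YOUR_API_KEY"))
```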
Python SDK with gRPC
When using the Python SDK with gRPC extras, there is no need to set connection_pool_maxsize because gRPC makes efficient use of open connections by default.
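A corresponding sketch for the gRPC client sets only pool_threads. It assumes the SDK was installed with its gRPC extras. The index name, API key, thread count, and the helper name open_grpc_index are placeholders.

```python
def open_grpc_index(name="example-index", api_key="YOUR_API_KEY", pool_threads=50):
    """Open a gRPC index handle; no connection_pool_maxsize is needed here."""
    # Imported lazily; requires the SDK's gRPC extras to be installed.
    from pinecone.grpc import PineconeGRPC

    pc = PineconeGRPC(api_key=api_key)
    return pc.Index(name=name, pool_threads=pool_threads)


# Usage (needs a real API key and an existing index):
# index = open_grpc_index()
```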
Query with integrated embedding and reranking
To automatically embed queries and rerank results as part of the search process, use integrated inference.
Data freshness
Pinecone is eventually consistent, so there can be a slight delay before new or changed records are visible to queries. You can use the describe_index_stats endpoint to check data freshness.
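One way to apply this, sketched under assumptions: after upserting a known number of records, poll describe_index_stats until the reported total reaches the count you expect. The helper name wait_for_record_count, the timeout, and the polling interval are made up for illustration. A count comparison only detects newly added records, not in-place updates to existing ones.

```python
import time


def wait_for_record_count(index, expected, timeout_s=30.0, poll_s=1.0):
    """Poll index stats until at least `expected` records are visible.

    Returns True once the count is reached, or False if the timeout expires.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        stats = index.describe_index_stats()
        if stats.total_vector_count >= expected:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_s)


# Usage, given an `index` handle from the Pinecone SDK:
# if wait_for_record_count(index, expected=1000):
#     results = index.query(vector=[0.1, 0.2, 0.3], top_k=3)
```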