This page shows you how to query records in an index namespace. Querying retrieves the IDs of the most similar records in the namespace, along with their similarity scores. You specify the number of records to retrieve each time you send a query. Matches are always ordered by similarity from most similar to least similar.

The similarity score for a vector represents how close it is to the query vector, calculated according to the similarity metric of the index. For example, for indexes using the euclidean distance metric, lower scores are more similar, while for indexes using the dotproduct metric, higher scores are more similar.
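
To make the "lower is better" versus "higher is better" distinction concrete, here is a small local sketch that scores and ranks a few vectors under each metric name. It is an illustration only: score and rank are hypothetical helpers, the euclidean branch uses plain Euclidean distance (Pinecone's exact euclidean score may be computed differently, for example squared), and Pinecone itself does not rank this way internally.

```python
import math


def score(metric, query, vector):
    """Score `vector` against `query` for one of three metric names (illustrative)."""
    dot = sum(q * v for q, v in zip(query, vector))
    if metric == "dotproduct":
        return dot
    if metric == "cosine":
        norm_q = math.sqrt(sum(q * q for q in query))
        norm_v = math.sqrt(sum(v * v for v in vector))
        return dot / (norm_q * norm_v)
    if metric == "euclidean":
        return math.sqrt(sum((q - v) ** 2 for q, v in zip(query, vector)))
    raise ValueError(f"unknown metric: {metric}")


def rank(metric, query, records):
    """Return (id, score) pairs ordered from most similar to least similar."""
    scored = [(record_id, score(metric, query, vec)) for record_id, vec in records.items()]
    # Lower distance means more similar for euclidean; higher is better otherwise.
    return sorted(scored, key=lambda pair: pair[1], reverse=(metric != "euclidean"))
```

With records {"a": [1.0, 0.0], "b": [0.0, 1.0], "c": [2.0, 0.0]} and the query [1.0, 0.0], dotproduct puts "c" first (its larger magnitude wins), while euclidean puts "a" first (it is identical to the query).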

Depending on your data and your query, you may get fewer than top_k results. This happens when top_k is larger than the number of possible matching vectors for your query.
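
A local, exact (brute-force) sketch shows why the result list can be shorter than top_k: once a filter or a small namespace leaves fewer candidates than top_k, there is nothing more to return. brute_force_query is a hypothetical helper; Pinecone uses approximate search, so this models only the count behavior, not how results are actually computed.

```python
def brute_force_query(records, query, top_k, keep=lambda metadata: True):
    """Exact dot-product top_k over records whose metadata passes `keep`.

    `records` maps id -> (vector, metadata).
    """
    candidates = [
        (record_id, sum(q * v for q, v in zip(query, vector)))
        for record_id, (vector, metadata) in records.items()
        if keep(metadata)
    ]
    # Highest dot product first, then truncate to top_k.
    candidates.sort(key=lambda pair: pair[1], reverse=True)
    return candidates[:top_k]
```

Asking for top_k=10 while only two records satisfy keep returns just those two matches.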

Pinecone is eventually consistent, so there can be a slight delay before new or changed records are visible to queries. See Understanding data freshness to learn about data freshness in Pinecone and how to check the freshness of your data.

Query limits

Metric             Limit
Max top_k value    10,000
Max result size    4MB

The query result size is affected by the dimension of the dense vectors and whether or not dense vector values and metadata are included in the result.

If a query fails due to exceeding the 4MB result size limit, choose a lower top_k value, or use include_metadata=False or include_values=False to exclude metadata or values from the result.

Query with a vector

To search a namespace with a query vector, use the query operation with the following parameters:

  • The namespace to query. To use the default namespace, set the namespace to an empty string ("").
  • The vector parameter with the dense vector values representing your query.
  • The top_k parameter with the number of results to return.

Optionally, you can specify:

  • A filter parameter with a metadata filter expression. This limits the search to records matching the filter expression.
  • A sparse_vector parameter with sparse vector values. This allows you to query with both dense and sparse vectors to perform semantic and keyword search in one query for more relevant results.
  • An include_values parameter to include or exclude the vector values of the matching records in the response.
  • An include_metadata parameter to include or exclude the metadata of the matching records in the response.

For optimal performance when querying with top_k over 1000, avoid returning vector data (include_values=True) or metadata (include_metadata=True).
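
One way to apply the limits and the top_k guidance on this page consistently is to build the query arguments in a single place. build_query_kwargs is a hypothetical helper, not part of the SDK; it assumes the 10,000 top_k limit and the 1000 threshold stated on this page, and the resulting dictionary is meant to be passed as index.query(**kwargs).

```python
MAX_TOP_K = 10_000            # "Max top_k value" from the query limits
LARGE_TOP_K_THRESHOLD = 1000  # above this, skip values and metadata for speed


def build_query_kwargs(vector, top_k, namespace="", include_values=True,
                       include_metadata=True, metadata_filter=None):
    """Hypothetical helper: assemble keyword arguments for index.query()."""
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}, got {top_k}")
    if top_k > LARGE_TOP_K_THRESHOLD:
        # Large result sets are faster, and less likely to hit the 4MB
        # result size limit, without vector values and metadata.
        include_values = False
        include_metadata = False
    kwargs = {
        "namespace": namespace,
        "vector": vector,
        "top_k": top_k,
        "include_values": include_values,
        "include_metadata": include_metadata,
    }
    if metadata_filter is not None:
        kwargs["filter"] = metadata_filter
    return kwargs
```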

For example, the following code searches for the 3 records that are most semantically similar to a query vector:
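
A minimal sketch of such a query, assuming an index client from the Python SDK (for example, pc.Index("example-index")) and a namespace named "example-namespace". The function names query_by_vector and match_ids_and_scores are hypothetical helpers, not SDK methods; the keyword arguments are the query parameters listed above.

```python
def query_by_vector(index, query_vector, namespace="example-namespace", top_k=3):
    """Return the top_k records most similar to query_vector."""
    return index.query(
        namespace=namespace,
        vector=query_vector,  # dense values; length must equal the index dimension
        top_k=top_k,
        include_values=True,
        include_metadata=True,
    )


def match_ids_and_scores(response):
    """Extract (id, score) pairs, already ordered from most to least similar."""
    return [(match.id, match.score) for match in response.matches]
```

With a real client, usage would look like results = query_by_vector(index, [0.3, 0.5, 0.2]) followed by match_ids_and_scores(results), where the vector is a placeholder that must match your index dimension.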

The response looks like this:

Query with a record ID

To search a namespace with a record ID, use the query operation with the following parameters:

  • The namespace to query. To use the default namespace, set the namespace to an empty string ("").
  • The id parameter with the unique ID of the record whose vector to use as the query.
  • The top_k parameter with the number of results to return.

Optionally, you can specify:

  • A filter parameter with a metadata filter expression. This limits the search to records matching the filter expression.
  • A sparse_vector parameter with sparse vector values. This allows you to query with both dense and sparse vectors to perform semantic and keyword search in one query for more relevant results.
  • An include_values parameter to include or exclude the vector values of the matching records in the response.
  • An include_metadata parameter to include or exclude the metadata of the matching records in the response.

For example, the following code uses a record ID as the query to search for the 3 records most semantically similar to that record:
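
A sketch of querying by ID with the Python SDK's query method and its id parameter. One assumption worth checking against your data: the query record usually matches itself as the top result, so this hypothetical helper, similar_to_record, requests one extra match and drops the record's own ID. Skip that step if you want the record itself included.

```python
def similar_to_record(index, record_id, namespace="example-namespace", top_k=3):
    """Return up to top_k matches for record_id, excluding the record itself."""
    # Ask for one extra result because the query record typically matches itself.
    response = index.query(
        namespace=namespace,
        id=record_id,
        top_k=top_k + 1,
        include_values=False,
    )
    return [m for m in response.matches if m.id != record_id][:top_k]
```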

For more information, see Limitations of querying by ID.

Query with text

Querying by text is supported only for indexes with integrated embedding.

To search a namespace with a query text, use the search_records operation with the following parameters:

  • The namespace to query. To use the default namespace, set the namespace to an empty string ("").
  • The input.text parameter with the query text to convert to a query vector.
  • The top_k parameter with the number of similar records to return.

Optionally, you can specify:

  • The fields to return. If not specified, the response will include all fields.
  • A filter parameter with a metadata filter expression. This limits the search to records matching the filter expression and can increase the accuracy of results.
  • A rerank parameter to rerank the initial search results based on relevance to the query. This can increase the accuracy of results.

For example, the following code searches for the 3 records most semantically related to the query "Disease prevention", then uses the hosted bge-reranker-v2-m3 model to rerank the results and return only the 2 most relevant records:
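
A hedged sketch of that request using the Python SDK's search_records method. Several details are assumptions to verify against the SDK reference: the query dictionary shape (this sketch uses an "inputs" key holding the text, which the parameter list above refers to as input.text), the rerank keys, and the field names "category" and "chunk_text", which are hypothetical and depend on how your records were upserted. search_and_rerank is a hypothetical wrapper.

```python
def search_and_rerank(index, text, namespace="example-namespace"):
    """Search by text, then rerank and keep the 2 most relevant hits (sketch)."""
    return index.search_records(
        namespace=namespace,
        query={
            "inputs": {"text": text},  # assumed key name for the query text
            "top_k": 3,                # initial candidates to retrieve
        },
        rerank={
            "model": "bge-reranker-v2-m3",
            "top_n": 2,                     # keep only the 2 best after reranking
            "rank_fields": ["chunk_text"],  # hypothetical field to rerank on
        },
        fields=["category", "chunk_text"],  # hypothetical fields to return
    )
```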

The 2 returned records are the most relevant for the query, the first relating to reducing chronic diseases, the second relating to preventing diabetes:

Query with metadata filters

Metadata filters limit the search to records matching a filter expression.
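
To illustrate what a filter expression means, here is a simplified local evaluator for a subset of the comparison operators ($eq, $ne, $gt, $gte, $lt, $lte, $in, $nin) plus $and and $or. It is a hypothetical teaching aid, not Pinecone's implementation: it ignores list-valued metadata and other operators, and it treats a missing field as a non-match for every operator, which may not match Pinecone's behavior.

```python
import operator

_OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches(metadata, expression):
    """Return True if `metadata` satisfies a simplified filter expression."""
    for key, condition in expression.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # this sketch's choice for missing fields
            for op, operand in condition.items():
                if op not in _OPS:
                    raise ValueError(f"unsupported operator in this sketch: {op}")
                if not _OPS[op](metadata[key], operand):
                    return False
    return True
```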

For optimal performance, when querying pod-based indexes with top_k over 1000, avoid returning vector data (include_values=True) or metadata (include_metadata=True).

Use the filter parameter to specify the metadata filter expression. For example, to search for a movie in the “documentary” genre:
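
A minimal sketch of that query, assuming an index client from the Python SDK whose records carry a "genre" metadata field. find_documentaries is a hypothetical helper; the filter dictionary uses the $eq operator from Pinecone's metadata filter syntax.

```python
def find_documentaries(index, query_vector, namespace="example-namespace", top_k=3):
    """Vector query restricted to records whose genre metadata is "documentary"."""
    return index.query(
        namespace=namespace,
        vector=query_vector,
        filter={"genre": {"$eq": "documentary"}},
        top_k=top_k,
        include_metadata=True,
    )
```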

For more information about filtering with metadata, see Understanding metadata.

Query with sparse and dense vectors

When querying an index containing sparse and dense vectors, include a sparse_vector in your query parameters.

Only indexes using the dotproduct metric support querying sparse vectors.
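
One way to picture a sparse-dense score under the dotproduct metric is as the dense dot product plus the sparse dot product over shared indices. The sketch below computes that locally; it is an illustrative model with hypothetical helper names (sparse_dot, hybrid_score), not a description of Pinecone's internals.

```python
def sparse_dot(a, b):
    """Dot product of sparse vectors shaped like {"indices": [...], "values": [...]}."""
    b_lookup = dict(zip(b["indices"], b["values"]))
    # Only indices present in both vectors contribute.
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))


def hybrid_score(query_dense, query_sparse, record_dense, record_sparse):
    """Dense dot product plus sparse dot product."""
    dense_part = sum(q * r for q, r in zip(query_dense, record_dense))
    return dense_part + sparse_dot(query_sparse, record_sparse)
```

For a dense part of 0.5 and a sparse overlap on a single index (0.5 × 2.0 = 1.0), the combined score is 1.5.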

This feature is in public preview.

Examples

The following example shows how to query with a sparse-dense vector.
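
A minimal sketch, assuming an index client from the Python SDK on an index that uses the dotproduct metric. The sparse_vector argument is a dictionary of parallel indices and values lists; hybrid_query is a hypothetical helper, and the length check is a local sanity check rather than SDK behavior.

```python
def hybrid_query(index, dense_vector, sparse_indices, sparse_values,
                 namespace="example-namespace", top_k=3):
    """Query with both dense and sparse values in a single request."""
    if len(sparse_indices) != len(sparse_values):
        raise ValueError("sparse indices and values must have the same length")
    return index.query(
        namespace=namespace,
        vector=dense_vector,
        sparse_vector={"indices": sparse_indices, "values": sparse_values},
        top_k=top_k,
        include_metadata=True,
    )
```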

To learn more, see Querying sparse-dense vectors.

Query across multiple namespaces

Each query is limited to a single namespace. However, the Pinecone Python SDK provides a query_namespaces utility method to run a query in parallel across multiple namespaces in an index and then merge the result sets into a single ranked result set with the top_k most relevant results.

The query_namespaces method accepts most of the same arguments as query with the addition of a required namespaces parameter.
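
query_namespaces performs the merge for you. The local sketch below only illustrates the idea of combining per-namespace result lists into one top_k list, and suggests why a metric has to be known at merge time: euclidean results keep the lowest scores, while cosine and dotproduct keep the highest. merge_namespace_results is a hypothetical name, not an SDK function.

```python
import heapq


def merge_namespace_results(results_by_namespace, top_k, metric):
    """Merge {namespace: [(id, score), ...]} into one ranked list of top_k."""
    tagged = [
        (score, record_id, namespace)
        for namespace, matches in results_by_namespace.items()
        for record_id, score in matches
    ]
    if metric == "euclidean":
        best = heapq.nsmallest(top_k, tagged, key=lambda t: t[0])
    else:
        best = heapq.nlargest(top_k, tagged, key=lambda t: t[0])
    return [(namespace, record_id, score) for score, record_id, namespace in best]
```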

Python SDK without gRPC

When using the Python SDK without the gRPC extras, set values for the pool_threads and connection_pool_maxsize properties on the index client to get good performance. pool_threads is the number of threads available to execute requests, and connection_pool_maxsize is the number of cached HTTP connections to keep open. Because these tasks are I/O-bound rather than computationally heavy, a high ratio of threads to CPUs is usually fine.

The combined results include the sum of all read unit usage used to perform the underlying queries for each namespace.

Python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(
    name="example-index",
    pool_threads=50,             # <-- make sure to set these
    connection_pool_maxsize=50,  # <-- make sure to set these
)

query_vec = [ 0.1, ...] # an embedding vector with same dimension as the index
combined_results = index.query_namespaces(
    vector=query_vec,
    namespaces=['ns1', 'ns2', 'ns3', 'ns4'],
    metric="cosine",
    top_k=10,
    include_values=False,
    include_metadata=True,
    filter={"genre": { "$eq": "comedy" }},
    show_progress=False,
)

for scored_vec in combined_results.matches:
    print(scored_vec)
print(combined_results.usage)

Python SDK with gRPC

When using the Python SDK with the gRPC extras, you don't need to set connection_pool_maxsize, because gRPC makes efficient use of open connections by default.

Python
from pinecone.grpc import PineconeGRPC

pc = PineconeGRPC(api_key="YOUR_API_KEY")
index = pc.Index(
    name="example-index",
    pool_threads=50, # <-- make sure to set this
)

query_vec = [ 0.1, ...] # an embedding vector with same dimension as the index
combined_results = index.query_namespaces(
    vector=query_vec,
    namespaces=['ns1', 'ns2', 'ns3', 'ns4'],
    metric="cosine",
    top_k=10,
    include_values=False,
    include_metadata=True,
    filter={"genre": { "$eq": "comedy" }},
    show_progress=False,
)

for scored_vec in combined_results.matches:
    print(scored_vec)
print(combined_results.usage)

Parallel queries

Python SDK v6.0.0 and later provide async methods for use with asyncio. Async support makes it possible to use Pinecone with modern async web frameworks such as FastAPI, Quart, and Sanic, and can significantly increase the efficiency of running queries in parallel. For more details, see the Python SDK documentation.
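
A minimal asyncio sketch of running several queries concurrently. It assumes an async index object whose query method is a coroutine accepting the same keyword arguments as the synchronous query; with SDK v6 such an object is reportedly obtained through something like pc.IndexAsyncio(host=...), but treat that as an assumption and confirm it in the Python SDK documentation. run_queries_concurrently is a hypothetical helper.

```python
import asyncio


async def run_queries_concurrently(index, vectors, namespace="example-namespace", top_k=3):
    """Send one query per vector concurrently; results come back in input order."""
    coroutines = [
        index.query(namespace=namespace, vector=vector, top_k=top_k)
        for vector in vectors
    ]
    # gather preserves the order of its arguments in the returned list.
    return await asyncio.gather(*coroutines)
```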

Data freshness

Pinecone is eventually consistent, so there can be a slight delay before new or changed records are visible to queries. You can use the describe_index_stats endpoint to check data freshness.
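
As a rough freshness check after an upsert, you can poll describe_index_stats until a namespace reports the record count you expect. This sketch assumes the stats response exposes a namespaces mapping whose entries have a vector_count attribute; verify that shape in the SDK reference. A matching count only indicates that new records are visible, not that in-place updates to existing records have propagated. wait_for_namespace_count is a hypothetical helper.

```python
import time


def wait_for_namespace_count(index, namespace, expected_count,
                             timeout_s=60.0, poll_s=2.0):
    """Poll index stats until `namespace` holds at least `expected_count` records."""
    deadline = time.monotonic() + timeout_s
    while True:
        stats = index.describe_index_stats()
        ns_stats = stats.namespaces.get(namespace)
        count = ns_stats.vector_count if ns_stats else 0
        if count >= expected_count:
            return count
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"namespace {namespace!r} has {count} records, expected {expected_count}"
            )
        time.sleep(poll_s)
```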
