Query data
This page shows you how to query records in dense and sparse indexes.
Depending on your data and your query, you may get fewer than `top_k` results. This happens when `top_k` is larger than the number of possible matches for your query.
Semantic search
Dense indexes store dense vectors, which are a series of numbers that represent the meaning and relationships of text, images, or other types of data. Each number in a dense vector corresponds to a point in a multidimensional space. Vectors that are closer together in that space are semantically similar.
When you query a dense index, Pinecone retrieves the dense vectors that are the most semantically similar to the query. This is often called semantic search, nearest neighbor search, similarity search, or just vector search.
Searching with text is supported only for indexes with integrated embedding.
To search a dense index with a query text, use the `search_records` operation with the following parameters:
- The `namespace` to query. To use the default namespace, set the namespace to an empty string (`""`).
- The `input.text` parameter with the query text to convert to a query vector.
- The `top_k` parameter with the number of similar records to return.
- Optionally, you can specify the `fields` to return in the response. If not specified, the response will include all fields.
For example, the following code searches for the 2 records most semantically related to a query text:
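The snippet below is an illustrative sketch using the Python SDK: the index name, namespace, query text, and field names are assumptions rather than values tied to this page, and the index is assumed to be configured with integrated embedding.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Target a dense index configured with integrated embedding (name is illustrative).
index = pc.Index("example-dense-index")

results = index.search(
    namespace="example-namespace",  # use "" for the default namespace
    query={
        "inputs": {"text": "Disease prevention"},  # query text to convert to a vector
        "top_k": 2,                                # number of similar records to return
    },
    fields=["category", "chunk_text"],  # optional; omit to return all fields
)

print(results)
```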
In the response, each record is returned with a similarity score that represents its distance to the query vector, calculated according to the similarity metric for the index.
Lexical search
This feature is in public preview.
Sparse indexes store sparse vectors, which are a series of numbers that represent the words or phrases in a document. Sparse vectors have a very large number of dimensions, where only a small proportion of values are non-zero. The dimensions represent words from a dictionary, and the values represent the importance of these words in the document.
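As a rough illustration (the indices and values below are invented), a sparse vector is usually represented by just its non-zero entries:

```python
# A sparse vector represented as parallel lists of its non-zero entries:
# each index identifies a dimension (roughly, a token from a large vocabulary),
# and each value reflects that token's importance in the document.
sparse_vector = {
    "indices": [102, 5043, 98210],  # dimensions with non-zero values
    "values": [0.42, 1.17, 0.03],   # importance of the corresponding tokens
}
```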
When you search a sparse index, Pinecone retrieves the sparse vectors that most exactly match the words or phrases in the query. Query terms are scored independently and then summed, with the most similar records scored highest. This is often called lexical search or keyword search.
Searching with text is supported only for indexes with integrated embedding.
To search a sparse index with a query text, use the `search_records` operation with the following parameters:
- The `namespace` to query. To use the default namespace, set the namespace to an empty string (`""`).
- The `input.text` parameter with the query text to convert to a sparse query vector.
- The `top_k` parameter with the number of similar records to return.
- Optionally, you can specify the `fields` to return in the response. If not specified, the response will include all fields.
For example, the following code converts the query “What is AAPL’s outlook, considering both product launches and market conditions?” to a sparse vector and then searches for the 3 most similar vectors in the `example-namespaces` namespace:
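The snippet below is a sketch of this request with the Python SDK; the index name is an assumption, and the sparse index is assumed to use integrated embedding.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Target a sparse index configured with integrated embedding (name is illustrative).
index = pc.Index("example-sparse-index")

results = index.search(
    namespace="example-namespaces",
    query={
        "inputs": {
            "text": "What is AAPL's outlook, considering both product launches and market conditions?"
        },
        "top_k": 3,  # number of similar records to return
    },
)

print(results)
```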
In the results, the most similar records are scored highest.
Hybrid search
Semantic search and lexical search are powerful information retrieval techniques, but each has notable limitations. For example, semantic search can miss results based on exact keyword matches, especially in scenarios involving domain-specific terminology, while lexical search can miss results based on relationships, such as synonyms and paraphrases.
To lift these limitations, you can search both dense and sparse indexes, combine the results from both, and use one of Pinecone’s hosted reranking models to assign a unified relevance score, reorder the results accordingly, and return the most relevant matches. This is often called hybrid search or cascading retrieval.
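The sketch below outlines this pattern with the Python SDK. The index names, namespace, field names, and query text are assumptions, and the merge step is intentionally simplistic; it is meant only to illustrate the flow of querying both indexes and reranking the combined candidates with a hosted model.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Illustrative index names; both indexes are assumed to use integrated embedding.
dense_index = pc.Index("example-dense-index")
sparse_index = pc.Index("example-sparse-index")

query_text = "AAPL outlook after recent product launches"

# 1. Retrieve candidates from each index independently.
dense_hits = dense_index.search(
    namespace="example-namespace",
    query={"inputs": {"text": query_text}, "top_k": 10},
    fields=["chunk_text"],  # assumed field holding the document text
)
sparse_hits = sparse_index.search(
    namespace="example-namespace",
    query={"inputs": {"text": query_text}, "top_k": 10},
    fields=["chunk_text"],
)

# 2. Merge the candidate sets, deduplicating by record ID.
candidates = {}
for hit in list(dense_hits["result"]["hits"]) + list(sparse_hits["result"]["hits"]):
    candidates[hit["_id"]] = hit["fields"]["chunk_text"]

# 3. Rerank the combined candidates with a hosted model and keep the best matches.
reranked = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query=query_text,
    documents=[{"id": _id, "chunk_text": text} for _id, text in candidates.items()],
    rank_fields=["chunk_text"],
    top_n=5,
    return_documents=True,
)

for row in reranked.data:
    print(row.score, row.document["id"])
```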
Filter by metadata
When records include metadata fields, you can add a metadata filter to limit the search to records matching a filter expression.
For example, the following code searches for the 3 records that are most semantically similar to a query vector and that have a `category` metadata field with the value `digestive system`:
Searching with text is supported only for indexes with integrated embedding.
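The snippet below is an illustrative sketch using a raw query vector; the index name, namespace, and vector are placeholders, and the vector's length must match the index's dimension.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-dense-index")  # illustrative index name

# Placeholder query vector; its length must match the index's dimension.
query_vector = [0.1] * 1024

results = index.query(
    namespace="example-namespace",
    vector=query_vector,
    top_k=3,
    filter={"category": {"$eq": "digestive system"}},  # only match this metadata value
    include_metadata=True,
)

print(results)
```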
Rerank results
You can increase the accuracy of your search by reranking initial results based on their relevance to the query.
To rerank initial results as an integrated part of a query, add the `rerank` parameter, including the hosted reranking model you want to use, the number of reranked results to return, and the fields to use for reranking, if different from the main query.
For example, the following code searches for the 3 records most semantically related to a query text and uses the hosted `bge-reranker-v2-m3` model to rerank the results and return only the 2 most relevant documents:
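The snippet below sketches this request with the Python SDK; the index name, namespace, query text, and rank field are assumptions, and the index is assumed to use integrated embedding.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-dense-index")  # illustrative index name

results = index.search(
    namespace="example-namespace",
    query={
        "inputs": {"text": "Disease prevention"},  # query text to convert to a vector
        "top_k": 3,                                # initial candidates to retrieve
    },
    rerank={
        "model": "bge-reranker-v2-m3",  # hosted reranking model
        "top_n": 2,                     # number of reranked results to return
        "rank_fields": ["chunk_text"],  # assumed field to rerank on
    },
    fields=["chunk_text"],
)

print(results)
```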
In the reranked response, the 2 returned documents are the most relevant for the query: the first relates to reducing chronic diseases, and the second to preventing diabetes.
Normalized between 0 and 1, the `_score` represents the relevance of a document to the query, with scores closer to 1 indicating higher relevance.
Parallel queries
Python SDK v6.0.0 and later provide `async` methods for use with asyncio. Async support makes it possible to use Pinecone with modern async web frameworks such as FastAPI, Quart, and Sanic, and can significantly increase the efficiency of running queries in parallel. For more details, see Async requests.
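The sketch below shows one way to run several queries concurrently with the asyncio client. It assumes the `pinecone[asyncio]` extras are installed; the API key, index host, namespace, and query vectors are placeholders.

```python
import asyncio

from pinecone import PineconeAsyncio


async def main():
    async with PineconeAsyncio(api_key="YOUR_API_KEY") as pc:
        # "INDEX_HOST" is a placeholder for your index's host URL.
        async with pc.IndexAsyncio(host="INDEX_HOST") as index:
            # Placeholder query vectors; lengths must match the index dimension.
            queries = [[0.1] * 1024, [0.2] * 1024, [0.3] * 1024]

            # Issue the queries concurrently and wait for all of them to finish.
            results = await asyncio.gather(
                *[
                    index.query(namespace="example-namespace", vector=v, top_k=3)
                    for v in queries
                ]
            )

            for r in results:
                print(r)


asyncio.run(main())
```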
Query across namespaces
Each query is limited to a single namespace. However, the Pinecone Python SDK provides a `query_namespaces` utility method to run a query in parallel across multiple namespaces in an index and then merge the result sets into a single ranked result set with the `top_k` most relevant results.
The `query_namespaces` method accepts most of the same arguments as `query`, with the addition of a required `namespaces` parameter.
When using the Python SDK without gRPC extras, to get good performance it is important to set values for the `pool_threads` and `connection_pool_maxsize` properties on the index client. The `pool_threads` setting is the number of threads available to execute requests, while `connection_pool_maxsize` is the number of cached HTTP connections that will be held. Since these tasks are not computationally heavy and are mainly I/O bound, it should be okay to have a high ratio of threads to CPUs.
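Putting these pieces together, the sketch below configures the index client and runs a query across several namespaces; the index name, namespaces, and query vector are placeholders, and the vector's length must match the index's dimension.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Size the thread pool and HTTP connection pool for parallel per-namespace requests.
index = pc.Index(
    name="example-dense-index",
    pool_threads=50,
    connection_pool_maxsize=50,
)

# Placeholder query vector; its length must match the index's dimension.
query_vector = [0.1] * 1024

combined = index.query_namespaces(
    vector=query_vector,
    namespaces=["ns1", "ns2", "ns3"],  # namespaces to query in parallel
    metric="cosine",                   # should match the index's distance metric
    top_k=10,
    include_values=False,
    include_metadata=True,
)

for match in combined.matches:
    print(match.id, match.score)
print(combined.usage)
```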
The combined results include the total read unit usage incurred by the underlying queries across all namespaces.
Query limits
Metric | Limit
---|---
Max `top_k` value | 10,000
Max result size | 4MB
The query result size is affected by the dimension of the dense vectors and whether or not dense vector values and metadata are included in the result.
If a query fails due to exceeding the 4MB result size limit, choose a lower `top_k` value, or use `include_metadata=False` or `include_values=False` to exclude metadata or values from the result.
Data freshness
Pinecone is eventually consistent, so there can be a slight delay before new or changed records are visible to queries. You can view index stats to check data freshness.
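A minimal sketch of checking index stats with the Python SDK (the index name is a placeholder); the per-namespace record counts can help confirm that recent writes are reflected:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-dense-index")  # illustrative index name

# Returns record counts per namespace, the index dimension, and index fullness.
print(index.describe_index_stats())
```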