This page shows you how to query records in dense and sparse indexes.

Depending on your data and your query, you may get fewer than top_k results. This happens when top_k is larger than the number of possible matches for your query.

Dense indexes store dense vectors, which are a series of numbers that represent the meaning and relationships of text, images, or other types of data. Each number in a dense vector corresponds to a point in a multidimensional space. Vectors that are closer together in that space are semantically similar.

When you query a dense index, Pinecone retrieves the dense vectors that are the most semantically similar to the query. This is often called semantic search, nearest neighbor search, similarity search, or just vector search.
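At its core, nearest neighbor search ranks vectors by a similarity metric. As a minimal illustration of the idea (toy 3-dimensional vectors and cosine similarity in plain Python, not Pinecone's actual implementation):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot product of the vectors divided by the product of their magnitudes."""
    dot = sum(x * y for x, y in zip(a, b))
    mag = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / mag

# Toy "index" of 3-dimensional dense vectors
records = {
    "rec1": [0.9, 0.1, 0.0],
    "rec2": [0.1, 0.9, 0.1],
    "rec3": [0.8, 0.2, 0.1],
}

query = [1.0, 0.0, 0.0]

# Rank records by similarity to the query, highest first
ranked = sorted(records, key=lambda rid: cosine_similarity(records[rid], query), reverse=True)
print(ranked)  # rec1 and rec3 point in nearly the same direction as the query
```

Real dense indexes use approximate nearest neighbor structures to avoid scoring every record, but the ranking principle is the same.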

Searching with text is supported only for indexes with integrated embedding.

To search a dense index with a query text, use the search_records operation with the following parameters:

  • The namespace to query. To use the default namespace, set the namespace to an empty string ("").
  • The query.inputs.text parameter with the query text to convert to a query vector.
  • The query.top_k parameter with the number of similar records to return.
  • Optionally, the fields to return in the response. If not specified, the response includes all fields.

For example, the following code searches for the 2 records most semantically related to a query text:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/data/target-an-index
index = pc.Index(host="INDEX_HOST")

results = index.search_records(
    namespace="example-namespace", 
    query={
        "inputs": {"text": "Disease prevention"}, 
        "top_k": 2
    },
    fields=["category", "chunk_text"]
)

print(results)

The response will look as follows. Each record is returned with a similarity score that represents its closeness to the query vector, calculated according to the similarity metric of the index.

{'result': {'hits': [{'_id': 'rec3',
                      '_score': 0.8204272389411926,
                      'fields': {'category': 'immune system',
                                 'chunk_text': 'Rich in vitamin C and other '
                                               'antioxidants, apples '
                                               'contribute to immune health '
                                               'and may reduce the risk of '
                                               'chronic diseases.'}},
                     {'_id': 'rec1',
                      '_score': 0.7931625843048096,
                      'fields': {'category': 'digestive system',
                                 'chunk_text': 'Apples are a great source of '
                                               'dietary fiber, which supports '
                                               'digestion and helps maintain a '
                                               'healthy gut.'}}]},
 'usage': {'embed_total_tokens': 8, 'read_units': 6}}

Sparse index support is in public preview.

Sparse indexes store sparse vectors, which are a series of numbers that represent the words or phrases in a document. Sparse vectors have a very large number of dimensions, where only a small proportion of values are non-zero. The dimensions represent words from a dictionary, and the values represent the importance of these words in the document.

When you search a sparse index, Pinecone retrieves the sparse vectors that most closely match the words or phrases in the query. Query terms are scored independently and then summed, with the most similar records scored highest. This is often called lexical search or keyword search.
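The scoring described above amounts to a dot product over the few dimensions where both vectors are non-zero. A minimal sketch (toy vocabulary indices and weights, not Pinecone's actual scoring code):

```python
# Sparse vectors as {dimension_index: weight}; each dimension stands for a vocabulary term
docs = {
    "vec1": {101: 1.2, 205: 0.8},            # e.g. "AAPL", "revenue"
    "vec2": {101: 1.5, 330: 2.1, 407: 0.9},  # e.g. "AAPL", "launch", "market"
}

query = {101: 1.0, 330: 1.4}  # query terms with their weights

def sparse_score(doc, query):
    """Score each query term independently (weight * weight) and sum over shared dimensions."""
    return sum(w * doc[dim] for dim, w in query.items() if dim in doc)

scores = {doc_id: sparse_score(vec, query) for doc_id, vec in docs.items()}
print(scores)  # vec2 shares more query terms, so it scores highest
```

Because only shared non-zero dimensions contribute, records that contain none of the query terms score zero and are never returned.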

Searching with text is supported only for indexes with integrated embedding.

To search a sparse index with a query text, use the search_records operation with the following parameters:

  • The namespace to query. To use the default namespace, set the namespace to an empty string ("").
  • The query.inputs.text parameter with the query text to convert to a sparse query vector.
  • The query.top_k parameter with the number of similar records to return.
  • Optionally, the fields to return in the response. If not specified, the response includes all fields.

For example, the following code converts the query “What is AAPL’s outlook, considering both product launches and market conditions?” to a sparse vector and then searches for the 3 most similar vectors in the example-namespace namespace:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/data/target-an-index
index = pc.Index(host="INDEX_HOST")

results = index.search_records(
    namespace="example-namespace", 
    query={
        "inputs": {"text": "What is AAPL's outlook, considering both product launches and market conditions?"}, 
        "top_k": 3
    },
    fields=["chunk_text", "quarter"]
)

print(results)

The results will look as follows. The most similar records are scored highest.

{'result': {'hits': [{'_id': 'vec2',
                      '_score': 10.77734375,
                      'fields': {'chunk_text': "Analysts suggest that AAPL's "
                                               'upcoming Q4 product launch '
                                               'event might solidify its '
                                               'position in the premium '
                                               'smartphone market.',
                                 'quarter': 'Q4'}},
                     {'_id': 'vec3',
                      '_score': 6.49066162109375,
                      'fields': {'chunk_text': "AAPL's strategic Q3 "
                                               'partnerships with '
                                               'semiconductor suppliers could '
                                               'mitigate component risks and '
                                               'stabilize iPhone production.',
                                 'quarter': 'Q3'}},
                     {'_id': 'vec1',
                      '_score': 5.3671875,
                      'fields': {'chunk_text': 'AAPL reported a year-over-year '
                                               'revenue increase, expecting '
                                               'stronger Q3 demand for its '
                                               'flagship phones.',
                                 'quarter': 'Q3'}}]},
 'usage': {'embed_total_tokens': 18, 'read_units': 1}}

Semantic search and lexical search are powerful information retrieval techniques, but each has notable limitations. For example, semantic search can miss results based on exact keyword matches, especially in scenarios involving domain-specific terminology, while lexical search can miss results based on relationships, such as synonyms and paraphrases.

To lift these limitations, you can search both dense and sparse indexes, combine the results from both, and use one of Pinecone’s hosted reranking models to assign a unified relevance score, reorder the results accordingly, and return the most relevant matches. This is often called hybrid search or cascading retrieval.

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/data/target-an-index
dense_index = pc.Index(host="DENSE_INDEX_HOST")
sparse_index = pc.Index(host="SPARSE_INDEX_HOST")

# Define the query
query = "Q3 2024 us economic data"

# Search the dense index and rerank the results
dr = dense_index.search(
    namespace="example-namespace",
    query={
        "top_k": 20,
        "inputs": {
            "text": query
        }
    },
    rerank={
        "model": "cohere-rerank-3.5",
        "rank_fields": ["chunk_text"]
    }
)

# Search the sparse index and rerank the results 
sr = sparse_index.search(
    namespace="example-namespace",
    query={
        "top_k": 20,
        "inputs": {
            "text": query
        }
    },
    rerank={
        "model": "cohere-rerank-3.5",
        "rank_fields": ["chunk_text"]
    }
)

# Merge and deduplicate the results
def merge_chunks(h1, h2):
    """Deduplicate hits from two search results and return them as a single list, highest score first."""
    deduped_hits = {hit['_id']: hit for hit in h1['result']['hits'] + h2['result']['hits']}.values()
    return sorted(deduped_hits, key=lambda x: x['_score'], reverse=True)

merged = merge_chunks(sr, dr)

# Print the results
print("Query", query)
print("-----")
for row in merged:
    print(f"{row['_id']} {round(row['_score'], 2)} - {row['fields']['chunk_text']}")

Filter by metadata

When records include metadata fields, you can add a metadata filter to limit the search to records matching a filter expression.

Searching with text is supported only for indexes with integrated embedding.

For example, the following code searches for the 3 records that are most semantically similar to a query text and that have a category metadata field with the value digestive system:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/data/target-an-index
index = pc.Index(host="INDEX_HOST")

filtered_results = index.search_records(
    namespace="example-namespace", 
    query={
        "inputs": {"text": "Disease prevention"}, 
        "top_k": 3,
        "filter": {"category": "digestive system"},
    },
    fields=["category", "chunk_text"]
)

print(filtered_results)

Rerank results

You can increase the accuracy of your search by reranking initial results based on their relevance to the query.

To rerank initial results as an integrated part of a query, add the rerank parameter, including the hosted reranking model you want to use, the number of reranked results to return, and the fields to use for reranking, if different than the main query.

For example, the following code searches for the 3 records most semantically related to a query text and uses the hosted bge-reranker-v2-m3 model to rerank the results and return only the 2 most relevant documents:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/data/target-an-index
index = pc.Index(host="INDEX_HOST")

ranked_results = index.search_records(
    namespace="example-namespace", 
    query={
        "inputs": {"text": "Disease prevention"}, 
        "top_k": 4
    },
    rerank={
        "model": "bge-reranker-v2-m3",
        "top_n": 2,
        "rank_fields": ["chunk_text"]
    },
    fields=["category", "chunk_text"]
)

print(ranked_results)

Notice that the 2 returned documents are the most relevant to the query: the first relates to reducing the risk of chronic diseases, and the second to regulating blood sugar for people with diabetes:

Normalized between 0 and 1, the _score represents the relevance of a document to the query, with scores closer to 1 indicating higher relevance.

{'result': {'hits': [{'_id': 'rec3',
                      '_score': 0.004399413242936134,
                      'fields': {'category': 'immune system',
                                 'chunk_text': 'Rich in vitamin C and other '
                                                'antioxidants, apples '
                                                'contribute to immune health '
                                                'and may reduce the risk of '
                                                'chronic diseases.'}},
                     {'_id': 'rec4',
                      '_score': 0.0029235430993139744,
                      'fields': {'category': 'endocrine system',
                                 'chunk_text': 'The high fiber content in '
                                                'apples can also help regulate '
                                                'blood sugar levels, making '
                                                'them a favorable snack for '
                                                'people with diabetes.'}}]},
 'usage': {'embed_total_tokens': 8, 'read_units': 6, 'rerank_units': 1}}

Parallel queries

Python SDK v6.0.0 and later provide async methods for use with asyncio. Async support makes it possible to use Pinecone with modern async web frameworks such as FastAPI, Quart, and Sanic, and can significantly increase the efficiency of running queries in parallel. For more details, see Async requests.
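The concurrency pattern looks roughly like the following. To keep the example self-contained, the query coroutine here is a stand-in for real SDK calls; see the SDK's async documentation for the actual client:

```python
import asyncio

async def mock_query(namespace: str, delay: float) -> dict:
    """Stand-in for an async Pinecone query; sleeps to simulate network latency."""
    await asyncio.sleep(delay)
    return {"namespace": namespace, "hits": []}

async def main():
    # asyncio.gather runs the queries concurrently rather than one after another,
    # so total wall time is roughly the slowest single query, not the sum
    return await asyncio.gather(
        mock_query("ns1", 0.05),
        mock_query("ns2", 0.05),
        mock_query("ns3", 0.05),
    )

results = asyncio.run(main())
print([r["namespace"] for r in results])
```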

Query across namespaces

Each query is limited to a single namespace. However, the Pinecone Python SDK provides a query_namespaces utility method to run a query in parallel across multiple namespaces in an index and then merge the result sets into a single ranked result set with the top_k most relevant results.

The query_namespaces method accepts most of the same arguments as query with the addition of a required namespaces parameter.

When using the Python SDK without gRPC extras, it is important for good performance to set the pool_threads and connection_pool_maxsize properties on the index client. The pool_threads setting is the number of threads available to execute requests, while connection_pool_maxsize is the number of cached HTTP connections that will be held. Since these tasks are mainly I/O-bound rather than computationally heavy, it is reasonable to use a high ratio of threads to CPUs.

The combined results include the sum of all read unit usage used to perform the underlying queries for each namespace.

Python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(
    name="example-index",
    pool_threads=50,             # <-- make sure to set these
    connection_pool_maxsize=50,  # <-- make sure to set these
)

query_vec = [0.1, ...]  # an embedding vector with the same dimension as the index
combined_results = index.query_namespaces(
    vector=query_vec,
    namespaces=['ns1', 'ns2', 'ns3', 'ns4'],
    metric="cosine",
    top_k=10,
    include_values=False,
    include_metadata=True,
    filter={"genre": { "$eq": "comedy" }},
    show_progress=False,
)

for scored_vec in combined_results.matches:
    print(scored_vec)
print(combined_results.usage)
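Under the hood, merging per-namespace result sets is essentially a top-k merge. A simplified sketch of that step (toy result sets, not the SDK's internals):

```python
import heapq

# Toy per-namespace result sets, each already sorted by score
namespace_results = {
    "ns1": [{"id": "a", "score": 0.91}, {"id": "b", "score": 0.55}],
    "ns2": [{"id": "c", "score": 0.87}, {"id": "d", "score": 0.62}],
    "ns3": [{"id": "e", "score": 0.40}],
}

top_k = 3

# Flatten all matches, then keep the top_k highest-scoring overall
all_matches = [m for hits in namespace_results.values() for m in hits]
merged = heapq.nlargest(top_k, all_matches, key=lambda m: m["score"])
print([m["id"] for m in merged])  # ['a', 'c', 'd']
```

Note that this requires every per-namespace query to use the same similarity metric, so that scores are comparable across result sets.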

Query limits

Metric            Limit
Max top_k value   10,000
Max result size   4MB

The query result size is affected by the dimension of the dense vectors and whether or not dense vector values and metadata are included in the result.

If a query fails due to exceeding the 4MB result size limit, choose a lower top_k value, or use include_metadata=False or include_values=False to exclude metadata or values from the result.
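As a rough back-of-the-envelope check (assuming about 4 bytes per float32 vector value and ignoring metadata and protocol overhead), you can estimate how much of the result budget vector values alone would consume:

```python
def estimated_values_size_mb(dimension: int, top_k: int, bytes_per_value: int = 4) -> float:
    """Approximate size of returned vector values: dimension * top_k * bytes per value."""
    return dimension * top_k * bytes_per_value / (1024 * 1024)

# 1536-dimensional vectors with include_values=True and a large top_k
size = estimated_values_size_mb(dimension=1536, top_k=1000)
print(f"{size:.1f} MB")  # above the 4MB limit, so exclude values or lower top_k
```

In this hypothetical case, values alone would exceed the limit, so setting include_values=False or lowering top_k would be necessary regardless of metadata size.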

Data freshness

Pinecone is eventually consistent, so there can be a slight delay before new or changed records are visible to queries. You can view index stats to check data freshness.
