This page shows you how to search a sparse index for records that best match the words or phrases in a query. This is often called lexical search or keyword search. Lexical search uses sparse vectors, which have a very large number of dimensions, of which only a small proportion are non-zero. The dimensions represent words from a dictionary, and the values represent the importance of these words in the document. Words are scored independently and the scores are summed, with the most similar records scored highest.
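Conceptually, the score is a sparse dot product: each query term that also appears in a document contributes its weight, and the contributions are summed. The following pure-Python sketch illustrates the idea with made-up dimension indices and weights (not Pinecone's actual tokenizer or scoring model):

```python
# Sketch of sparse lexical scoring: a sparse vector maps dimension
# indices (hashed terms) to weights; only overlapping terms contribute.
def sparse_dot(query: dict, doc: dict) -> float:
    # Sum the products of weights for dimensions present in both vectors.
    return sum(w * doc[i] for i, w in query.items() if i in doc)

# Hypothetical term -> index mapping and weights (for illustration only).
query = {101: 1.2, 205: 0.8, 310: 1.0}   # e.g. "AAPL", "outlook", "market"
doc_a = {101: 2.0, 310: 1.5, 999: 0.3}   # shares "AAPL" and "market" with the query
doc_b = {205: 0.5, 777: 1.1}             # shares only "outlook" with the query

scores = {name: sparse_dot(query, d) for name, d in [("doc_a", doc_a), ("doc_b", doc_b)]}
print(scores)  # doc_a scores higher: 1.2*2.0 + 1.0*1.5 = 3.9 vs 0.8*0.5 = 0.4
```

Because each term is scored independently, a document that overlaps with more (or more heavily weighted) query terms ranks higher.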

Search with text

Searching with text is supported only for indexes with integrated embedding.
To search a sparse index with a query text, use the search_records operation with the following parameters:
  • namespace: The namespace to query. To use the default namespace, set to "__default__".
  • query.inputs.text: The query text. Pinecone uses the embedding model integrated with the index to convert the text to a sparse vector automatically.
  • query.top_k: The number of records to return.
  • query.match_terms: (Optional) A list of terms that must be present in each search result. For more details, see Filter by required terms.
  • fields: (Optional) The fields to return in the response. If not specified, the response includes all fields.
For example, the following code converts the query “What is AAPL’s outlook, considering both product launches and market conditions?” to a sparse vector and then searches for the 3 most similar vectors in the example-namespace namespace:
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/manage-data/target-an-index
index = pc.Index(host="INDEX_HOST")

results = index.search(
    namespace="example-namespace", 
    query={
        "inputs": {"text": "What is AAPL's outlook, considering both product launches and market conditions?"}, 
        "top_k": 3
    },
    fields=["chunk_text", "quarter"]
)

print(results)
The results will look as follows. The most similar records are scored highest.
{'result': {'hits': [{'_id': 'vec2',
                      '_score': 10.77734375,
                      'fields': {'chunk_text': "Analysts suggest that AAPL's "
                                               'upcoming Q4 product launch '
                                               'event might solidify its '
                                               'position in the premium '
                                               'smartphone market.',
                                 'quarter': 'Q4'}},
                     {'_id': 'vec3',
                      '_score': 6.49066162109375,
                      'fields': {'chunk_text': "AAPL's strategic Q3 "
                                               'partnerships with '
                                               'semiconductor suppliers could '
                                               'mitigate component risks and '
                                               'stabilize iPhone production.',
                                 'quarter': 'Q3'}},
                     {'_id': 'vec1',
                      '_score': 5.3671875,
                      'fields': {'chunk_text': 'AAPL reported a year-over-year '
                                               'revenue increase, expecting '
                                               'stronger Q3 demand for its '
                                               'flagship phones.',
                                 'quarter': 'Q3'}}]},
 'usage': {'embed_total_tokens': 18, 'read_units': 1}}
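The hits can then be read off the result. A small sketch of post-processing a response of this shape, using a plain dict standing in for the SDK's result object (the text is abridged):

```python
# Sample response shape from a search over a sparse index (abridged).
response = {
    "result": {
        "hits": [
            {"_id": "vec2", "_score": 10.77734375,
             "fields": {"chunk_text": "Analysts suggest ...", "quarter": "Q4"}},
            {"_id": "vec3", "_score": 6.49066162109375,
             "fields": {"chunk_text": "AAPL's strategic Q3 ...", "quarter": "Q3"}},
        ]
    },
    "usage": {"embed_total_tokens": 18, "read_units": 1},
}

# Extract (id, score, quarter) tuples; hits arrive ordered best-first.
rows = [
    (h["_id"], h["_score"], h["fields"]["quarter"])
    for h in response["result"]["hits"]
]
print(rows)  # [('vec2', 10.77734375, 'Q4'), ('vec3', 6.49066162109375, 'Q3')]
```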

Search with a sparse vector

To search a sparse index with a sparse vector representation of a query, use the query operation with the following parameters:
  • namespace: The namespace to query. To use the default namespace, set to "__default__".
  • sparse_vector: The sparse vector values and indices.
  • top_k: The number of results to return.
  • include_values: Whether to include the vector values of the matching records in the response. Defaults to false.
  • include_metadata: Whether to include the metadata of the matching records in the response. Defaults to false.
    When querying with top_k over 1000, avoid returning vector data or metadata for optimal performance.
For example, the following code uses a sparse vector representation of the query “What is AAPL’s outlook, considering both product launches and market conditions?” to search for the 3 most similar vectors in the example-namespace namespace:
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/manage-data/target-an-index
index = pc.Index(host="INDEX_HOST")

results = index.query(
    namespace="example-namespace",
    sparse_vector={
      "values": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
      "indices": [767227209, 1640781426, 1690623792, 2021799277, 2152645940, 2295025838, 2443437770, 2779594451, 2956155693, 3476647774, 3818127854, 4283091697]
    }, 
    top_k=3,
    include_metadata=True,
    include_values=False
)

print(results)
The results will look as follows. The most similar records are scored highest.
{'matches': [{'id': 'vec2',
              'metadata': {'category': 'technology',
                           'quarter': 'Q4',
                            'chunk_text': "Analysts suggest that AAPL's "
                                          'upcoming Q4 product launch event '
                                          'might solidify its position in the '
                                          'premium smartphone market.'},
              'score': 10.9042969,
              'values': []},
             {'id': 'vec3',
              'metadata': {'category': 'technology',
                           'quarter': 'Q3',
                            'chunk_text': "AAPL's strategic Q3 partnerships "
                                          'with semiconductor suppliers could '
                                          'mitigate component risks and '
                                           'stabilize iPhone production.'},
              'score': 6.48010254,
              'values': []},
             {'id': 'vec1',
              'metadata': {'category': 'technology',
                           'quarter': 'Q3',
                           'chunk_text': 'AAPL reported a year-over-year '
                                          'revenue increase, expecting '
                                          'stronger Q3 demand for its flagship '
                                          'phones.'},
              'score': 5.3671875,
              'values': []}],
 'namespace': 'example-namespace',
 'usage': {'read_units': 1}}
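If you produce the sparse representation yourself (for example, from your own sparse encoder), the query expects parallel values and indices lists. A hedged helper sketch, assuming you start from a mapping of dimension index to weight (the helper name is illustrative, not part of the SDK):

```python
def to_sparse_vector(weights: dict) -> dict:
    # Split an {index: weight} mapping into the parallel "indices" and
    # "values" lists the query operation expects, sorted by index.
    indices = sorted(weights)
    return {
        "indices": indices,
        "values": [weights[i] for i in indices],
    }

sv = to_sparse_vector({1690623792: 1.0, 767227209: 1.0, 2021799277: 1.0})
print(sv["indices"])  # [767227209, 1690623792, 2021799277]
```

The resulting dict can be passed directly as the `sparse_vector` argument shown above.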

Search with a record ID

When you search with a record ID, Pinecone uses the sparse vector associated with the record as the query. To search a sparse index with a record ID, use the query operation with the following parameters:
  • namespace: The namespace to query. To use the default namespace, set to "__default__".
  • id: The unique record ID containing the sparse vector to use as the query.
  • top_k: The number of results to return.
  • include_values: Whether to include the vector values of the matching records in the response. Defaults to false.
  • include_metadata: Whether to include the metadata of the matching records in the response. Defaults to false.
    When querying with top_k over 1000, avoid returning vector data or metadata for optimal performance.
For example, the following code uses an ID to search for the 3 records in the example-namespace namespace that best match the sparse vector in the record:
from pinecone.grpc import PineconeGRPC as Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/manage-data/target-an-index
index = pc.Index(host="INDEX_HOST")

results = index.query(
    namespace="example-namespace",
    id="rec2",
    top_k=3,
    include_metadata=True,
    include_values=False
)

print(results)

Filter by required terms

This feature is in public preview and is available only on the 2025-10 version of the API. See limitations for details.
When searching with text, you can specify a list of terms that must be present in each lexical search result. This is especially useful for:
  • Precision filtering: Ensuring specific entities or concepts appear in results
  • Quality control: Filtering out results that don’t contain essential keywords
  • Domain-specific searches: Requiring domain-specific terminology in results
  • Entity-based filtering: Ensuring specific people, places, or things are mentioned
To filter by required terms, add match_terms to your query, specifying the terms to require and the strategy to use. Currently, all is the only strategy supported (all terms must be present). For example, the following request searches for records about Tesla’s stock performance while ensuring both “Tesla” and “stock” appear in each result:
PINECONE_API_KEY="YOUR_API_KEY"
INDEX_HOST="INDEX_HOST"

curl "https://$INDEX_HOST/records/namespaces/example-namespace/search" \
  -H "Content-Type: application/json" \
  -H "Api-Key: $PINECONE_API_KEY" \
  -H "X-Pinecone-API-Version: 2025-10" \
  -d '{
        "query": {
          "inputs": { "text": "What is the current outlook for Tesla stock performance?" },
          "top_k": 3,
          "match_terms": {
            "terms": ["Tesla", "stock"],
            "strategy": "all"
          }
        },
        "fields": ["chunk_text"]
    }'
The response includes only records that contain both “Tesla” and “stock”:
{
  "result": {
    "hits": [
      {
        "_id": "tesla_q4_earnings",
        "_score": 9.82421875,
        "fields": {
          "chunk_text": "Tesla stock surged 8% in after-hours trading following strong Q4 earnings that exceeded analyst expectations. The company reported record vehicle deliveries and improved profit margins."
        }
      },
      {
        "_id": "tesla_competition_analysis",
        "_score": 7.49066162109375,
        "fields": {
          "chunk_text": "Tesla stock faces increasing competition from traditional automakers entering the electric vehicle market. However, analysts maintain that Tesla's technological lead and brand recognition provide significant advantages."
        }
      },
      {
        "_id": "tesla_production_update",
        "_score": 6.3671875,
        "fields": {
          "chunk_text": "Tesla stock performance is closely tied to production capacity at its Gigafactories. Recent expansion announcements suggest the company is positioning for continued growth in global markets."
        }
      }
    ]
  },
  "usage": {
    "embed_total_tokens": 18,
    "read_units": 1
  }
}
Without the match_terms filter, you might get results like:
  • “Tesla cars are popular in California” (mentions Tesla but not stock)
  • “Stock market volatility affects tech companies” (mentions stock but not Tesla)
  • “Electric vehicle sales are growing” (neither Tesla nor stock)
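The `all` strategy behaves like a case-insensitive containment check on each result's text. A rough pure-Python approximation of the filter (Pinecone's actual tokenization and term normalization may differ; this sketch splits on whitespace only):

```python
def matches_all_terms(text: str, terms: list) -> bool:
    # Case-insensitive: terms are normalized, and order and position
    # within the text don't matter (no phrase matching).
    words = set(text.lower().split())  # crude whitespace tokenization
    return all(term.lower() in words for term in terms)

candidates = [
    "Tesla stock surged 8% in after-hours trading",
    "Tesla cars are popular in California",
    "Stock market volatility affects tech companies",
]
kept = [t for t in candidates if matches_all_terms(t, ["Tesla", "stock"])]
print(kept)  # only the first sentence mentions both terms
```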

Limitations

  • Integrated indexes only: Filtering by required terms is supported only for indexes with integrated embedding.
  • Post-processing filter: The filtering happens after the initial query, so potential matches that weren’t included in the initial top_k results won’t appear in the final results.
  • No phrase matching: Terms are matched individually in any order and location.
  • No case sensitivity: Terms are normalized during processing, so matching ignores case.