Lexical search
This page shows you how to search a sparse index for records that most exactly match the words or phrases in a query. This is often called lexical search or keyword search.
Lexical search uses sparse vectors, which have a very large number of dimensions, where only a small proportion of values are non-zero. The dimensions represent words from a dictionary, and the values represent the importance of these words in the document. Words are scored independently and then summed, with the most similar records scored highest.
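As a rough illustration of "scored independently and then summed", the sketch below scores records with a sparse dot product over shared dimensions. It assumes a dot-product style score, which the text above does not spell out. The helper name `sparse_dot` and the toy indices and values are made up for the example.

```python
def sparse_dot(query, record):
    """Sum the products of query and record values over shared indices."""
    # Map each non-zero dimension of the record to its weight.
    record_weights = dict(zip(record["indices"], record["values"]))
    # Each shared dimension (word) contributes independently to the total.
    return sum(
        value * record_weights[index]
        for index, value in zip(query["indices"], query["values"])
        if index in record_weights
    )


# Toy sparse vectors: only non-zero dimensions are stored.
query = {"indices": [3, 17, 42], "values": [0.5, 1.0, 2.0]}
record_a = {"indices": [3, 42, 99], "values": [2.0, 1.5, 4.0]}
record_b = {"indices": [17, 100], "values": [0.5, 3.0]}

scores = {
    "a": sparse_dot(query, record_a),
    "b": sparse_dot(query, record_b),
}
# Highest score first: record "a" shares two dimensions with the query.
ranked = sorted(scores, key=scores.get, reverse=True)
```

Dimensions that appear in only one of the two vectors contribute nothing, which is why the very high dimensionality stays cheap to score.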
This feature is in public preview.
Search with text
Searching with text is supported only for indexes with integrated embedding.
To search a sparse index with a query text, use the search_records operation with the following parameters:
- The namespace to query. To use the default namespace, set the namespace to an empty string ("").
- The query.inputs.text parameter with the query text. Pinecone uses the embedding model integrated with the index to convert the text to a sparse vector automatically.
- The query.top_k parameter with the number of similar records to return.
- Optionally, you can specify the fields to return in the response. If not specified, the response will include all fields.
For example, the following code converts the query “What is AAPL’s outlook, considering both product launches and market conditions?” to a sparse vector and then searches for the 3 most similar vectors in the example-namespaces namespace:
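A minimal sketch of such a call in Python follows. It assumes `index` is an index handle from the Pinecone Python SDK, and that the handle exposes `search_records` with the keyword arguments listed above. The wrapper name `search_text` is hypothetical.

```python
def search_text(index, namespace, text, top_k=3, fields=None):
    """Search a sparse index with integrated embedding using query text.

    `index` is assumed to expose a `search_records` method that accepts
    `namespace`, `query`, and (optionally) `fields` keyword arguments.
    """
    kwargs = {
        "namespace": namespace,  # "" selects the default namespace
        "query": {
            "inputs": {"text": text},  # converted to a sparse vector by the service
            "top_k": top_k,            # number of similar records to return
        },
    }
    # Omit `fields` entirely so the response includes all fields by default.
    if fields is not None:
        kwargs["fields"] = fields
    return index.search_records(**kwargs)


# Example usage (needs the pinecone package, an API key, and an index host):
# from pinecone import Pinecone
# index = Pinecone(api_key="YOUR_API_KEY").Index(host="INDEX_HOST")
# results = search_text(
#     index,
#     "example-namespaces",
#     "What is AAPL's outlook, considering both product launches and market conditions?",
#     top_k=3,
# )
```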
In the results, the most similar records are scored highest.
Search with a sparse vector
To search a sparse index with a sparse vector representation of a query, use the query operation with the following parameters:
- The namespace to query. To use the default namespace, set the namespace to an empty string ("").
- The sparse_vector parameter with the sparse vector values and indices.
- The top_k parameter with the number of results to return.
- Optionally, you can set include_values and/or include_metadata to true to include the vector values and/or metadata of the matching records in the response. However, when querying with top_k over 1000, avoid returning vector data or metadata for optimal performance.
For example, the following code uses a sparse vector representation of the query “What is AAPL’s outlook, considering both product launches and market conditions?” to search for the 3 most similar vectors in the example-namespaces namespace:
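One way this might look in Python is sketched below. It assumes `index` is a Pinecone Python SDK index handle whose `query` method accepts the parameters listed above. The wrapper name `query_sparse` and the length check are illustrative additions, not part of the API.

```python
def query_sparse(index, namespace, indices, values, top_k=3,
                 include_values=False, include_metadata=True):
    """Query a sparse index with an explicit sparse vector.

    `index` is assumed to expose a `query` method that accepts the
    keyword arguments used below.
    """
    # Each index needs exactly one value; catch mismatches before sending.
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    return index.query(
        namespace=namespace,  # "" selects the default namespace
        sparse_vector={"indices": list(indices), "values": list(values)},
        top_k=top_k,
        include_values=include_values,
        include_metadata=include_metadata,
    )


# Example usage (needs the pinecone package, an API key, and an index host;
# the indices and values shown are placeholders, not a real query encoding):
# from pinecone import Pinecone
# index = Pinecone(api_key="YOUR_API_KEY").Index(host="INDEX_HOST")
# results = query_sparse(index, "example-namespaces", [10, 45, 16], [0.5, 0.5, 0.2])
```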
In the results, the most similar records are scored highest.
Search with a record ID
When you search with a record ID, Pinecone uses the sparse vector associated with the record as the query. To search a sparse index with a record ID, use the query operation with the following parameters:
- The namespace to query. To use the default namespace, set the namespace to an empty string ("").
- The id parameter with the unique record ID containing the sparse vector to use as the query.
- The top_k parameter with the number of results to return.
- Optionally, you can set include_values and/or include_metadata to true to include the vector values and/or metadata of the matching records in the response. However, when querying with top_k over 1000, avoid returning vector data or metadata for optimal performance.
For example, the following code uses an ID to search for the 3 records in the example-namespace namespace that best match the sparse vector in the record:
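A minimal Python sketch of an ID-based query is shown below. It assumes `index` is a Pinecone Python SDK index handle whose `query` method accepts an `id` keyword. The wrapper name `query_by_id` and the record ID in the usage comment are hypothetical.

```python
def query_by_id(index, namespace, record_id, top_k=3,
                include_values=False, include_metadata=True):
    """Query a sparse index using the sparse vector stored under a record ID.

    `index` is assumed to expose a `query` method that accepts the
    keyword arguments used below.
    """
    return index.query(
        namespace=namespace,  # "" selects the default namespace
        id=record_id,         # the stored record's sparse vector becomes the query
        top_k=top_k,
        include_values=include_values,
        include_metadata=include_metadata,
    )


# Example usage (needs the pinecone package, an API key, and an index host;
# "rec2" is a placeholder record ID):
# from pinecone import Pinecone
# index = Pinecone(api_key="YOUR_API_KEY").Index(host="INDEX_HOST")
# results = query_by_id(index, "example-namespace", "rec2", top_k=3)
```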