Query sparse-dense vectors
This page shows you how to query your sparse-dense vectors (hybrid search) and explains how Pinecone ranks hybrid search results.
Only indexes using the dotproduct distance metric support querying sparse-dense vectors.
This feature is in public preview.
Query records with sparse-dense values
To query records with sparse-dense values, use the query operation, specifying a value for sparse_vector, which is an object containing the key-value pairs indices and values.
The following example queries an index using a sparse-dense vector:
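A minimal sketch of such a query, reusing the call shape and sample values that appear later on this page. The helper name build_hybrid_query is hypothetical and not part of the Pinecone client; the sketch also assumes an already-initialized index object, so the actual call is left as a comment.

```python
from typing import Any, Dict, List


def build_hybrid_query(dense: List[float],
                       sparse: Dict[str, list],
                       top_k: int = 10,
                       namespace: str = "example-namespace") -> Dict[str, Any]:
    """Assemble keyword arguments for a sparse-dense query (hypothetical helper)."""
    indices, values = sparse["indices"], sparse["values"]
    # Each sparse index must be paired with exactly one value.
    if len(indices) != len(values):
        raise ValueError("sparse 'indices' and 'values' must have equal length")
    return {
        "namespace": namespace,
        "top_k": top_k,
        "vector": dense,
        "sparse_vector": {"indices": list(indices), "values": list(values)},
    }


query_args = build_hybrid_query(
    dense=[0.1, 0.2, 0.3],
    sparse={"indices": [10, 45, 16], "values": [0.5, 0.5, 0.2]},
)

# With a connected index (assumed to exist; not created in this sketch):
# query_response = index.query(**query_args)
```

Checking the sparse part before sending the request is a design choice of this sketch, intended to surface mismatched indices and values early.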
The value of query_response is like the following:
{'matches': [{'id': 'vec5', 'score': 0.44, 'values': []},
             {'id': 'vec1', 'score': 0.32, 'values': []},
             {'id': 'vec2', 'score': 0.26000002, 'values': []},
             {'id': 'vec3', 'score': 0.200000018, 'values': []},
             {'id': 'vec4', 'score': 0.140000015, 'values': []}]
}
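In the sample above, the matches appear in descending score order. The sketch below reads the IDs and scores out of that structure. It copies the sample into a plain dictionary and assumes a real response can be read the same way, which may differ depending on the client version.

```python
# Sample response copied from the output above, treated as a plain dict.
query_response = {
    "matches": [
        {"id": "vec5", "score": 0.44, "values": []},
        {"id": "vec1", "score": 0.32, "values": []},
        {"id": "vec2", "score": 0.26000002, "values": []},
        {"id": "vec3", "score": 0.200000018, "values": []},
        {"id": "vec4", "score": 0.140000015, "values": []},
    ]
}

# Collect (id, score) pairs in the order the matches were returned.
ranked = [(match["id"], match["score"]) for match in query_response["matches"]]

for rank, (vec_id, score) in enumerate(ranked, start=1):
    print(f"{rank}. {vec_id} score={score:.2f}")
```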
Query a sparse-dense index with explicit weighting
Because Pinecone views your sparse-dense vector as a single vector, it does not offer a built-in parameter to adjust the weight of a query’s dense part against its sparse part; the index is agnostic to density or sparsity of coordinates in your vectors. You may, however, incorporate a linear weighting scheme by customizing your query vector, as we demonstrate in the function below.
Examples
The following example transforms vector values using an alpha parameter.
def hybrid_score_norm(dense, sparse, alpha: float):
    """Hybrid score using a convex combination

    alpha * dense + (1 - alpha) * sparse

    Args:
        dense: Array of floats representing the dense vector
        sparse: a dict of `indices` and `values`
        alpha: scale between 0 and 1
    """
    if alpha < 0 or alpha > 1:
        raise ValueError("Alpha must be between 0 and 1")
    # Scale the sparse values by (1 - alpha); the indices are unchanged.
    hs = {
        'indices': sparse['indices'],
        'values': [v * (1 - alpha) for v in sparse['values']]
    }
    # Scale the dense values by alpha and return both parts.
    return [v * alpha for v in dense], hs
The following example transforms a vector using the above function, then queries a Pinecone index.
sparse_vector = {
    'indices': [10, 45, 16],
    'values': [0.5, 0.5, 0.2]
}
dense_vector = [0.1, 0.2, 0.3]

hdense, hsparse = hybrid_score_norm(dense_vector, sparse_vector, alpha=0.75)

query_response = index.query(
    namespace="example-namespace",
    top_k=10,
    vector=hdense,
    sparse_vector=hsparse
)
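To see why scaling the query is enough to weight the two parts, the sketch below scores a query against one record locally. It assumes the score is the plain dot product over the combined sparse-dense vector, as the dotproduct requirement and the single-vector description above imply. The record values and the helper names dense_dot and sparse_dot are made up for illustration. Because the dot product is linear, the weighted query's score equals alpha times the dense score plus (1 - alpha) times the sparse score.

```python
import math
from typing import Dict, List


def dense_dot(a: List[float], b: List[float]) -> float:
    """Dot product of two dense vectors of equal length."""
    return sum(x * y for x, y in zip(a, b))


def sparse_dot(a: Dict[str, list], b: Dict[str, list]) -> float:
    """Dot product of two sparse vectors; only shared indices contribute."""
    b_lookup = dict(zip(b["indices"], b["values"]))
    return sum(v * b_lookup.get(i, 0.0)
               for i, v in zip(a["indices"], a["values"]))


# Query values from the example above; the record is hypothetical.
query_dense = [0.1, 0.2, 0.3]
query_sparse = {"indices": [10, 45, 16], "values": [0.5, 0.5, 0.2]}
record_dense = [0.4, 0.1, 0.9]
record_sparse = {"indices": [10, 16, 99], "values": [1.0, 0.5, 0.3]}

alpha = 0.75
# Same scaling that hybrid_score_norm applies to the query.
weighted_dense = [v * alpha for v in query_dense]
weighted_sparse = {
    "indices": query_sparse["indices"],
    "values": [v * (1 - alpha) for v in query_sparse["values"]],
}

dense_score = dense_dot(query_dense, record_dense)
sparse_score = sparse_dot(query_sparse, record_sparse)
weighted_score = (dense_dot(weighted_dense, record_dense)
                  + sparse_dot(weighted_sparse, record_sparse))

# The weighted score is the convex combination of the two partial scores.
assert math.isclose(weighted_score,
                    alpha * dense_score + (1 - alpha) * sparse_score)
```

Under this assumption, alpha=1 reduces the query to a purely dense search and alpha=0 to a purely sparse one.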