Upsert sparse-dense vectors
Pinecone supports vectors with both sparse and dense values, which lets you perform hybrid search: semantic and keyword search combined in a single query for more relevant results. This page explains the sparse-dense vector format and how to upsert sparse-dense vectors into Pinecone indexes.
To see sparse-dense embeddings in action, see the Ecommerce hybrid search example.
This feature is in public preview.
Sparse-dense vector format
Pinecone represents sparse values as a dictionary of two arrays: indices and values. The elements of indices have type uint32; the elements of values have type float32.
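As a minimal sketch of this format, the helper below turns a plain mapping of dimension index to weight into the indices/values dictionary. The name to_sparse_values and the input mapping are hypothetical and only for illustration; they are not part of the Pinecone client.

```python
UINT32_MAX = 2**32 - 1  # largest value a uint32 index can hold


def to_sparse_values(weights):
    """Convert a {dimension_index: weight} mapping into the
    {'indices': [...], 'values': [...]} dictionary described above.

    Hypothetical helper, not a Pinecone API.
    """
    indices, values = [], []
    for idx, weight in sorted(weights.items()):
        if not isinstance(idx, int) or not 0 <= idx <= UINT32_MAX:
            raise ValueError(f"index {idx!r} does not fit in uint32")
        if weight == 0:
            continue  # zero entries carry no information in a sparse vector
        indices.append(idx)
        values.append(float(weight))
    return {"indices": indices, "values": values}


sparse = to_sparse_values({45: 0.5, 10: 0.5, 16: 0.0})
print(sparse)
```

The two arrays are parallel: values[i] is the weight stored at dimension indices[i].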
Example
The following example defines two records with sparse and dense values.
from pinecone.grpc import PineconeGRPC as Pinecone

pc = Pinecone(api_key='API_KEY')
index = pc.Index('example-index')

records = [
    {
        'id': 'vec1',
        # The 'values' are dense vector values.
        'values': [0.1, 0.2, 0.3],
        'metadata': {'genre': 'drama'},
        'sparse_values': {
            'indices': [10, 45, 16],
            'values': [0.5, 0.5, 0.2]
        }
    },
    {
        'id': 'vec2',
        'values': [0.2, 0.3, 0.4],
        'metadata': {'genre': 'action'},
        'sparse_values': {
            # Indices have type uint32
            'indices': [15, 40, 11],
            # Values have type float32
            'values': [0.4, 0.5, 0.2]
        }
    }
]
Pinecone supports sparse vectors of up to 1000 non-zero values and 4.2 billion dimensions.
Assuming a dense vector component with 768 dimensions, Pinecone supports roughly 2.8M sparse vectors per s1 pod or 900k per p1 pod.
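The per-vector limits above can be checked on the client before sending data. The following is a rough sketch, not a Pinecone API: check_sparse_values is a hypothetical helper, and the requirement that both arrays have equal length is an assumption drawn from the paired-array format rather than a documented rule.

```python
MAX_NONZERO = 1000      # documented limit on non-zero sparse values
UINT32_MAX = 2**32 - 1  # indices are uint32 (about 4.2 billion dimensions)


def check_sparse_values(sparse_values):
    """Return a list of problems found in a sparse_values dictionary.

    Hypothetical helper; an empty list means no problem was detected.
    """
    indices = sparse_values["indices"]
    values = sparse_values["values"]
    problems = []
    # Assumption: each index must pair with exactly one value.
    if len(indices) != len(values):
        problems.append("indices and values differ in length")
    if any(not 0 <= i <= UINT32_MAX for i in indices):
        problems.append("index outside the uint32 range")
    nonzero = sum(1 for v in values if v != 0)
    if nonzero > MAX_NONZERO:
        problems.append(f"{nonzero} non-zero values exceeds {MAX_NONZERO}")
    return problems


print(check_sparse_values({"indices": [10, 45, 16], "values": [0.5, 0.5, 0.2]}))
```

Returning a list of problems rather than raising on the first one makes it easier to report everything wrong with a record in a single pass.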
Upsert records with sparse-dense values
To upsert records with sparse-dense values, use the upsert operation, specifying dense values in the values parameter and sparse values in the sparse_values parameter.
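A minimal sketch of that call, assuming index is the object returned by pc.Index('example-index') and records is the list from the example above. The wrapper upsert_sparse_dense is a hypothetical name used only for illustration; the only client call it relies on is index.upsert(vectors=...).

```python
REQUIRED_FIELDS = {"id", "values", "sparse_values"}


def upsert_sparse_dense(index, records):
    """Check that each record carries dense and sparse values, then upsert.

    Hypothetical wrapper: `index` is assumed to be a Pinecone index object
    (or anything exposing a compatible upsert(vectors=...) method).
    """
    for record in records:
        missing = REQUIRED_FIELDS - record.keys()
        if missing:
            raise ValueError(
                f"record {record.get('id')!r} is missing {sorted(missing)}"
            )
    # Dense values travel in 'values', sparse values in 'sparse_values'.
    return index.upsert(vectors=records)


# Usage with the objects from the example above (requires a real index):
# upsert_sparse_dense(index, records)
```

The field check is optional; it only catches records that would be upserted as dense-only by mistake.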
Only indexes using the dotproduct distance metric support sparse-dense vectors. Upserting sparse-dense vectors into indexes with a different distance metric will succeed, but querying will return an error.
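Because the upsert itself does not fail on other metrics, it can help to check the metric up front. The sketch below assumes that the object returned by pc.describe_index('example-index') exposes the metric as a metric attribute whose string form is 'dotproduct'; treat that as an assumption and verify it against your client version. require_dotproduct is a hypothetical helper.

```python
def require_dotproduct(index_description):
    """Raise if the described index does not use the dotproduct metric.

    Hypothetical helper. Assumes `index_description` has a `metric`
    attribute, as the result of pc.describe_index(...) is expected to.
    """
    metric = getattr(index_description, "metric", None)
    if str(metric) != "dotproduct":
        raise ValueError(
            f"sparse-dense vectors need the dotproduct metric, got {metric!r}"
        )


# Usage before upserting (requires a real client and index):
# require_dotproduct(pc.describe_index('example-index'))
```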