Understanding hybrid indexes
This page focuses on hybrid indexes that store records containing both dense and sparse vectors. However, for serverless indexes, Pinecone recommends using separate dense-only and sparse-only indexes for more flexible and accurate hybrid search.
Pinecone supports vectors with sparse and dense values, which allows you to perform hybrid search on a single Pinecone index. Hybrid search combines semantic and keyword search in one query for more relevant results. Semantic search results for out-of-domain queries can be less relevant; combining these with keyword search results can improve relevance. This topic describes how hybrid search with sparse-dense vectors works in Pinecone.
This feature is in public preview.
Hybrid index search in Pinecone
In Pinecone, you perform hybrid search with sparse-dense vectors. Sparse-dense vectors combine dense and sparse embeddings into a single vector. Sparse and dense vectors represent different types of information and enable distinct kinds of search.
Dense vectors
The basic vector type in Pinecone is a dense vector. Dense vectors enable semantic search. Semantic search returns the most similar results according to a specific distance metric even if no exact matches are present. This is possible because dense vectors generated by embedding models such as multilingual-e5-large are numerical representations of semantic meaning.
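To make the idea of ranking by a distance metric concrete, here is a minimal sketch that orders a few documents by dot product against a query. The three-dimensional vectors and the document ids are hypothetical stand-ins chosen for readability; they are not output from a real embedding model, and the helper names are illustrative only.

```python
def dot(a, b):
    """Dot product of two equal-length dense vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    return sum(x * y for x, y in zip(a, b))


def rank_by_dot_product(query, documents):
    """Return document ids sorted from most to least similar to the query."""
    scores = {doc_id: dot(query, vec) for doc_id, vec in documents.items()}
    return sorted(scores, key=scores.get, reverse=True)


# Hypothetical toy embeddings; real embedding models emit far more dimensions.
documents = {
    "doc-cats": [0.9, 0.1, 0.0],
    "doc-dogs": [0.7, 0.3, 0.1],
    "doc-tax": [0.0, 0.2, 0.9],
}
query = [0.8, 0.2, 0.0]

ranking = rank_by_dot_product(query, documents)
print(ranking)
```

No document has to match the query exactly: the ranking depends only on how closely the vectors point in the same direction.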
Sparse vectors
Sparse vectors have a very large number of dimensions, of which only a small proportion are non-zero. When used for keyword search, each sparse vector represents a document; the dimensions represent words from a dictionary, and the values represent the importance of these words in the document. Keyword search algorithms compute the relevance of text documents based on the number of keyword matches, their frequency, and other factors.
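As an illustration, a sparse vector can be stored as two parallel lists: the positions of the non-zero dimensions and their values. The sketch below assumes a tiny hypothetical vocabulary and uses raw term counts as a stand-in for word importance; a real sparse embedding model would use a much larger dictionary and a more refined weighting scheme.

```python
from collections import Counter


def to_sparse_vector(text, vocabulary):
    """Encode text as {"indices": [...], "values": [...]} using term counts."""
    counts = Counter(word for word in text.lower().split() if word in vocabulary)
    # Sort by dimension index so the output is deterministic.
    pairs = sorted((vocabulary[word], float(count)) for word, count in counts.items())
    return {
        "indices": [index for index, _ in pairs],
        "values": [value for _, value in pairs],
    }


# Hypothetical dictionary: each word maps to one dimension of the sparse vector.
vocabulary = {"apple": 0, "banana": 1, "cherry": 2, "date": 3, "elderberry": 4}

sparse = to_sparse_vector("apple banana apple date", vocabulary)
print(sparse)
```

Only the dimensions for words that occur in the text are stored; every other dimension is implicitly zero.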
Sparse-dense workflow
Using hybrid indexes with both sparse and dense vectors involves the following general steps:
- Create dense vectors using a dense embedding model.
- Create sparse vectors using a sparse embedding model.
- Create a hybrid index with the dotproduct metric.
- Upsert sparse-dense vectors to your index.
- Search the index using sparse-dense vectors.
- Pinecone returns sparse-dense vectors.
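The steps above can be sketched locally, without calling Pinecone, to show how a sparse-dense query is scored. This sketch rests on two assumptions: that records carry dense values under `values` and sparse values under `sparse_values` with `indices` and `values` lists, and that the dotproduct score of a sparse-dense pair is the dense dot product plus the sparse dot product. The functions `sparse_dot`, `hybrid_score`, and `search` are hypothetical helpers, not part of any Pinecone client.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors given in indices/values form."""
    b_lookup = dict(zip(b["indices"], b["values"]))
    return sum(v * b_lookup.get(i, 0.0) for i, v in zip(a["indices"], a["values"]))


def hybrid_score(query, record):
    """Assumed scoring: dense dot product plus sparse dot product."""
    dense = sum(q * r for q, r in zip(query["values"], record["values"]))
    return dense + sparse_dot(query["sparse_values"], record["sparse_values"])


def search(query, records, top_k=2):
    """Return the top_k (id, score) pairs, highest score first."""
    scored = [(record["id"], hybrid_score(query, record)) for record in records]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top_k]


# Toy in-memory "index": each record has both dense and sparse values.
records = [
    {"id": "a", "values": [0.1, 0.9], "sparse_values": {"indices": [3, 7], "values": [1.0, 2.0]}},
    {"id": "b", "values": [0.8, 0.2], "sparse_values": {"indices": [7], "values": [0.5]}},
    {"id": "c", "values": [0.5, 0.5], "sparse_values": {"indices": [42], "values": [3.0]}},
]
query = {"values": [1.0, 0.0], "sparse_values": {"indices": [7], "values": [1.0]}}

results = search(query, records)
print(results)
```

In this toy data, record "a" is a weak semantic match but a strong keyword match on dimension 7, so it outranks record "b", which a dense-only search would have placed first.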
Limitations
Pinecone sparse-dense vectors have the following limitations:
- Records with sparse vector values must also contain dense vector values.
- Sparse vector values can contain up to 1000 non-zero values and 4.2 billion dimensions.
- Only indexes using the dotproduct distance metric support querying sparse-dense vectors. Upserting, updating, and fetching sparse-dense vectors in indexes with a different distance metric will succeed, but querying will return an error.
- Indexes created before February 22, 2023 do not support sparse vectors.
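A client-side check along these lines could catch some of these limits before a request is sent. This is a sketch under stated assumptions: the record layout (`values`, `sparse_values`, `indices`) is assumed, "4.2 billion dimensions" is interpreted as indices that fit an unsigned 32-bit integer, and `check_record` is a hypothetical helper rather than a Pinecone API.

```python
MAX_NONZERO_SPARSE_VALUES = 1000  # limit stated in the list above
MAX_SPARSE_INDEX = 2**32 - 1      # assumption: 4.2 billion means unsigned 32-bit indices


def check_record(record, index_metric):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    sparse = record.get("sparse_values")
    if sparse is None:
        return problems  # dense-only records are not subject to these limits

    if not record.get("values"):
        problems.append("sparse values require dense values")

    indices = sparse.get("indices", [])
    values = sparse.get("values", [])
    if len(indices) != len(values):
        problems.append("indices and values differ in length")
    if sum(1 for v in values if v != 0) > MAX_NONZERO_SPARSE_VALUES:
        problems.append("more than 1000 non-zero sparse values")
    if any(i < 0 or i > MAX_SPARSE_INDEX for i in indices):
        problems.append("sparse index out of range")
    if index_metric != "dotproduct":
        problems.append("querying requires the dotproduct metric")
    return problems


good = {"id": "a", "values": [0.1, 0.2], "sparse_values": {"indices": [5], "values": [1.0]}}
bad = {"id": "b", "sparse_values": {"indices": [5], "values": [1.0]}}

print(check_record(good, "dotproduct"))
print(check_record(bad, "cosine"))
```

The metric check is reported as a problem rather than raised as an error because, as noted above, upserts to such an index still succeed and only queries fail.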