Hybrid Search



This is an early access feature, which is available on an invitation basis and is not intended for production workloads. No SLAs or technical support commitments are provided. Sign up for early access.


Pinecone supports hybrid search, which lets you run semantic and keyword search over your data in a single query and combine the two result sets for better relevance. This topic describes what hybrid search does, why it is useful, and how it works in Pinecone.

Pinecone hybrid search allows keyword-aware semantic search

Pinecone hybrid search allows you to perform keyword-aware semantic search. Semantic search results for out-of-domain queries can be less relevant; combining these with keyword search results can improve relevance.

Because Pinecone allows you to create your own sparse vectors, you can use hybrid search to solve the Maximum Inner Product Search (MIPS) problem for hybrid vectors of any real values. This includes emerging use-cases such as retrieval over learnt sparse representations for text data using SPLADE.

Hybrid search workflow

Hybrid search involves the following general steps:

  1. Create dense vectors using an external embedding model.
  2. Create sparse vectors using an external tokenizer.
  3. Create a hybrid index.
  4. Upsert dense and sparse vectors to the hybrid upsert endpoint.
  5. Search the hybrid index using the hybrid query endpoint.
  6. Pinecone returns ranked hybrid vectors.

Figure 1 below illustrates these steps.

Figure 1: Hybrid search workflow


Sparse versus dense vectors in Pinecone

Hybrid search combines dense and sparse vectors; these types of vectors represent different types of information and enable distinct kinds of search. Dense vectors enable semantic search. Semantic search returns the most similar results according to a specific distance metric even if no exact matches are present. This is possible because dense vectors generated by embedding models such as SBERT are numerical representations of semantic meaning.

Sparse vectors have a very large number of dimensions, generally over 1M, of which only a small proportion hold non-zero values. When used for keyword search, each sparse vector represents a document; the dimensions represent words from a dictionary, and the values represent the frequency of these words in the document. Keyword search algorithms like the BM25 algorithm compute the relevance of text documents based on the number of keyword matches, their frequency, and other factors.
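The contrast between the two representations can be sketched in a few lines of plain Python. This is an illustration only: the helper names `cosine_similarity` and `to_sparse` and the toy values are assumptions, and cosine similarity stands in for whichever distance metric your index uses.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two dense vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def to_sparse(values):
    """Keep only the non-zero entries of a vector as {dimension: value}."""
    return {i: v for i, v in enumerate(values) if v != 0}

# Dense: every dimension carries a value (toy 4-dimensional embeddings).
dense_a = [0.1, 0.7, -0.2, 0.4]
dense_b = [0.2, 0.6, -0.1, 0.5]
similarity = cosine_similarity(dense_a, dense_b)

# Sparse: mostly zeros, so only the non-zero dimensions are stored.
term_counts = [0, 0, 3, 0, 0, 0, 1, 0]
sparse = to_sparse(term_counts)  # {2: 3, 6: 1}
```

Storing only the non-zero entries is what makes vectors with over a million dimensions practical to keep and compare.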

Creating sparse vectors for use in hybrid search

Keyword-aware semantic search requires vector representations of documents. Because Pinecone hybrid indexes accept sparse vectors rather than documents, you control how the sparse vectors that represent your documents are generated. You can choose a tokenizer or analyzer, such as those from Hugging Face, spaCy, or Lucene, to convert documents into sparse vectors, such as term-frequency vectors. The result is a dictionary that maps token IDs to term frequencies.


from collections import Counter
sparse_vector = dict(Counter(tokenizer.encode(doc)))  # {5: 1, 10500: 1, 7: 1, ...}
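The line above leaves `tokenizer` and `doc` undefined. As a minimal, self-contained sketch, the example below substitutes a hypothetical `ToyTokenizer` class (an assumption for illustration, not a real library) that lowercases text, splits on whitespace, and assigns integer IDs in order of first appearance. In practice you would use a real tokenizer with a fixed vocabulary.

```python
from collections import Counter

class ToyTokenizer:
    """Hypothetical stand-in for a real tokenizer: lowercases, splits on
    whitespace, and assigns each new token the next integer ID."""

    def __init__(self):
        self.vocab = {}

    def encode(self, text):
        ids = []
        for token in text.lower().split():
            if token not in self.vocab:
                self.vocab[token] = len(self.vocab)
            ids.append(self.vocab[token])
        return ids

tokenizer = ToyTokenizer()
doc = "the cat sat on the mat"

# Map token IDs to term frequencies.
sparse_vector = dict(Counter(tokenizer.encode(doc)))
# "the" received ID 0 and appears twice: {0: 2, 1: 1, 2: 1, 3: 1, 4: 1}
```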

Pinecone creates hybrid vectors from your sparse and dense vectors

Hybrid vectors combine dense and sparse vectors. In Pinecone, each hybrid vector consists of a sparse vector and a dense vector. Your hybrid index accepts vector upserts and queries containing both dense and sparse vector parameters and combines these into hybrid vectors.

When you upsert a hybrid vector using the hybrid upsert endpoint, your index normalizes the sparse vector for BM25 ranking and stores the normalized version. If you upsert hybrid vectors using the standard upsert operation, your index stores them without normalization.
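The exact normalization Pinecone applies is not specified here. As a rough illustration only, the sketch below applies the textbook BM25 term-frequency saturation term to raw term counts. The function name `bm25_normalize`, the parameters `k1` and `b` with their default values, and the `avg_doc_len` argument are all assumptions, not Pinecone's implementation.

```python
def bm25_normalize(sparse_vector, avg_doc_len, k1=1.2, b=0.75):
    """Apply BM25 term-frequency saturation to raw term counts.

    Each count tf becomes tf / (tf + k), where k grows with the document's
    length relative to the average document length in the corpus.
    """
    doc_len = sum(sparse_vector.values())
    k = k1 * (1 - b + b * doc_len / avg_doc_len)
    return {token: tf / (tf + k) for token, tf in sparse_vector.items()}

raw = {5: 1, 10500: 3, 7: 1}  # token ID -> term frequency
normalized = bm25_normalize(raw, avg_doc_len=5.0)
```

Under these assumptions, repeated terms still score higher than single occurrences, but each additional occurrence adds less, and every normalized value stays below 1.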

Hybrid indexes store hybrid vectors and keyword search parameters

Pinecone stores hybrid vectors in hybrid indexes. A hybrid index has all of the features of a default dense vector index as well as a set of parameters for BM25 ranking. Your hybrid index uses these parameters to perform BM25 ranking of keyword search results and combines these keyword results with semantic search results to produce hybrid results.

Hybrid indexes use the s1h pod type.

Hybrid queries include sparse and dense vectors with weighting parameter

To query your hybrid index, you provide a hybrid query vector and a weight parameter alpha that determines the relative weight of similarity and keyword relevance in hybrid query results. Your index performs both a semantic or similarity search and a keyword search; then, your index ranks the vectors in your index based on a combination of similarity and keyword matching and returns the most relevant results. Hybrid query results contain both dense and sparse vector values.

If you query your hybrid index using the hybrid query endpoint, your hybrid index denormalizes the sparse component of the hybrid result vectors before returning them in query results, so that they match the upserted sparse vectors.

Sparse vector search returns BM25 ranked results

If you query the hybrid index through the hybrid query endpoint, the sparse rankings are similar to those produced by the BM25 algorithm.

If you query the hybrid index directly, then the hybrid index returns the vectors whose sparse components have the highest dot product with the sparse component of the hybrid query vector.
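Ranking by sparse dot product can be sketched as follows. The helper `sparse_dot`, the document IDs, and the values are hypothetical; a real index computes this internally and at much larger scale.

```python
def sparse_dot(query, doc):
    """Dot product of two sparse vectors stored as {dimension: value}."""
    # Iterate over the smaller dict; only shared dimensions contribute.
    if len(query) > len(doc):
        query, doc = doc, query
    return sum(value * doc[dim] for dim, value in query.items() if dim in doc)

# Hypothetical sparse components of three stored vectors.
docs = {
    "doc-a": {5: 1.0, 7: 2.0},
    "doc-b": {7: 1.0, 42: 3.0},
    "doc-c": {99: 4.0},
}
query = {7: 1.0, 42: 1.0}

# Highest dot product first.
ranked = sorted(docs, key=lambda doc_id: sparse_dot(query, docs[doc_id]), reverse=True)
```

Here `doc-b` ranks first because it shares two dimensions with the query, while `doc-c` shares none and scores zero.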

Hybrid queries specify weight of dense and sparse rankings

When you query a hybrid index, you provide both dense and sparse vectors, which the hybrid query endpoint combines to create a hybrid query vector. Your index performs similarity search using the dense component of the hybrid vector and keyword search using the sparse component. The hybrid query endpoint normalizes the sparse vector component before searching. Your hybrid query also contains a parameter called alpha that determines the relative weight of the relevance rankings from the dense vector search. You can tune alpha to shift the balance between semantic and keyword search rankings.

Equation 1 below expresses how alpha affects the relative weighting of lexical, or keyword, ranking and semantic ranking in hybrid query results.

Equation 1: Linear combination with weighting parameter alpha

$\mathrm{hybrid}(q, d) = (1 - \alpha)\, f_{\mathrm{lexical}}(q, d) + \alpha\, f_{\mathrm{semantic}}(q, d)$

Values of alpha between 0.7 and 0.9 give the best performance for in-domain models. When the model was not trained on the corpus, that is, when it is out-of-domain, downweight the semantic score with lower values of alpha, in the range 0.3 to 0.6. When the model is fine-tuned or in-domain, use values closer to 1.
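To make the effect of alpha concrete, the sketch below applies the linear combination from Equation 1 to made-up scores. The function names, the document IDs, the score values, and the check that alpha lies between 0 and 1 are assumptions for illustration.

```python
def hybrid_score(lexical, semantic, alpha):
    """Linear combination from Equation 1: (1 - alpha) * lexical + alpha * semantic."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    return (1 - alpha) * lexical + alpha * semantic

# Hypothetical (lexical, semantic) scores for two documents.
scores = {"doc-a": (0.9, 0.2), "doc-b": (0.3, 0.8)}

def rank(alpha):
    """Return document IDs ordered by hybrid score, best first."""
    return sorted(scores, key=lambda d: hybrid_score(*scores[d], alpha), reverse=True)

keyword_heavy = rank(alpha=0.3)   # lexical score dominates: doc-a first
semantic_heavy = rank(alpha=0.8)  # semantic score dominates: doc-b first
```

With a low alpha the strong keyword match (`doc-a`) wins; with a high alpha the strong semantic match (`doc-b`) overtakes it.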

Figure 2 below shows the relationship between the value of alpha and the NDCG relevance metric.

Figure 2: Relevance by alpha value for in- and out-of-domain models