This page shows you two ways to encode sparse vectors for use in hybrid search: using Pinecone Inference with the pinecone-sparse-english-v0 embedding model, or using the Pinecone Text Client with the BM25 or SPLADE algorithm.

In most cases, Pinecone Inference with the pinecone-sparse-english-v0 model will produce better results. However, if you cannot send text to Pinecone’s endpoints due to privacy considerations, you can run the Pinecone Text Client locally and send Pinecone just the vectors.

Use Pinecone Inference

Pinecone Inference is a service that gives you access to embedding and reranking models hosted on Pinecone’s infrastructure, including the pinecone-sparse-english-v0 model for sparse embeddings. Built on the innovations of the DeepImpact architecture, pinecone-sparse-english-v0 estimates the lexical importance of tokens by leveraging their context, unlike traditional retrieval models like BM25, which rely solely on term frequency.

To encode sparse vectors with Pinecone Inference, do the following:

  1. Install the latest Pinecone Python SDK and integrated inference plugin as follows:

    pip install --upgrade pinecone pinecone-plugin-records
    

    The pinecone-plugin-records plugin is not currently compatible with the pinecone[grpc] version of the Python SDK.

  2. Use the embed operation, setting the model parameter to pinecone-sparse-english-v0 and the input_type parameter to passage (for documents) or query (for search queries). If you want to include string tokens in the response, also set return_tokens to true.
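The call can be sketched as follows. This is a hedged example: the embed_args helper is a hypothetical name introduced here so the snippet runs without credentials, while the model name and parameters match the step above. The commented-out call assumes the Pinecone Python SDK is installed and a valid API key is available.

```python
# Hypothetical helper that assembles the keyword arguments for the
# embed operation; only the argument construction runs locally.
def embed_args(texts, input_type="passage", return_tokens=True):
    return {
        "model": "pinecone-sparse-english-v0",
        "inputs": list(texts),
        "parameters": {"input_type": input_type, "return_tokens": return_tokens},
    }

args = embed_args(["The quick brown fox jumps over the lazy dog"])

# With the SDK installed and an API key set, the call would look like:
# from pinecone import Pinecone
# pc = Pinecone(api_key="YOUR_API_KEY")
# embeddings = pc.inference.embed(**args)
```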

Use the Pinecone Text Client

The Pinecone Text Client is a public Python package that provides text utilities designed for seamless integration with Pinecone’s sparse-dense (hybrid) search.

To convert your text corpus to sparse vectors, you can use either BM25 or SPLADE. This guide uses BM25, which is more common.

  1. Install the Pinecone Text Client:

    pip install pinecone-text
    
  2. Initialize the BM25 encoder and fit it to your corpus of documents.

    The following example initializes a BM25Encoder object and calls the fit() function on the corpus, formatted as an array of strings:

    Python
    from pinecone_text.sparse import BM25Encoder
    
    corpus = ["The quick brown fox jumps over the lazy dog",
              "The lazy dog is brown",
              "The fox is brown"]
    
    # Initialize BM25 and fit the corpus.
    bm25 = BM25Encoder()
    bm25.fit(corpus)
    

    If you want to use the default parameters for BM25Encoder, you can call the default method. The default parameters were fitted on the MS MARCO passage ranking dataset.

    Python
    bm25 = BM25Encoder.default()
    
  3. After the encoder is initialized and fit, you can encode documents and queries as sparse vectors.

    The following example encodes a new document as a sparse vector for upsert into a Pinecone index:

    Python
    doc_sparse_vector = bm25.encode_documents("The brown fox is quick")
    

    The contents of doc_sparse_vector look like this:

    JSON
    {"indices": [102, 18, 12, ...], "values": [0.21, 0.38, 0.15, ...]}
    

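    For hybrid search, a sparse vector is typically stored alongside a dense vector in the same record. The following is a minimal sketch of the record shape, assuming a dense embedding computed elsewhere; the id, dense values, and sparse indices/values here are made-up placeholders:

```python
# Made-up placeholders: in practice the dense vector comes from a dense
# embedding model and the sparse vector from bm25.encode_documents().
dense_vector = [0.1, 0.2, 0.3]
doc_sparse_vector = {"indices": [102, 18, 12], "values": [0.21, 0.38, 0.15]}

# A hybrid record carries both representations.
record = {
    "id": "doc-1",
    "values": dense_vector,
    "sparse_values": doc_sparse_vector,
}

# With an index handle, the upsert would look like:
# index.upsert(vectors=[record])
```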
    This example encodes a string as a sparse vector for use in a hybrid search query:

    Python
    query_sparse_vector = bm25.encode_queries("Which fox is brown?")
    

    The contents of query_sparse_vector look like this:

    JSON
    {"indices": [102, 16, 18, ...], "values": [0.21, 0.11, 0.15, ...]}
    
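    At query time, a common way to balance dense (semantic) and sparse (lexical) relevance is a convex weighting of the two query vectors. The weight_by_alpha helper below is a hypothetical name sketching that technique, not part of the pinecone-text package; the dense query values are placeholders:

```python
# Hypothetical helper: scale the dense vector by alpha and the sparse
# vector's values by (1 - alpha). alpha=1.0 is pure dense search,
# alpha=0.0 is pure sparse (lexical) search.
def weight_by_alpha(dense, sparse, alpha):
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must be between 0 and 1")
    scaled_dense = [v * alpha for v in dense]
    scaled_sparse = {
        "indices": sparse["indices"],
        "values": [v * (1 - alpha) for v in sparse["values"]],
    }
    return scaled_dense, scaled_sparse

dense_q = [0.4, 0.1, 0.2]  # placeholder dense query embedding
sparse_q = {"indices": [102, 16, 18], "values": [0.21, 0.11, 0.15]}
hdense, hsparse = weight_by_alpha(dense_q, sparse_q, alpha=0.75)

# The weighted vectors would then go into a hybrid query, e.g.:
# index.query(vector=hdense, sparse_vector=hsparse, top_k=3)
```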

See also