Sparse indexes are in early access. See Limitations for details.

Pinecone has long supported sparse retrieval in serverless indexes, initially combining dense and sparse methods through sparse boosting. Although effective in many scenarios, this hybrid strategy occasionally missed relevant results in keyword-heavy or entity-specific queries.

To address these gaps, Pinecone is introducing sparse-only indexes. These indexes enable direct indexing and retrieval of sparse vectors, supporting traditional methods like BM25 and learned sparse models such as pinecone-sparse-english-v0. By separating dense and sparse workflows, you gain greater control over retrieval strategies, optimizing results for specific use cases and significantly improving performance over sparse boosting. Detailed benchmarks on database performance and retrieval quality will be available soon.

This page shows you how to get started with sparse-only indexes.

Sparse vector embedding

You can create a sparse index designed for external or integrated vector embedding:

  • External embedding: In this case, you use an external sparse embedding model to convert your text to sparse vectors, and then you upsert and search with those vectors directly.

  • Integrated embedding: In this case, you upsert and search with your source text, and Pinecone uses a hosted sparse embedding model to convert the text to sparse vectors automatically.
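
With external embedding, the records you upsert carry sparse vectors in Pinecone's `{indices, values}` format: parallel lists of 32-bit term indices and their weights. As a rough illustration of that format only (not a real embedding model), the toy encoder below hashes each term to a 32-bit index and uses term frequency as the weight; a real pipeline would use BM25 or a learned sparse model such as pinecone-sparse-english-v0:

```python
import re
import zlib
from collections import Counter

def toy_sparse_encode(text: str) -> dict:
    """Illustrative only: hash each token to a 32-bit index and use
    its term frequency as the value. Real weights would come from BM25
    or a learned sparse model such as pinecone-sparse-english-v0."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    counts = Counter(zlib.crc32(token.encode()) for token in tokens)
    indices = sorted(counts)
    return {"indices": indices, "values": [float(counts[i]) for i in indices]}

sparse = toy_sparse_encode("AAPL reported stronger Q3 demand")
print(sparse["indices"])  # one 32-bit index per unique term
print(sparse["values"])   # term weights, aligned with indices
```

Whatever model you use, the two lists must be the same length, with each value giving the weight of the term at the matching index.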

SDK support

Sparse indexes are supported by all Pinecone SDKs.

To work with sparse indexes, install the latest SDK version:

pip install "pinecone[grpc]" --upgrade  

Create a sparse index

To create an index for sparse vectors created with an external embedding model, use the create_index operation as follows:

  • Provide a name for the index.

  • Set the vector_type to sparse.

  • Set the metric to dotproduct. Sparse indexes do not support other distance metrics.

  • Set spec.cloud and spec.region to the cloud and region where the index should be deployed. At this time, sparse indexes are supported only on aws in the eu-west-1, us-east-1, and us-west-2 regions.

Other parameters are optional. See the API reference for details.

from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

pc.create_index(
    name="example-index",
    metric="dotproduct",
    spec=ServerlessSpec(cloud="aws", region="eu-west-1"),
    vector_type="sparse"
)

Upsert sparse vectors

It can take up to 1 minute before upserted records are available to query.

Use an external embedding model to convert your source text to sparse vectors, and then use the upsert operation to upsert the vectors into a namespace in a sparse index.

For example, the following code upserts sparse vector representations of sentences related to the term “apple”, with the source text and additional fields stored as metadata:

from pinecone import Pinecone, SparseValues, Vector

pc = Pinecone(api_key="YOUR_API_KEY")

index = pc.Index("example-index")

index.upsert(
    namespace="example-namespace",
    vectors=[
        Vector(
            id="vec1",
            sparse_values=SparseValues(
                values=[1.7958984, 0.41577148, 2.828125, 2.8027344, 2.8691406, 1.6533203, 5.3671875, 1.3046875, 0.49780273, 0.5722656, 2.71875, 3.0820312, 2.5019531, 4.4414062, 3.3554688],
                indices=[822745112, 1009084850, 1221765879, 1408993854, 1504846510, 1596856843, 1640781426, 1656251611, 1807131503, 2543655733, 2902766088, 2909307736, 3246437992, 3517203014, 3590924191]
            ),
            metadata={
                "source_text": "AAPL reported a year-over-year revenue increase, expecting stronger Q3 demand for its flagship phones.",
                "category": "technology",
                "quarter": "Q3"
            }
        ),
        Vector(
            id="vec2",
            sparse_values=SparseValues(
                values=[0.4362793, 3.3457031, 2.7714844, 3.0273438, 3.3164062, 5.6015625, 2.4863281, 0.38134766, 1.25, 2.9609375, 0.34179688, 1.4306641, 0.34375, 3.3613281, 1.4404297, 2.2558594, 2.2597656, 4.8710938, 0.5605469],
                indices=[131900689, 592326839, 710158994, 838729363, 1304885087, 1640781426, 1690623792, 1807131503, 2066971792, 2428553208, 2548600401, 2577534050, 3162218338, 3319279674, 3343062801, 3476647774, 3485013322, 3517203014, 4283091697]
            ),
            metadata={
                "source_text": "Analysts suggest that AAPL's upcoming Q4 product launch event might solidify its position in the premium smartphone market.",
                "category": "technology",
                "quarter": "Q4"
            }
        ),
        Vector(
            id="vec3",
            sparse_values=SparseValues(
                values=[2.6875, 4.2929688, 3.609375, 3.0722656, 2.1152344, 5.78125, 3.7460938, 3.7363281, 1.2695312, 3.4824219, 0.7207031, 0.0826416, 4.671875, 3.7011719, 2.796875, 0.61621094],
                indices=[8661920, 350356213, 391213188, 554637446, 1024951234, 1640781426, 1780689102, 1799010313, 2194093370, 2632344667, 2641553256, 2779594451, 3517203014, 3543799498, 3837503950, 4283091697]
            ),
            metadata={
                "source_text": "AAPL's strategic Q3 partnerships with semiconductor suppliers could mitigate component risks and stabilize iPhone production",
                "category": "technology",
                "quarter": "Q3"
            }
        ),
        Vector(
            id="vec4",
            sparse_values=SparseValues(
                values=[0.73046875, 0.46972656, 2.84375, 5.2265625, 3.3242188, 1.9863281, 0.9511719, 0.5019531, 4.4257812, 3.4277344, 0.41308594, 4.3242188, 2.4179688, 3.1757812, 1.0224609, 2.0585938, 2.5859375],
                indices=[131900689, 152217691, 441495248, 1640781426, 1851149807, 2263326288, 2502307765, 2641553256, 2684780967, 2966813704, 3162218338, 3283104238, 3488055477, 3530642888, 3888762515, 4152503047, 4177290673]
            ),
            metadata={
                "source_text": "AAPL may consider healthcare integrations in Q4 to compete with tech rivals entering the consumer wellness space.",
                "category": "technology",
                "quarter": "Q4"
            }
        )
    ]
)

Search a sparse index

When you search a sparse index, Pinecone retrieves records matching exact terms in the query. This is known as keyword search, or lexical search. Query terms are scored independently and then summed. The most similar records are those with the highest score.
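
Conceptually, each record's score is the dot product of the query's sparse vector and the record's sparse vector: only indices present in both contribute, and their products are summed. A minimal sketch of that scoring:

```python
def sparse_dot(query: dict, record: dict) -> float:
    """Sum the products of values at indices shared by query and record."""
    record_map = dict(zip(record["indices"], record["values"]))
    return sum(
        value * record_map[index]
        for index, value in zip(query["indices"], query["values"])
        if index in record_map
    )

# Only index 1640781426 appears in both vectors, so it alone contributes.
query = {"indices": [1640781426, 3476647774], "values": [1.0, 1.0]}
record = {"indices": [99, 1640781426], "values": [4.0, 2.5]}
print(sparse_dot(query, record))  # 2.5
```

This is why a query vector of all 1.0 values, as in the example below, scores each record by the sum of its stored weights for the matching terms.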

Use an external embedding model to convert your query text to a query vector, and then use the query operation to search a namespace in the index, using the query vector.

For example, the following code uses a sparse vector representation of the query “What is AAPL’s outlook, considering both product launches and market conditions?” to search for the 3 most similar vectors in the example-namespace namespace:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

index = pc.Index("example-index")

results = index.query(
    namespace="example-namespace",
    sparse_vector={
      "values": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
      "indices": [767227209, 1640781426, 1690623792, 2021799277, 2152645940, 2295025838, 2443437770, 2779594451, 2956155693, 3476647774, 3818127854, 4283091697]
    }, 
    top_k=3,
    include_metadata=True,
    include_values=False
)

print(results)

The results will look like this:

{'matches': [{'id': 'vec2',
              'metadata': {'category': 'technology',
                           'quarter': 'Q4',
                           'source_text': "Analysts suggest that AAPL's "
                                          'upcoming Q4 product launch event '
                                          'might solidify its position in the '
                                          'premium smartphone market.'},
              'score': 10.9042969,
              'values': []},
             {'id': 'vec3',
              'metadata': {'category': 'technology',
                           'quarter': 'Q3',
                           'source_text': "AAPL's strategic Q3 partnerships "
                                          'with semiconductor suppliers could '
                                          'mitigate component risks and '
                                          'stabilize iPhone production'},
              'score': 6.48010254,
              'values': []},
             {'id': 'vec1',
              'metadata': {'category': 'technology',
                           'quarter': 'Q3',
                           'source_text': 'AAPL reported a year-over-year '
                                          'revenue increase, expecting '
                                          'stronger Q3 demand for its flagship '
                                          'phones.'},
              'score': 5.3671875,
              'values': []}],
 'namespace': 'example-namespace',
 'usage': {'read_units': 1}}

Rerank results

Use the standalone rerank operation to rerank query results based on their relevance to the query. Specify the hosted reranking model to use, the query and its results, the number of ranked results to return, the field to rerank on, and any other model-specific parameters.

For example, repeat the previous search, but this time use the hosted bge-reranker-v2-m3 model to rerank the values of the source_text fields based on their relevance to the query and return only the 2 most relevant documents:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

ranked_results = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query="What is AAPL's outlook, considering both product launches and market conditions?",
    documents=[
        {"id": "vec2", "source_text": "Analysts suggest that AAPL's upcoming Q4 product launch event might solidify its position in the premium smartphone market."},
        {"id": "vec3", "source_text": "AAPL's strategic Q3 partnerships with semiconductor suppliers could mitigate component risks and stabilize iPhone production."},
        {"id": "vec1", "source_text": "AAPL reported a year-over-year revenue increase, expecting stronger Q3 demand for its flagship phones."},
    ],
    top_n=2,
    rank_fields=["source_text"],
    return_documents=True,
    parameters={
        "truncate": "END"
    }
)

print(ranked_results)

The response will look like this:

RerankResult(
  model='bge-reranker-v2-m3',
  data=[{
    index=0,
    score=0.004166256,
    document={
        id='vec2',
        source_text="Analysts suggest that AAPL's upcoming Q4 product launch event might solidify its position in the premium smartphone market."
    }
  },{
    index=2,
    score=0.0011513996,
    document={
        id='vec1',
        source_text='AAPL reported a year-over-year revenue increase, expecting stronger Q3 demand for its flagship phones.'
    }
  }],
  usage={'rerank_units': 1}
)

Filter by metadata

Add a metadata filter to limit the search to sparse vectors matching the filter expression.

For example, the records you upserted earlier include a quarter metadata field. Repeat the previous search, but this time limit the search to records with the quarter Q4:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

index = pc.Index("example-index")

filtered_results = index.query(
    namespace="example-namespace",
    sparse_vector={
      "values": [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0],
      "indices": [767227209, 1640781426, 1690623792, 2021799277, 2152645940, 2295025838, 2443437770, 2779594451, 2956155693, 3476647774, 3818127854, 4283091697]
    }, 
    filter={"quarter": {"$eq": "Q4"}},
    top_k=3,
    include_metadata=True,
    include_values=False
)

print(filtered_results)

Notice that the search results now include only two records with the quarter “Q4”:

{'matches': [{'id': 'vec2',
              'metadata': {'category': 'technology',
                           'quarter': 'Q4',
                           'source_text': "Analysts suggest that AAPL's "
                                          'upcoming Q4 product launch event '
                                          'might solidify its position in the '
                                          'premium smartphone market.'},
              'score': 10.9042969,
              'values': []},
             {'id': 'vec4',
              'metadata': {'category': 'technology',
                           'quarter': 'Q4',
                           'source_text': 'AAPL may consider healthcare '
                                          'integrations in Q4 to compete with '
                                          'tech rivals entering the consumer '
                                          'wellness space.'},
              'score': 5.2265625,
              'values': []}],
 'namespace': 'example-namespace',
 'usage': {'read_units': 1}}

Limitations

These limitations are subject to change during the early access period.

Sparse indexes have the following limitations:

  • Max sparse records per namespace: 10,000,000

  • Max non-zero values per sparse vector: 1000

  • Max upserts per second per sparse index: 10

  • Max queries per second per sparse index: 10

  • Max top_k value per query: 1000

    You may get fewer than top_k results if top_k is larger than the number of sparse vectors in your index that match the query, because any vector with a dotproduct score of 0 is discarded.

  • Max query results size: 4MB

  • Supported cloud and regions: aws (eu-west-1, us-east-1, us-west-2)

  • Limited performance with high cardinality metadata. Better metadata indexing is coming soon.
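
Some of these limits can be enforced client-side before upserting. For example, the small helper below (hypothetical, not part of the SDK) rejects a sparse vector that exceeds the 1,000 non-zero-value limit or has mismatched lists:

```python
MAX_NONZERO_VALUES = 1000  # early-access limit per sparse vector

def validate_sparse(indices: list, values: list) -> None:
    """Raise ValueError if a sparse vector violates basic constraints.

    Hypothetical client-side check; the Pinecone API also rejects
    vectors that exceed its limits at upsert time.
    """
    if len(indices) != len(values):
        raise ValueError("indices and values must be the same length")
    nonzero = sum(1 for v in values if v != 0.0)
    if nonzero > MAX_NONZERO_VALUES:
        raise ValueError(
            f"{nonzero} non-zero values exceeds the limit of {MAX_NONZERO_VALUES}"
        )

validate_sparse([822745112, 1009084850], [1.79, 0.41])  # passes silently
```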

Sparse indexes do not yet support the following features:

Billing

During early access, sparse indexes are offered without charge.

However, standard pricing still applies for use of Pinecone’s pinecone-sparse-english-v0 embedding model.