Get started with sparse indexes
Sparse indexes are in early access. See Limitations for details.
Pinecone has long supported sparse retrieval in serverless indexes, initially combining dense and sparse methods through sparse boosting. Although effective in many scenarios, this hybrid strategy occasionally missed relevant results in keyword-heavy or entity-specific queries.
To address these gaps, Pinecone is introducing sparse-only indexes. These indexes enable direct indexing and retrieval of sparse vectors, supporting traditional methods like BM25 and learned sparse models such as pinecone-sparse-english-v0. By separating dense and sparse workflows, you gain greater control over retrieval strategies, optimizing results for specific use cases and significantly improving performance over sparse boosting. Detailed benchmarks on database performance and retrieval quality will be available soon.
This page shows you how to get started with sparse-only indexes.
Sparse vector embedding
You can create a sparse index designed for external or integrated vector embedding:
- External embedding: In this case, you use an external sparse embedding model to convert your text to sparse vectors, and then you upsert and search with those vectors directly.
- Integrated embedding: In this case, you upsert and search with your source text, and Pinecone uses a hosted sparse embedding model to convert the text to sparse vectors automatically.
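To make the external embedding path concrete, here is a minimal sketch of what "converting text to a sparse vector" produces: a list of indices and a parallel list of non-zero values. The function toy_sparse_embed is hypothetical and is not BM25 or pinecone-sparse-english-v0; it hashes whitespace-separated tokens and uses raw term counts as weights, purely for illustration.

```python
import zlib
from collections import Counter


def toy_sparse_embed(text, dim=2**20):
    """Map text to a sparse vector as parallel index/value lists.

    Hypothetical stand-in for a real sparse model: each lowercase token
    is hashed to an index and weighted by its raw term frequency.
    """
    weights = Counter()
    for token in text.lower().split():
        index = zlib.crc32(token.encode("utf-8")) % dim
        weights[index] += 1.0
    indices = sorted(weights)
    return {"indices": indices, "values": [weights[i] for i in indices]}


vec = toy_sparse_embed("apple pie and apple juice")
print(vec)
```

A real sparse model assigns learned or statistically derived weights rather than counts, but the output has the same shape: only the dimensions that are non-zero are stored.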
SDK support
Sparse indexes are supported by all Pinecone SDKs.
To work with sparse indexes, install the latest SDK version:
Create a sparse index
To create an index for sparse vectors created with an external embedding model, use the create_index operation as follows:
- Provide a name for the index.
- Set the vector_type to sparse.
- Set the metric to dotproduct. Sparse indexes do not support other distance metrics.
- Set spec.cloud and spec.region to the cloud and region where the index should be deployed. Only aws in eu-west-1, us-east-1, and us-west-2 are supported at this time.
Other parameters are optional. See the API reference for details.
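The settings above can be sketched as a plain dictionary. The helper sparse_index_params below is hypothetical and does not call Pinecone; it only assembles the fields named in the list and rejects unsupported regions. The exact way your SDK expects these parameters to be passed to create_index may differ, so check the API reference before relying on this layout.

```python
# Regions listed as supported for sparse indexes during early access.
SUPPORTED_REGIONS = {"aws": {"eu-west-1", "us-east-1", "us-west-2"}}


def sparse_index_params(name, cloud="aws", region="us-east-1"):
    """Assemble create_index settings for a sparse index (illustrative only)."""
    if region not in SUPPORTED_REGIONS.get(cloud, set()):
        raise ValueError(f"{cloud}/{region} is not supported for sparse indexes")
    return {
        "name": name,
        "vector_type": "sparse",
        "metric": "dotproduct",  # the only metric sparse indexes support
        "spec": {"cloud": cloud, "region": region},
    }


params = sparse_index_params("docs-example", region="eu-west-1")
print(params)
```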
Upsert sparse vectors
It can take up to 1 minute before upserted records are available to query.
Use an external embedding model to convert your source text to sparse vectors, and then use the upsert operation to upsert the vectors into a namespace in a sparse index.
For example, the following code upserts sparse vector representations of sentences related to the term “apple”, with the source text and additional fields stored as metadata:
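As a rough sketch of the record shape, each record pairs an ID with sparse indices and values plus metadata. The sparse_values field name and the sample records below are assumptions for illustration, not data from this page, and validate_record is a hypothetical local check rather than part of any SDK. No request is sent to Pinecone.

```python
def validate_record(record):
    """Check the basic shape of one sparse record before upserting it."""
    sparse = record["sparse_values"]
    indices, values = sparse["indices"], sparse["values"]
    if len(indices) != len(values):
        raise ValueError(f"{record['id']}: indices and values differ in length")
    if len(set(indices)) != len(indices):
        raise ValueError(f"{record['id']}: duplicate indices")
    return True


# Illustrative records; the indices and values are made up.
records = [
    {
        "id": "vec1",
        "sparse_values": {"indices": [12, 407, 9031], "values": [1.7, 0.4, 2.2]},
        "metadata": {"chunk_text": "Apple pie is a classic dessert.", "quarter": "Q4"},
    },
    {
        "id": "vec2",
        "sparse_values": {"indices": [12, 88], "values": [0.9, 1.3]},
        "metadata": {"chunk_text": "Apple announced a new product.", "quarter": "Q3"},
    },
]

print(all(validate_record(r) for r in records))
```

Storing the source text in metadata, as the page describes, lets you display or rerank the original passage without a second lookup.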
Search a sparse index
When you search a sparse index, Pinecone retrieves records matching exact terms in the query. This is known as keyword search, or lexical search. Query terms are scored independently and then summed. The most similar records are those with the highest score.
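The scoring described here is a dot product over the dimensions that the query and a record share. The following is a minimal local sketch of that arithmetic, not Pinecone's implementation; sparse_dot and rank are hypothetical names, and the vectors are made up.

```python
def sparse_dot(query, record):
    """Sum the products of values on indices present in both vectors."""
    record_weights = dict(zip(record["indices"], record["values"]))
    return sum(
        q_value * record_weights[index]
        for index, q_value in zip(query["indices"], query["values"])
        if index in record_weights
    )


def rank(query, records):
    """Return (id, score) pairs ordered from highest to lowest score."""
    scored = [(rid, sparse_dot(query, vec)) for rid, vec in records.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


query = {"indices": [1, 5], "values": [1.0, 2.0]}
records = {
    "a": {"indices": [1, 5, 9], "values": [0.5, 1.0, 3.0]},  # shares 1 and 5
    "b": {"indices": [5], "values": [0.25]},                 # shares 5 only
}
print(rank(query, records))  # [('a', 2.5), ('b', 0.5)]
```

Record "a" scores 1.0 * 0.5 + 2.0 * 1.0 = 2.5; its weight on index 9 contributes nothing because the query has no term there.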
Basic search
Use an external embedding model to convert your query text to a query vector, and then use the query operation to search a namespace in the index, using the query vector.
For example, the following code uses a sparse vector representation of the query “What is AAPL’s outlook, considering both product launches and market conditions?” to search for the 3 most similar vectors in the example-namespaces namespace:
The results will look like this:
Rerank results
Use the standalone rerank operation to rerank query results based on their relevance to the query. Specify the hosted reranking model you want to use, the query results and the query, the number of ranked results to return, the field to use for reranking, and any other model-specific parameters.
For example, repeat the previous search, but this time use the hosted bge-reranker-v2-m3 model to rerank the values of the documents.chunk_text fields based on their relevance to the query and return only the 2 most relevant documents:
The response will look like this:
Filter by metadata
Add a metadata filter to limit the search to sparse vectors matching the filter expression.
For example, the records you upserted earlier include a quarter metadata field. Repeat the previous search, but this time limit the search to records with the quarter Q4:
Notice that the search results now include only two records with the quarter “Q4”:
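To illustrate what a filter on the quarter field does, the sketch below applies an equality filter locally. The filter shape with the $eq operator follows Pinecone's metadata filter syntax; the matches function is a hypothetical local illustration that handles only $eq, and the records are made up.

```python
def matches(metadata, metadata_filter):
    """Return True if every filtered field equals its $eq value."""
    return all(
        metadata.get(field) == condition["$eq"]
        for field, condition in metadata_filter.items()
    )


# Illustrative metadata keyed by record ID.
records = {
    "vec1": {"quarter": "Q3"},
    "vec2": {"quarter": "Q4"},
    "vec3": {"quarter": "Q4"},
    "vec4": {},  # no quarter field, so it never matches
}

metadata_filter = {"quarter": {"$eq": "Q4"}}
selected = [rid for rid, meta in records.items() if matches(meta, metadata_filter)]
print(selected)  # ['vec2', 'vec3']
```

Only records that pass the filter are candidates for scoring, so a filter can reduce the number of results below top_k.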
Limitations
These limitations are subject to change during the early access period.
Sparse indexes have the following limitations:
- Max sparse records per namespace: 10,000,000
- Max non-zero values per sparse vector: 1000
- Max upserts per second per sparse index: 10
- Max queries per second per sparse index: 10
- Max top_k value per query: 1000
  You may get fewer than top_k results if top_k is larger than the number of sparse vectors in your index that match your query. That is, any vectors where the dotproduct score is 0 will be discarded.
- Max query results size: 4MB
- Supported cloud and regions: aws in eu-west-1, us-east-1, and us-west-2.
- Limited performance with high cardinality metadata. Better metadata indexing is coming soon.
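Two of these limits are easy to check on the client before sending a request. The sketch below uses hypothetical helper names and hard-codes the early access values listed above, which may change; it also shows why a query can return fewer than top_k results once zero-score records are discarded.

```python
MAX_NONZERO_VALUES = 1000  # per sparse vector
MAX_TOP_K = 1000           # per query


def check_sparse_vector(vec):
    """Raise if a vector has more non-zero values than the limit allows."""
    nonzero = sum(1 for value in vec["values"] if value != 0)
    if nonzero > MAX_NONZERO_VALUES:
        raise ValueError(f"{nonzero} non-zero values exceeds {MAX_NONZERO_VALUES}")
    return nonzero


def returned_results(scored, top_k):
    """Drop zero-score records, then keep the top_k highest scores."""
    if not 1 <= top_k <= MAX_TOP_K:
        raise ValueError(f"top_k must be between 1 and {MAX_TOP_K}")
    kept = [(rid, score) for rid, score in scored if score != 0]
    kept.sort(key=lambda pair: pair[1], reverse=True)
    return kept[:top_k]


scored = [("a", 2.5), ("b", 0.0), ("c", 0.5)]
print(returned_results(scored, top_k=3))  # two results, although top_k is 3
```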
Sparse indexes do not yet support the following features:
Billing
During early access, sparse indexes are offered without charge.
Pricing for Pinecone’s pinecone-sparse-english-v0 embedding model applies as listed.