Pinecone supports vectors with both sparse and dense values, which lets you perform hybrid search (semantic and keyword search combined) in a single query and get more relevant results. This page explains the sparse-dense vector format and how to upsert sparse-dense vectors into Pinecone indexes.

For sparse-dense embeddings in action, see the Ecommerce hybrid search example.

This feature is in public preview. Review the current limitations and considerations for serverless indexes, and test thoroughly before using it in production.

Sparse-dense vector format

Pinecone represents sparse values as a dictionary of two arrays: indices and values. The elements of indices have type uint32; the elements of values have type float32.

Example

The following example defines two records with sparse and dense values.

Python
from pinecone import Pinecone

pc = Pinecone(api_key='API_KEY')
index = pc.Index('example-index') 

records=[
    {'id': 'vec1',
     # The 'values' are dense vector values.
     'values': [0.1, 0.2, 0.3],
     'metadata': {'genre': 'drama'},
     'sparse_values': {
         'indices': [10, 45, 16],
         'values': [0.5, 0.5, 0.2]
     }
    },
    {'id': 'vec2',
     'values': [0.2, 0.3, 0.4],
     'metadata': {'genre': 'action'},
     'sparse_values': {
         # Indices have type uint32
         'indices': [15, 40, 11],
         # Values have type float32
         'values': [0.4, 0.5, 0.2]
     }
    }
]
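
The records above spell out sparse values by hand. As a minimal sketch, assuming your sparse encoder yields a mapping from dimension index to weight, the helper below converts that mapping into the indices/values format. The name to_sparse_values is hypothetical and not part of the Pinecone client, and sorting by index is only for readable, repeatable output, not something the format is stated to require.

```python
from typing import Dict, List, Mapping


def to_sparse_values(weights: Mapping[int, float]) -> Dict[str, List]:
    """Convert an {index: weight} mapping into parallel indices/values arrays.

    Hypothetical helper for illustration; not part of the Pinecone client.
    """
    # Keep only non-zero weights, since a sparse vector stores non-zero entries.
    # Sorting by index is an arbitrary choice made for deterministic output.
    items = sorted((i, w) for i, w in weights.items() if w != 0.0)
    return {
        "indices": [int(i) for i, _ in items],
        "values": [float(w) for _, w in items],
    }


sparse = to_sparse_values({10: 0.5, 45: 0.5, 16: 0.2, 99: 0.0})
print(sparse)  # {'indices': [10, 16, 45], 'values': [0.5, 0.2, 0.5]}
```

The result can be used directly as the 'sparse_values' entry of a record.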

Pinecone supports sparse vectors of up to 1000 non-zero values and 4.2 billion dimensions.
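
A client-side check can catch records that exceed these limits before they are sent. The sketch below is an illustration only: validate_sparse_values is a hypothetical helper, it assumes the dimension limit corresponds to the uint32 index range (2^32, about 4.29 billion), and it assumes indices and values must have equal length because they are parallel arrays.

```python
from typing import Dict, List

MAX_NONZERO = 1000      # non-zero value limit stated above
MAX_INDEX = 2**32 - 1   # largest uint32 index (assumed to match the dimension limit)


def validate_sparse_values(sparse: Dict[str, List]) -> None:
    """Raise ValueError if sparse values break one of the limits above.

    Hypothetical helper for illustration; not part of the Pinecone client.
    """
    indices, values = sparse["indices"], sparse["values"]
    # Assumption: the two arrays are parallel, so their lengths must match.
    if len(indices) != len(values):
        raise ValueError("indices and values must have the same length")
    if len(indices) > MAX_NONZERO:
        raise ValueError(
            f"{len(indices)} non-zero values exceeds the limit of {MAX_NONZERO}"
        )
    for i in indices:
        if not 0 <= i <= MAX_INDEX:
            raise ValueError(f"index {i} does not fit in uint32")


# Passes silently for a small, well-formed sparse vector.
validate_sparse_values({"indices": [10, 45, 16], "values": [0.5, 0.5, 0.2]})
```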

Assuming a dense vector component with 768 dimensions, Pinecone supports roughly 2.8M sparse vectors per s1 pod or 900k per p1 pod.
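
For rough planning, these figures translate into a pod count with simple arithmetic. The sketch below assumes the approximate capacities quoted above for a 768-dimension dense component and ignores metadata size, replicas, and other sizing factors; estimate_pods is a hypothetical helper.

```python
import math

# Approximate per-pod capacities quoted above (768-dimension dense component).
VECTORS_PER_POD = {"s1": 2_800_000, "p1": 900_000}


def estimate_pods(num_vectors: int, pod_type: str) -> int:
    """Return a rough lower bound on pods needed for num_vectors records."""
    return math.ceil(num_vectors / VECTORS_PER_POD[pod_type])


print(estimate_pods(10_000_000, "s1"))  # 4
print(estimate_pods(10_000_000, "p1"))  # 12
```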

Upsert records with sparse-dense values

To upsert records with sparse-dense values, use the upsert operation, specifying dense values in the values parameter and sparse values in the sparse_values parameter.

Only indexes using the dotproduct distance metric support sparse-dense vectors. Upserting sparse-dense vectors into indexes with a different distance metric will succeed, but querying will return an error.

Python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("pinecone-index")

upsert_response = index.upsert(
  vectors=[
    {'id': 'vec1',
      'values': [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8],
      'metadata': {'genre': 'drama'},
      'sparse_values': {
          'indices': [1, 5],
          'values': [0.5, 0.5]
      }},
    {'id': 'vec2',
      'values': [0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9],
      'metadata': {'genre': 'action'},
      'sparse_values': {
          'indices': [5, 6],
          'values': [0.4, 0.5]
      }}
  ],
  namespace='example-namespace'
)
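
Because an upsert into an index with a different distance metric succeeds and the error only appears at query time, it can help to check the metric before upserting. The following is a minimal sketch with a hypothetical helper, ensure_sparse_dense_supported, that takes the metric as a string. How you obtain the metric is left as an assumption in the comments.

```python
def ensure_sparse_dense_supported(metric: str) -> None:
    """Raise ValueError unless the index metric supports sparse-dense vectors.

    Hypothetical helper for illustration; not part of the Pinecone client.
    """
    if metric != "dotproduct":
        raise ValueError(
            f"index metric is '{metric}'; sparse-dense vectors require 'dotproduct'"
        )


# Assumption: the metric comes from your index configuration, for example
# the value returned when describing the index with the Pinecone client.
ensure_sparse_dense_supported("dotproduct")  # passes silently

try:
    ensure_sparse_dense_supported("cosine")
except ValueError as err:
    print(err)
```

Failing early this way keeps unusable sparse-dense records out of an index that cannot query them.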

Next steps