pinecone-sparse-english-v0 | Pinecone

Hosted
METRIC: dot product
MAX INPUT TOKENS: 512 or 2048
TASK: embedding
PRICE: $0.08 / 1M Tokens
Built on the innovations of the DeepImpact architecture, the model directly estimates the lexical importance of tokens by leveraging their context, unlike traditional retrieval models like BM25, which rely solely on term frequency. The model outperforms BM25 by up to 44% (23% on average) in NDCG@10 on the Text Retrieval Conference (TREC) Deep Learning Tracks and by up to 24% (8% on average) on BEIR. For more information, see our blog post on cascading retrieval.

When using the model to generate embeddings directly, you must specify the input_type as either query or passage. When creating an index with integrated embedding, input_type defaults to query for reads and passage for writes. Optionally, you can:
  • Return the string tokens using "return_tokens": true.
  • Raise the max input tokens limit from the default of 512 to the maximum of 2048 using "max_tokens_per_sequence": 2048.
  • Return an error when the input exceeds max_tokens_per_sequence using "truncate": "NONE".
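
The options above can be combined into a single parameters object when embedding directly. The following is a minimal sketch, not an official recipe: the helper build_embed_parameters and its error_on_overflow flag are hypothetical names introduced here for illustration, and the call to pc.inference.embed is only attempted when a PINECONE_API_KEY environment variable (an assumption of this sketch) is set.

```python
import os

VALID_INPUT_TYPES = {"query", "passage"}
DEFAULT_MAX_TOKENS = 512   # default limit stated above
MAX_TOKENS_CEILING = 2048  # maximum limit stated above


def build_embed_parameters(input_type, return_tokens=False,
                           max_tokens_per_sequence=DEFAULT_MAX_TOKENS,
                           error_on_overflow=False):
    """Hypothetical helper: assemble the parameters dict for direct embedding."""
    if input_type not in VALID_INPUT_TYPES:
        raise ValueError("input_type must be 'query' or 'passage'")
    if not 1 <= max_tokens_per_sequence <= MAX_TOKENS_CEILING:
        raise ValueError("max_tokens_per_sequence must be between 1 and 2048")

    params = {"input_type": input_type}
    if return_tokens:
        params["return_tokens"] = True  # also return the string tokens
    if max_tokens_per_sequence != DEFAULT_MAX_TOKENS:
        params["max_tokens_per_sequence"] = max_tokens_per_sequence
    if error_on_overflow:
        params["truncate"] = "NONE"  # error out instead of truncating long input
    return params


params = build_embed_parameters(
    "passage",
    return_tokens=True,
    max_tokens_per_sequence=2048,
    error_on_overflow=True,
)
print(params)

# Only call the hosted model when an API key is available.
api_key = os.environ.get("PINECONE_API_KEY")
if api_key:
    from pinecone import Pinecone

    pc = Pinecone(api_key=api_key)
    embeddings = pc.inference.embed(
        model="pinecone-sparse-english-v0",
        inputs=["Apple is a popular fruit known for its sweetness and crisp texture."],
        parameters=params,
    )
    print(embeddings)
```

Use "query" as the input_type when embedding search text and "passage" when embedding documents, so both sides are encoded the way the model expects.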

Installation

pip install --upgrade pinecone

Create index

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Create a sparse index with integrated embedding
index_name = "pinecone-sparse-english-v0"

pc.create_index_for_model(
    name=index_name,
    cloud="aws",
    region="us-east-1",
    embed={
        "model": "pinecone-sparse-english-v0",
        "field_map": {
            "text": "text" # Map the record field to be embedded
        },
        "read_parameters": {
            "max_tokens_per_sequence": 2048 # Max input tokens for queries
        },
        "write_parameters": {
            "max_tokens_per_sequence": 2048 # Max input tokens for upserts and updates
        }
    }
)

index = pc.Index(index_name)

Embed & upsert

data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
    {"id": "vec6", "text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}
]

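# Embed and upsert the records; the "text" field is converted to a sparse vector on write.
# Newly upserted records can take a few seconds to become searchable.
index.upsert_records(
    namespace="example-namespace",
    records=data
)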

Query

query_payload = {
    "inputs": {
        "text": "Tell me about the tech company known as Apple."
    },
    "top_k": 3
}

results = index.search(
    namespace="example-namespace",
    query=query_payload
)

print(results)
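
Because the index uses the dot product metric, each result's score is the sum of products of the weights that the query and the record share. The sketch below illustrates that scoring in plain Python. It is only an illustration under stated assumptions: the token weights are invented, the vectors are keyed by readable strings rather than the numeric indices a real sparse vector uses, and sparse_dot is a hypothetical helper, not part of the Pinecone SDK.

```python
def sparse_dot(a, b):
    """Dot product of two sparse vectors stored as {token: weight} dicts."""
    # Iterate over the smaller vector; only shared tokens contribute.
    if len(a) > len(b):
        a, b = b, a
    return sum(weight * b[token] for token, weight in a.items() if token in b)


# Invented weights, for illustration only.
query = {"tech": 1.5, "company": 1.0, "apple": 2.0}
passages = {
    "vec1": {"apple": 1.0, "fruit": 2.0, "sweetness": 1.0},
    "vec2": {"tech": 2.0, "company": 1.5, "apple": 1.5, "iphone": 2.0},
}

scores = {record_id: sparse_dot(query, vector) for record_id, vector in passages.items()}
ranked = sorted(scores, key=scores.get, reverse=True)

print(scores)
print(ranked)
```

In this toy data the record about the tech company shares more highly weighted tokens with the query, so it ranks first. The hosted model assigns the actual weights based on context.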