Built on the innovations of the DeepImpact architecture, the model directly estimates the lexical importance of tokens by leveraging their context, unlike traditional retrieval models like BM25, which rely solely on term frequency. The model outperforms BM25 by up to 44% (average 23%) NDCG@10 on Text REtrieval Conference (TREC) Deep Learning Tracks and up to 24% (8% on average) on BEIR. For more information see our blog post on cascading retrieval
You must specify the input_type as either query or passage. You can optionally return the string tokens using "return_tokens": True.
Installation
pip install--upgrade pinecone
Create index
from pinecone import Pineconepc = Pinecone(api_key="YOUR_API_KEY")# Create a sparse index with integrated inferenceindex_name ="pinecone-sparse-english-v0"pc.create_index_for_ model( name=index_name, cloud="aws", region="us-east-1", embed={"model":"pinecone-sparse-english-v0","field_map":{"text":"text"# Map the record field to be embedded}})index = pc.Index(index_name)
Embed & upsert
data =[{"id":"vec1","text":"Apple is a popular fruit known for its sweetness and crisp texture."},{"id":"vec2","text":"The tech company Apple is known for its innovative products like the iPhone."},{"id":"vec3","text":"Many people enjoy eating apples as a healthy snack."},{"id":"vec4","text":"Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},{"id":"vec5","text":"An apple a day keeps the doctor away, as the saying goes."},{"id":"vec6","text":"Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}]index.upsert_records( namespace="example-namespace", records=data)
Query
query_payload ={"inputs":{"text":"Tell me about the tech company known as Apple."},"top_k":3}results = index.search( namespace="example-namespace", query=query_payload)print(results)