For example, the following code uses the multilingual-e5-large embedding model to generate dense vector embeddings for sentences related to the word “apple”:
```python
# Import the Pinecone library
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
import time

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key="YOUR_API_KEY")

# Define a sample dataset where each item has a unique ID and piece of text
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
    {"id": "vec6", "text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}
]

# Convert the text into numerical vectors that Pinecone can index
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d["text"] for d in data],
    parameters={"input_type": "passage", "truncate": "END"}
)

print(embeddings)
```
Once you’ve generated vector embeddings, upsert them into an index. For dense vectors, make sure to use a dense index with the same dimensionality as the vectors. For sparse vectors, make sure to use a sparse index.
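If the index doesn't exist yet, you can create one whose dimension matches the embedding model. The sketch below is illustrative, not part of the walkthrough above: it assumes a serverless dense index on AWS `us-east-1`, and relies on the fact that multilingual-e5-large produces 1024-dimensional vectors.

```python
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec

pc = Pinecone(api_key="YOUR_API_KEY")

# Create a dense index whose dimension matches the embedding model.
# Cloud, region, and metric here are illustrative assumptions.
pc.create_index(
    name="example-index",
    dimension=1024,  # multilingual-e5-large outputs 1024-dimensional vectors
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1")
)
```

Upserting vectors whose dimensionality differs from the index's `dimension` will be rejected, so it is worth checking `len(e["values"])` against the index configuration before upserting.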
```python
# Target the index where you'll store the vector embeddings
index = pc.Index("example-index")

# Prepare the records for upsert
# Each contains an 'id', the embedding 'values', and the original text as 'metadata'
records = []
for d, e in zip(data, embeddings):
    records.append({
        "id": d["id"],
        "values": e["values"],
        "metadata": {"text": d["text"]}
    })

# Upsert the records into the index
index.upsert(
    vectors=records,
    namespace="example-namespace"
)
```
For example, the following code uses the multilingual-e5-large embedding model to convert a question about the tech company “Apple” into a dense query vector. It then uses that query vector to search a dense index for the three most similar vectors, that is, the vectors representing the most relevant answers to the question:
```python
# Define your query
query = "Tell me about the tech company known as Apple."

# Convert the query into a numerical vector that Pinecone can search with
query_embedding = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[query],
    parameters={"input_type": "query"}
)

# Search the index for the three most similar vectors
results = index.query(
    namespace="example-namespace",
    vector=query_embedding[0].values,
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)
```
The response includes only sentences about the tech company, not the fruit:
```
{'matches': [{'id': 'vec2',
              'metadata': {'text': 'The tech company Apple is known for its '
                                   'innovative products like the iPhone.'},
              'score': 0.8727808,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'vec4',
              'metadata': {'text': 'Apple Inc. has revolutionized the tech '
                                   'industry with its sleek designs and '
                                   'user-friendly interfaces.'},
              'score': 0.8526099,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'vec6',
              'metadata': {'text': 'Apple Computer Company was founded on '
                                   'April 1, 1976, by Steve Jobs, Steve '
                                   'Wozniak, and Ronald Wayne as a '
                                   'partnership.'},
              'score': 0.8499719,
              'sparse_values': {'indices': [], 'values': []},
              'values': []}],
 'namespace': 'example-namespace',
 'usage': {'read_units': 6}}
```
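The `score` field reflects the index's similarity metric; with a cosine metric, higher means closer. The toy sketch below (plain Python, not real embeddings, and an assumption that the index uses cosine similarity) shows how such a ranking works: vectors pointing in a similar direction to the query score near 1, unrelated directions score near 0.

```python
import math

def cosine_similarity(a, b):
    # Dot product of the two vectors divided by the product of their magnitudes
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Toy 3-dimensional vectors (illustrative only)
query_vec = [1.0, 0.2, 0.0]
doc_a = [0.9, 0.3, 0.1]   # similar direction to the query -> high score
doc_b = [0.0, 0.1, 1.0]   # very different direction -> low score

print(cosine_similarity(query_vec, doc_a))  # close to 1
print(cosine_similarity(query_vec, doc_b))  # close to 0
```

In the same way, the query embedding for the "tech company" question lands closer to the embeddings of the tech sentences than to those about the fruit, which is why only `vec2`, `vec4`, and `vec6` are returned.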