This page shows you how to generate vector embeddings for text data, such as passages and queries. To integrate vector embedding with database operations, see Integrated embedding.

Choose an embedding model

Choose an embedding model hosted by Pinecone or an externally hosted embedding model.

Generate vectors for upserting

You can generate vector embeddings for upsert into a Pinecone index. Specify a supported embedding model and provide input data and any model-specific parameters.

For example, the following code uses the multilingual-e5-large embedding model to generate dense vector embeddings for sentences related to the word “apple”:

# Import the Pinecone library
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
import time

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key="YOUR_API_KEY")

# Define a sample dataset where each item has a unique ID and piece of text
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
    {"id": "vec6", "text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}
]

# Convert the text into numerical vectors that Pinecone can index
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in data],
    parameters={"input_type": "passage", "truncate": "END"}
)

print(embeddings)

The returned object looks like this:

EmbeddingsList(
    model='multilingual-e5-large',
    data=[
        {'values': [0.04925537109375, -0.01313018798828125, -0.0112762451171875, ...]},
        ...
    ],
    usage={'total_tokens': 130}
)
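Before upserting, it can help to confirm that the embeddings all share a single, consistent dimensionality, since it must match the dimension of the target dense index (multilingual-e5-large produces 1024-dimensional vectors). A minimal sketch of such a check, using plain dicts shaped like the `data` entries in the response above, with short stand-in vectors instead of real 1024-dimensional ones:

```python
# Stand-in for the 'data' list in the EmbeddingsList response above.
# Real multilingual-e5-large vectors have 1024 values each.
embeddings_data = [
    {"values": [0.0492, -0.0131, -0.0112]},
    {"values": [0.0211, 0.0034, -0.0457]},
]

# Collect the distinct dimensionalities across all embeddings
dims = {len(e["values"]) for e in embeddings_data}
assert len(dims) == 1, f"Inconsistent embedding dimensions: {dims}"

dimension = dims.pop()
print(dimension)  # this value must match the dense index's dimension
```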

Once you’ve generated vector embeddings, upsert them into an index. For dense vectors, make sure to use a dense index with the same dimensionality as the vectors. For sparse vectors, make sure to use a sparse index.

# Target the index where you'll store the vector embeddings
index = pc.Index("example-index")

# Prepare the records for upsert
# Each contains an 'id', the embedding 'values', and the original text as 'metadata'
records = []
for d, e in zip(data, embeddings):
    records.append({
        "id": d['id'],
        "values": e['values'],
        "metadata": {'text': d['text']}
    })

# Upsert the records into the index
index.upsert(
    vectors=records,
    namespace="example-namespace"
)
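For larger datasets, upserts are typically sent in batches rather than as one call containing every record. The helper below is an illustrative sketch (the batch size of 100 is an assumption for the example, not a Pinecone requirement); it splits a record list into fixed-size slices, each of which could be passed to `index.upsert`:

```python
def chunked(records, batch_size=100):
    """Yield successive fixed-size batches from a list of records."""
    for i in range(0, len(records), batch_size):
        yield records[i:i + batch_size]

# Example: 250 placeholder records split into batches of 100
records = [{"id": f"vec{i}", "values": [0.0, 0.1]} for i in range(250)]
print([len(batch) for batch in chunked(records)])  # [100, 100, 50]

# Each batch would then be upserted separately, e.g.:
# for batch in chunked(records):
#     index.upsert(vectors=batch, namespace="example-namespace")
```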

Generate vectors for querying

You can also generate vector embeddings for queries to a Pinecone index.

For example, the following code uses the multilingual-e5-large embedding model to convert a question about the tech company “Apple” into a dense query vector, and then uses that query vector to search the dense index for the three most similar vectors, that is, the vectors representing the most relevant answers to the question:

# Define your query
query = "Tell me about the tech company known as Apple."

# Convert the query into a numerical vector that Pinecone can search with
query_embedding = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[query],
    parameters={
        "input_type": "query"
    }
)

# Search the index for the three most similar vectors
results = index.query(
    namespace="example-namespace",
    vector=query_embedding[0].values,
    top_k=3,
    include_values=False,
    include_metadata=True
)

print(results)

The response includes only sentences about the tech company, not the fruit:

{'matches': [{'id': 'vec2',
              'metadata': {'text': 'The tech company Apple is known for its '
                                   'innovative products like the iPhone.'},
              'score': 0.8727808,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'vec4',
              'metadata': {'text': 'Apple Inc. has revolutionized the tech '
                                   'industry with its sleek designs and '
                                   'user-friendly interfaces.'},
              'score': 0.8526099,
              'sparse_values': {'indices': [], 'values': []},
              'values': []},
             {'id': 'vec6',
              'metadata': {'text': 'Apple Computer Company was founded on '
                                   'April 1, 1976, by Steve Jobs, Steve '
                                   'Wozniak, and Ronald Wayne as a '
                                   'partnership.'},
              'score': 0.8499719,
              'sparse_values': {'indices': [], 'values': []},
              'values': []}],
 'namespace': 'example-namespace',
 'usage': {'read_units': 6}}
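The `score` values depend on the similarity metric the index was created with. Assuming a dense index using the cosine metric, each score is the cosine similarity between the query vector and the stored vector, so higher scores mean closer matches. A self-contained sketch of that computation and ranking, using short made-up vectors:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Made-up 2-dimensional vectors: one candidate points roughly the same
# direction as the query, the other points in a very different direction.
query_vec = [1.0, 0.0]
candidates = {"vec_close": [0.9, 0.1], "vec_far": [0.1, 0.9]}

# Rank candidates by similarity to the query, highest first
ranked = sorted(
    candidates.items(),
    key=lambda kv: cosine_similarity(query_vec, kv[1]),
    reverse=True,
)
print([name for name, _ in ranked])  # ['vec_close', 'vec_far']
```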