Generate embeddings
This page shows you how to use the Inference API embed endpoint to generate vector embeddings for text data, such as passages and queries.
The Inference API is a stand-alone service. You can store generated vector embeddings in a Pinecone vector database, but you are not required to do so.
1. Install an SDK
You can access the embed endpoint directly or use the latest Python, Node.js, Go, or Java SDK.
To install the latest SDK version, run the following command:
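For example, a minimal install of the Python SDK (assuming the current package name, pinecone; older releases were published as pinecone-client):

```shell
pip install pinecone
```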
If you already have an SDK, upgrade to the latest version as follows:
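With pip, the upgrade looks like this (same package-name assumption as above):

```shell
pip install --upgrade pinecone
```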
2. Choose a model
Choose an embedding model hosted by Pinecone or an externally hosted embedding model.
3. Generate embeddings
To generate vector embeddings for upsert into a Pinecone index, use the inference.embed endpoint. Specify a supported embedding model and provide input data along with any model-specific parameters.
For example, the following code uses the multilingual-e5-large model to generate embeddings for sentences related to the word “apple”:
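A minimal sketch with the Python SDK, assuming an API key in YOUR_API_KEY and illustrative example sentences:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Illustrative passages mixing the fruit and the tech company
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
    {"id": "vec6", "text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne."},
]

# Generate a 1024-dimensional embedding for each passage
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d["text"] for d in data],
    parameters={"input_type": "passage", "truncate": "END"},
)

print(embeddings)
```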
The returned object looks like this:
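An abridged sketch of its shape (vector values and token counts elided):

```python
EmbeddingsList(
    model='multilingual-e5-large',
    data=[
        {'values': [...]},   # one 1024-dimensional vector per input
        ...
    ],
    usage={'total_tokens': ...}
)
```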
4. Upsert embeddings
Once you’ve generated vector embeddings, use the upsert operation to store them in an index. Make sure to use an index with the same dimensionality as the embeddings.
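Continuing the sketch above, and assuming a serverless index named example-index with dimension 1024 (the dimensionality of multilingual-e5-large) already exists:

```python
# Target the existing index (index and namespace names are illustrative)
index = pc.Index("example-index")

# Pair each embedding with its source record, keeping the original text as metadata
vectors = [
    {"id": d["id"], "values": e["values"], "metadata": {"text": d["text"]}}
    for d, e in zip(data, embeddings)
]

index.upsert(vectors=vectors, namespace="example-namespace")
```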
5. Embed a query and search
You can also use the inference.embed endpoint to generate vector embeddings for queries to a Pinecone index.
For example, the following code uses the multilingual-e5-large model to convert a question about the tech company “Apple” into a query vector and then uses that query vector to search the index for the three most similar vectors, that is, the vectors representing the most relevant answers to the question:
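A sketch of that step under the same assumptions, with an illustrative question:

```python
# Embed the question as a query vector (note input_type="query" rather than "passage")
query = "Tell me about the tech company known as Apple."

query_embedding = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[query],
    parameters={"input_type": "query"},
)

# Search the namespace for the three closest vectors and return their metadata
results = index.query(
    namespace="example-namespace",
    vector=query_embedding[0]["values"],
    top_k=3,
    include_values=False,
    include_metadata=True,
)

print(results)
```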
The response includes only sentences about the tech company, not the fruit:
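Roughly, the response has this shape (scores and read units elided; exact matches depend on your data):

```python
{'matches': [{'id': 'vec2',
              'metadata': {'text': 'The tech company Apple is known for its '
                                   'innovative products like the iPhone.'},
              'score': ...,
              'values': []},
             {'id': 'vec4',
              'metadata': {'text': 'Apple Inc. has revolutionized the tech '
                                   'industry with its sleek designs.'},
              'score': ...,
              'values': []},
             {'id': 'vec6',
              'metadata': {'text': 'Apple Computer Company was founded on '
                                   'April 1, 1976, by Steve Jobs, Steve '
                                   'Wozniak, and Ronald Wayne.'},
              'score': ...,
              'values': []}],
 'namespace': 'example-namespace',
 'usage': {'read_units': ...}}
```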