This page shows you how to use the Inference API embed endpoint to generate vector embeddings for text data, such as passages and queries.

Before you begin

Ensure you have a Pinecone account and an API key, which you need to authenticate requests to the embed endpoint.

The Inference API is a stand-alone service. You can store generated vector embeddings in a Pinecone vector database, but you are not required to do so.

1. Install an SDK

You can access the embed endpoint directly or use the latest Python, Node.js, or Go SDK.

To install the latest SDK version, run the following command:
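For example, assuming the Python SDK (published on PyPI as pinecone; earlier releases used the name pinecone-client):

```shell
pip install pinecone
```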

If you already have an SDK, upgrade to the latest version as follows:
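Again assuming the Python package name pinecone:

```shell
pip install --upgrade pinecone
```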

2. Choose a model

Pinecone hosts the following embedding models:

| Model | Dimension | Max input tokens | Max batch size | Parameters |
|---|---|---|---|---|
| multilingual-e5-large | 1024 | 507 | 96 | input_type: "query" or "passage"; truncate: "END" or "NONE" |

Pinecone currently hosts models in the US only.

3. Generate embeddings

To generate vector embeddings for upsert into a Pinecone index, use the inference.embed operation. Specify a supported embedding model and provide input data and any model-specific parameters.

For example, the following code uses the multilingual-e5-large model to generate embeddings for sentences related to the word “apple”:
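A minimal Python sketch, assuming the Python SDK and an API key in a PINECONE_API_KEY environment variable; the example sentences are illustrative:

```python
import os

from pinecone import Pinecone

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

# Sentences about "apple" the fruit and "Apple" the company
data = [
    "Apple is a popular fruit known for its sweetness and crisp texture.",
    "The tech company Apple is known for products like the iPhone.",
    "Many people enjoy eating apples as a healthy snack.",
]

# Generate a 1024-dimensional embedding for each passage
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=data,
    parameters={"input_type": "passage", "truncate": "END"},
)

print(embeddings)
```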

The returned object looks like this:
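The exact output varies by SDK and version; in Python, the response for three input passages has roughly this shape (values and token counts are placeholders, not real output):

```
EmbeddingsList(
    model='multilingual-e5-large',
    data=[
        {'values': [...]},  # 1024 floats for the first passage
        {'values': [...]},
        {'values': [...]},
    ],
    usage={'total_tokens': ...}
)
```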

4. Upsert embeddings

Once you’ve generated vector embeddings, use the upsert operation to store them in an index. Make sure to use an index with the same dimensionality as the embeddings.
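Continuing the sketch above, and assuming an existing 1024-dimensional index named example-index and a namespace named example-namespace (both hypothetical names):

```python
# Pair each passage with its embedding; keep the original text as metadata
records = [
    {
        "id": f"vec{i}",
        "values": e["values"],
        "metadata": {"text": text},
    }
    for i, (text, e) in enumerate(zip(data, embeddings), start=1)
]

index = pc.Index("example-index")
index.upsert(vectors=records, namespace="example-namespace")
```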

You can also use the inference.embed operation to generate vector embeddings for queries to a Pinecone index.

For example, the following code uses the multilingual-e5-large model to convert a question about the tech company “Apple” into a query vector. It then uses that query vector to search for the three most similar vectors in the index, that is, the vectors representing the most relevant answers to the question:
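A possible continuation of the same sketch, reusing the index and namespace assumed above:

```python
query = "Tell me about the tech company known as Apple."

# Embed the question as a query vector (note input_type="query")
query_embedding = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[query],
    parameters={"input_type": "query"},
)

# Retrieve the three most similar vectors and their stored text
results = index.query(
    namespace="example-namespace",
    vector=query_embedding[0]["values"],
    top_k=3,
    include_values=False,
    include_metadata=True,
)

print(results)
```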