This page shows you how to use the Inference API embed endpoint to generate vector embeddings for text data, such as passages and queries.

The Inference API is a stand-alone service. You can store generated vector embeddings in a Pinecone vector database, but you are not required to do so.

1. Install an SDK

You can access the embed endpoint directly or use the latest Python, Node.js, Go, or Java SDK.

To install the latest SDK version, run the following command:
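For example, with the Python SDK (assuming the package is published on PyPI as `pinecone`):

```shell
# Install the latest Pinecone Python SDK
pip install pinecone
```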

If you already have an SDK, upgrade to the latest version as follows:
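For example, with pip (again assuming the `pinecone` package name):

```shell
# Upgrade an existing installation to the latest version
pip install --upgrade pinecone
```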

2. Choose a model

Choose an embedding model hosted by Pinecone or an externally hosted embedding model.

3. Generate embeddings

To generate vector embeddings for upsert into a Pinecone index, use the inference.embed endpoint. Specify a supported embedding model and provide input data and any model-specific parameters.

For example, the following code uses the multilingual-e5-large model to generate embeddings for sentences related to the word “apple”:
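A minimal sketch with the Python SDK; the example sentences and the `input_type` and `truncate` parameters are illustrative:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Example passages related to the word "apple" (both the fruit and the tech company)
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs."},
]

# Generate one embedding per passage with a Pinecone-hosted model
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d["text"] for d in data],
    parameters={"input_type": "passage", "truncate": "END"},
)
```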

The returned object looks like this:
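The exact representation depends on the SDK version; roughly, it is a list-like object with one embedding per input (values truncated and purely illustrative here):

```python
EmbeddingsList(
    model='multilingual-e5-large',
    data=[
        {'values': [0.04, -0.01, ...]},  # one 1024-dimensional vector per input passage
        ...
    ],
    usage={'total_tokens': ...}  # token count billed for the request
)
```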

4. Upsert embeddings

Once you’ve generated vector embeddings, use the upsert operation to store them in an index. Make sure to use an index with the same dimensionality as the embeddings.
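A sketch assuming an index named `example-index` already exists with dimension 1024 (the dimension of multilingual-e5-large), and that `data` and `embeddings` come from the previous step:

```python
# Target an existing index whose dimension matches the embedding model
index = pc.Index("example-index")

# Pair each source record with its embedding and upsert into a namespace
vectors = [
    {"id": d["id"], "values": e["values"], "metadata": {"text": d["text"]}}
    for d, e in zip(data, embeddings)
]

index.upsert(vectors=vectors, namespace="example-namespace")
```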

You can also use the inference.embed endpoint to generate vector embeddings for queries to a Pinecone index.

For example, the following code uses the multilingual-e5-large model to convert a question about the tech company “Apple” into a query vector and then uses that query vector to search for the three most similar vectors in the index, i.e., the vectors that represent the most relevant answers to the question:
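A sketch assuming the same index and namespace as above; the question text and `top_k` value are illustrative:

```python
# Convert the question into a query vector with the same model used for the passages
query = "Tell me about the tech company known as Apple."

query_embedding = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[query],
    parameters={"input_type": "query"},
)

# Search the index for the three most similar vectors
results = index.query(
    namespace="example-namespace",
    vector=query_embedding[0]["values"],
    top_k=3,
    include_values=False,
    include_metadata=True,
)

print(results)
```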

The response includes only sentences about the tech company, not the fruit:
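The exact scores depend on your data; the structure is roughly as follows (scores shown as placeholders):

```python
{
    'matches': [
        {'id': 'vec2', 'score': ..., 'metadata': {'text': 'The tech company Apple is known for its innovative products like the iPhone.'}},
        {'id': 'vec4', 'score': ..., 'metadata': {'text': 'Apple Inc. has revolutionized the tech industry with its sleek designs.'}},
        ...
    ],
    'namespace': 'example-namespace',
    'usage': {'read_units': ...}
}
```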