Pinecone Inference is an API service that gives you access to embedding models hosted on Pinecone’s infrastructure. Support for reranking models is coming soon.

This feature is in public preview and is not recommended for production usage.


To use the Inference API, you need a Pinecone account and a Pinecone API key.

The Inference API is a stand-alone service. You can store your generated embeddings in a Pinecone vector database, but you are not required to do so.

Embedding models

The Inference API provides access to the following embedding models:

Model | Dimension | Max input tokens | Max batch size | Parameters | Price | Starter tier usage
multilingual-e5-large | 1024 | 507 | 96 | input_type: 'query' or 'passage'; truncation: 'END' or 'NONE' | $0.08 / 1M tokens | 5M tokens
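
Because a single request accepts at most 96 inputs, larger document sets have to be split before they are sent. The following is a minimal sketch of that splitting step. The helper names `batched` and `check_input_type` are hypothetical and are not part of any Pinecone client.

```python
from typing import Iterator, List

# Limits copied from the model listing above for multilingual-e5-large.
MAX_BATCH_SIZE = 96
ALLOWED_INPUT_TYPES = ("query", "passage")


def batched(texts: List[str], size: int = MAX_BATCH_SIZE) -> Iterator[List[str]]:
    """Yield consecutive slices of `texts`, each holding at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(texts), size):
        yield texts[start:start + size]


def check_input_type(input_type: str) -> str:
    """Return `input_type` unchanged if it is one of the documented values."""
    if input_type not in ALLOWED_INPUT_TYPES:
        raise ValueError(f"input_type must be one of {ALLOWED_INPUT_TYPES}")
    return input_type


if __name__ == "__main__":
    documents = [f"document {i}" for i in range(200)]
    for batch in batched(documents):
        print(len(batch))
```

Each batch still has to respect the 507-token limit per input. This sketch does not count tokens, so overly long inputs are left to the truncation parameter.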

Client support

You can use the Inference API with the Pinecone Python client or Node.js client.

To use the Inference API with the Python client, upgrade to the latest client version, which includes the required pinecone-plugin-inference package:

pip install --upgrade "pinecone-client[grpc]"
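
Once the client is upgraded, an embedding request might look like the sketch below. It assumes the client exposes `pc.inference.embed(model=..., inputs=..., parameters=...)` and that the response can be iterated, with each item exposing its vector under the "values" key, as in Pinecone's published Python examples. The wrapper `embed_passages` is a hypothetical helper, and the API key is a placeholder.

```python
from typing import Any, List

MODEL = "multilingual-e5-large"


def embed_passages(pc: Any, texts: List[str]) -> List[List[float]]:
    """Embed `texts` as passages and return one vector per input, in order.

    `pc` is expected to be a Pinecone client instance. The response shape
    (iterable items with a "values" key) is an assumption.
    """
    response = pc.inference.embed(
        model=MODEL,
        inputs=texts,
        parameters={"input_type": "passage"},
    )
    return [list(item["values"]) for item in response]


# Example usage (requires network access and a valid API key):
#
#   from pinecone import Pinecone
#   pc = Pinecone(api_key="YOUR_API_KEY")
#   vectors = embed_passages(pc, ["Apples are a popular fruit."])
#   print(len(vectors[0]))
```

Passing the client in as an argument keeps the helper independent of how the client is constructed, so the same function works with either the standard or the gRPC client.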

To use the Inference API with the Node.js client, update to the latest client version:

npm install @pinecone-database/pinecone@latest


During public preview, the Inference API has the following per-project rate limits:

  • Requests per minute: 50
  • Tokens per minute: 250,000

To request a rate limit increase, contact Support.
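
To stay under these limits, a client can track what it has sent in the trailing 60 seconds and pause when a new request would not fit. The following is a minimal sketch of that idea and not a Pinecone feature. The class `MinuteBudget` is hypothetical, and the caller is assumed to supply its own token estimate for each request.

```python
import time
from collections import deque
from typing import Callable, Deque, Tuple


class MinuteBudget:
    """Track requests and tokens sent during the trailing 60 seconds."""

    def __init__(
        self,
        max_requests: int = 50,
        max_tokens: int = 250_000,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self.max_requests = max_requests
        self.max_tokens = max_tokens
        self._clock = clock
        self._events: Deque[Tuple[float, int]] = deque()  # (timestamp, tokens)

    def wait_time(self, tokens: int) -> float:
        """Return how many seconds to wait before `tokens` fits the budget."""
        if tokens > self.max_tokens:
            raise ValueError("a single request exceeds the per-minute token limit")
        now = self._clock()
        # Drop events that are already outside the 60-second window.
        while self._events and now - self._events[0][0] >= 60.0:
            self._events.popleft()
        used_requests = len(self._events)
        used_tokens = sum(spent for _, spent in self._events)
        wait = 0.0
        # Walk from the oldest event, releasing budget until the request fits.
        for stamp, spent in self._events:
            if used_requests < self.max_requests and used_tokens + tokens <= self.max_tokens:
                break
            wait = stamp + 60.0 - now
            used_requests -= 1
            used_tokens -= spent
        return wait

    def record(self, tokens: int) -> None:
        """Register a request that has just been sent."""
        self._events.append((self._clock(), tokens))
```

A caller would sleep for `wait_time(n)` seconds, send the request, and then call `record(n)`. Because the limits are per project, this only helps when a single process sends all of the project's traffic.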


Inference billing is based on tokens used. To learn more, see Understanding cost.
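
As a rough illustration of token-based billing, the sketch below multiplies a token count by the listed price of $0.08 per 1M tokens for multilingual-e5-large. The function `estimate_cost_usd` is hypothetical, and treating the starter tier figure as a free allowance deducted before billing is an assumption. Check Understanding cost for the actual rules.

```python
PRICE_PER_MILLION_TOKENS_USD = 0.08  # listed price for multilingual-e5-large


def estimate_cost_usd(tokens: int, free_tokens: int = 0) -> float:
    """Estimate the charge for `tokens`, after deducting `free_tokens`.

    `free_tokens` models an assumed free allowance; it defaults to none.
    """
    if tokens < 0 or free_tokens < 0:
        raise ValueError("token counts must be non-negative")
    billable = max(0, tokens - free_tokens)
    return billable / 1_000_000 * PRICE_PER_MILLION_TOKENS_USD


if __name__ == "__main__":
    # 10M tokens with no allowance applied.
    print(f"${estimate_cost_usd(10_000_000):.2f}")
```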