Understanding Pinecone Inference
Pinecone Inference is an API that gives you access to embedding and reranking models hosted on Pinecone’s infrastructure.
Pinecone currently hosts models in the US only.
Prerequisites
To use the Inference API, you need a Pinecone account and a Pinecone API key.
Embedding models
The embed endpoint generates embeddings for text data, such as queries or passages, using a specified embedding model.
The following embedding models are available:
multilingual-e5-large
multilingual-e5-large is a high-performance text embedding model trained on a mixture of multilingual datasets. It works well on messy data and on short queries expected to return medium-length passages of text (1-2 paragraphs).
Details
- Dimension: 1024
- Vector type: Dense
- Recommended similarity metric: Cosine
- Max input tokens per sequence: 507
- Max sequences per batch: 96
Parameters
The multilingual-e5-large model supports the following parameters:
Parameter | Type | Required/Optional | Description | Default |
---|---|---|---|---|
input_type | string | Required | The type of input data. Accepted values: query or passage. | |
truncate | string | Optional | How to handle inputs longer than those supported by the model. Accepted values: END or NONE. END truncates the input sequence at the input token limit; NONE returns an error when the input exceeds the input token limit. | END |
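As a concrete illustration of the parameters above, the snippet below builds a JSON request body for the embed endpoint with multilingual-e5-large. This is a minimal sketch: the mapping of parameters onto JSON fields and the overall payload shape are assumptions to verify against the API reference.

```python
import json

# Hypothetical request body for the embed endpoint, assuming the
# documented parameters map directly onto JSON fields.
payload = {
    "model": "multilingual-e5-large",
    "parameters": {
        "input_type": "passage",  # "query" or "passage" (required)
        "truncate": "END",        # "END" (default) or "NONE"
    },
    "inputs": [
        {"text": "The quick brown fox jumps over the lazy dog."},
        {"text": "Le renard brun saute par-dessus le chien paresseux."},
    ],
}

# Serialize for an HTTP POST, sent with your Pinecone API key
# in the request headers.
body = json.dumps(payload)
```

Each input sequence is limited to 507 tokens and each batch to 96 sequences, so longer corpora need to be chunked client-side before being sent.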
Rate limits
Rate limits are defined at the project level and vary based on pricing plan and input type.
Input type | Starter plan | Paid plans |
---|---|---|
passage | 250k tokens per minute | 1M tokens per minute |
query | 50k tokens per minute | 250k tokens per minute |
Combined | 5M tokens per month | Unlimited tokens per month |
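Since passage embedding on the Starter plan is capped at 250k tokens per minute, a client may want to pace its own throughput. The helper below is a minimal client-side sketch: the 1.3 tokens-per-word estimate is a rough assumption, not an API guarantee, and the batches it returns would then be sent one minute apart.

```python
# Minimal client-side pacing: split passages into batches that stay
# under a per-minute token budget. The 250_000 default matches the
# Starter-plan passage limit; the token estimate is a rough heuristic.
def batch_by_token_budget(passages, tokens_per_minute=250_000):
    batches, current, current_tokens = [], [], 0
    for text in passages:
        est_tokens = int(len(text.split()) * 1.3) + 1
        if current and current_tokens + est_tokens > tokens_per_minute:
            batches.append(current)
            current, current_tokens = [], 0
        current.append(text)
        current_tokens += est_tokens
    if current:
        batches.append(current)
    return batches
```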
Reranking models
The rerank endpoint takes documents and scores them by their relevance to a query. Rerankers are used to increase retrieval quality as part of two-stage retrieval systems.
The following reranking models are available:
bge-reranker-v2-m3
bge-reranker-v2-m3 is a high-performance, multilingual reranking model that works well on messy data and on short queries expected to return medium-length passages of text (1-2 paragraphs).
Details
- Max tokens per query and document pair: 1024
- Max documents: 100
Parameters
The bge-reranker-v2-m3 model supports the following parameters:
Parameter | Type | Required/Optional | Description | Default |
---|---|---|---|---|
truncate | string | Optional | How to handle inputs longer than those supported by the model. Accepted values: END or NONE. END truncates the input sequence at the input token limit; NONE returns an error when the input exceeds the input token limit. | NONE |
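To make the rerank endpoint concrete, the snippet below builds a JSON request body for bge-reranker-v2-m3. This is a hedged sketch: the document format and the top_n field are assumptions to check against the API reference.

```python
import json

# Hypothetical request body for the rerank endpoint. The id/text
# document shape and the top_n field are assumed, not confirmed here.
payload = {
    "model": "bge-reranker-v2-m3",
    "query": "What is the capital of France?",
    "documents": [
        {"id": "doc1", "text": "Paris is the capital of France."},
        {"id": "doc2", "text": "The Eiffel Tower is in Paris."},
    ],
    "top_n": 2,  # assumed optional field: number of results to return
    "parameters": {
        "truncate": "NONE",  # this model's default: error on overlong input
    },
}
body = json.dumps(payload)
```

Note that each query-document pair is limited to 1024 tokens, and at most 100 documents can be sent per request.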
Rate limits
Rate limits are defined at the project level and vary based on pricing plan.
Limit type | Starter plan | Paid plans |
---|---|---|
Requests per minute | 60 | 60 |
Requests per month | 500 | Unlimited |
To request a rate increase, contact Support.
pinecone-rerank-v0
This feature is in early access and is not intended for production usage.
pinecone-rerank-v0 is a state-of-the-art reranking model that outperforms competitors on widely accepted benchmarks. It can handle chunks of up to 512 tokens (1-2 paragraphs).
Details
- Max tokens per query and document pair: 512
- Max documents: 100
Parameters
The pinecone-rerank-v0 model supports the following parameters:
Parameter | Type | Required/Optional | Description | Default |
---|---|---|---|---|
truncate | string | Optional | How to handle inputs longer than those supported by the model. Accepted values: END or NONE. END truncates the input sequence at the input token limit; NONE returns an error when the input exceeds the input token limit. | END |
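On the response side, a reranker typically returns a relevance score per document along with the document's position in the original request. The snippet below parses such a response; the exact schema (a data list with index and score fields) is an assumption to verify against the API reference.

```python
import json

# Hypothetical rerank response: one entry per document, carrying the
# document's index in the original request and a relevance score.
raw = json.dumps({
    "model": "pinecone-rerank-v0",
    "data": [
        {"index": 1, "score": 0.91},
        {"index": 0, "score": 0.17},
    ],
})

results = json.loads(raw)["data"]
# Order results by descending relevance score and recover the
# position of the most relevant document in the original request.
ranked = sorted(results, key=lambda r: r["score"], reverse=True)
best_index = ranked[0]["index"]
```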
Rate limits
Rate limits are defined at the project level and vary based on pricing plan.
Limit type | Starter plan | Paid plans |
---|---|---|
Requests per minute | 60 | 60 |
Requests per month | 500 | Unlimited |
SDK support
You can access the embed and rerank endpoints directly or through a supported Pinecone SDK:
SDK | embed support | rerank support |
---|---|---|
Python | Yes | Yes |
Node.js | Yes | Yes |
Java | Yes | Yes |
Go | Yes | Yes |
.NET | Yes | Yes |
To install the latest SDK version, run the following command:
If you already have an SDK, upgrade to the latest version as follows:
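For example, for the Python SDK (the package name is an assumption to verify on PyPI; older releases shipped as pinecone-client):

```shell
# Install the Pinecone Python SDK (package name assumed; check PyPI).
pip install pinecone

# Upgrade an existing installation to the latest version.
pip install --upgrade pinecone
```

The other SDKs follow the same pattern with their own package managers (npm, Maven, go get, NuGet).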
Cost
Inference billing is based on tokens used. To learn more, see Understanding cost.