Reranking is the second step of a two-stage vector retrieval process that improves the quality of results. You first query an index for a given number of relevant results, and then you send the query and those results to a reranking model. The reranking model scores the results based on their semantic relevance to the query and returns a new, more accurate ranking. This approach is one of the simplest ways to improve quality in retrieval-augmented generation (RAG) pipelines. Pinecone provides hosted reranking models, so you can manage two-stage vector retrieval on a single platform. You can use a hosted model to rerank results as an integrated part of a query, or you can use a hosted model or an external model to rerank results as a standalone operation.
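To make the two stages concrete, here is a minimal local sketch with no Pinecone calls. Stage 1 is a cheap cosine-similarity search that keeps top_k candidates; stage 2 rescores only those candidates and keeps top_n. The scorer toy_relevance is a hypothetical stand-in (simple word overlap), not a real reranking model, and the records are made up for illustration.

```python
import math


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two dense vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def first_stage(query_vec: list[float], records: list[dict], top_k: int) -> list[dict]:
    """Stage 1: cheap vector search that keeps the top_k nearest records."""
    ranked = sorted(records, key=lambda r: cosine(query_vec, r["vector"]), reverse=True)
    return ranked[:top_k]


def toy_relevance(query: str, text: str) -> float:
    """Hypothetical stand-in for a reranking model: share of query words found in the text.

    A real reranker reads the query and document together and produces a far
    better relevance score; only the (query, text) -> score interface is mimicked.
    """
    q_words = set(query.lower().split())
    t_words = set(text.lower().split())
    return len(q_words & t_words) / len(q_words) if q_words else 0.0


def two_stage_search(query: str, query_vec: list[float], records: list[dict],
                     top_k: int, top_n: int) -> list[dict]:
    """Stage 2: rescore only the stage-1 candidates and return the best top_n."""
    candidates = first_stage(query_vec, records, top_k)
    reranked = sorted(candidates, key=lambda r: toy_relevance(query, r["text"]), reverse=True)
    return reranked[:top_n]


# Made-up records with 2-dimensional vectors, for illustration only.
records = [
    {"id": "a", "vector": [1.0, 0.0], "text": "apple pie recipe"},
    {"id": "b", "vector": [0.9, 0.1], "text": "apple iphone launch event"},
    {"id": "c", "vector": [0.0, 1.0], "text": "iphone battery tips"},
]
top = two_stage_search("apple iphone", [1.0, 0.0], records, top_k=2, top_n=1)
print([r["id"] for r in top])  # ['b']
```

Record c mentions the iPhone but never reaches the reranker, because stage 1 drops it. The reranker can only reorder what the first stage returns, so top_k must be at least top_n.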
To run through this guide in your browser, see the Rerank example notebook.
Integrated reranking
To rerank initial results as an integrated part of a query, without any extra steps, use the search operation with the rerank parameter. In rerank, include the hosted reranking model you want to use, the number of reranked results to return, and the fields to use for reranking, if different from the main query.
For example, a single search can find the 3 records most semantically related to a query text, use the hosted bge-reranker-v2-m3 model to rerank them, and return only the 2 most relevant documents.
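The sketch below only builds the keyword arguments for such a call, so it runs without a Pinecone account. It assumes the keys mirror the Pinecone Python SDK's Index.search parameters (namespace, query, rerank, fields) for an index with integrated embedding; confirm them against the SDK version you use. The namespace, query text, and chunk_text field name are hypothetical, and the top_n check is our own guard, not an SDK rule.

```python
def build_search_kwargs(namespace: str, query_text: str, top_k: int,
                        rerank_model: str, top_n: int,
                        rank_fields: list[str], fields: list[str]) -> dict:
    """Keyword arguments for an integrated search-plus-rerank call.

    Assumption: these keys mirror the SDK's Index.search parameters for an
    index with integrated embedding (the query is given as text, not a vector).
    """
    if top_n > top_k:
        # Our own sanity check: the reranker can only reorder the top_k candidates.
        raise ValueError("top_n cannot exceed top_k")
    return {
        "namespace": namespace,
        "query": {"inputs": {"text": query_text}, "top_k": top_k},
        "rerank": {"model": rerank_model, "top_n": top_n, "rank_fields": rank_fields},
        "fields": fields,
    }


kwargs = build_search_kwargs(
    namespace="example-namespace",   # hypothetical namespace
    query_text="Disease prevention",  # hypothetical query
    top_k=3,                          # first stage: 3 candidates
    rerank_model="bge-reranker-v2-m3",
    top_n=2,                          # second stage: keep the 2 best
    rank_fields=["chunk_text"],       # hypothetical field name
    fields=["chunk_text"],
)
# With a live index object, the call would be: results = index.search(**kwargs)
print(kwargs["rerank"])
```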
_score represents the relevance of a document to the query, normalized between 0 and 1, with scores closer to 1 indicating higher relevance.
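Because _score is normalized between 0 and 1, it can be used directly for sorting or as a cutoff. This sketch assumes the response has been converted to a plain dict shaped like {"result": {"hits": [{"_id": ..., "_score": ..., "fields": {...}}]}}; the sample response and the 0.5 cutoff are made up for illustration, not recommended values.

```python
def ranked_hits(response: dict, min_score: float = 0.0) -> list[tuple[str, float]]:
    """Return (id, _score) pairs at or above min_score, most relevant first.

    Assumes a plain-dict response shaped like
    {"result": {"hits": [{"_id": ..., "_score": ..., "fields": {...}}]}}.
    """
    pairs = []
    for hit in response["result"]["hits"]:
        score = hit["_score"]
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"_score {score} for {hit['_id']} is outside [0, 1]")
        if score >= min_score:
            pairs.append((hit["_id"], score))
    return sorted(pairs, key=lambda pair: pair[1], reverse=True)


# Made-up response for illustration; min_score=0.5 is an arbitrary cutoff.
sample = {"result": {"hits": [
    {"_id": "rec2", "_score": 0.31, "fields": {}},
    {"_id": "rec7", "_score": 0.88, "fields": {}},
]}}
print(ranked_hits(sample, min_score=0.5))  # [('rec7', 0.88)]
```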
Standalone reranking
To rerank initial results as a standalone operation, use the rerank operation. Pass the hosted reranking model you want to use, the query, the query results, the number of ranked results to return, the field to use for reranking, and any other model-specific parameters.
For example, you can use the hosted bge-reranker-v2-m3 model to rerank the values of the documents.chunk_text fields based on their relevance to the query and return only the 2 most relevant documents, along with their scores.
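The sketch below turns first-stage hits into rerank documents and assembles a standalone rerank request body, without sending it. The body keys (model, query, documents, top_n, rank_fields, return_documents, parameters) reflect our reading of Pinecone's rerank REST endpoint and are an assumption to verify against the API reference; the helper names, hits, and query are hypothetical.

```python
import json
from typing import Optional


def hits_to_documents(hits: list[dict], text_field: str) -> list[dict]:
    """Turn first-stage hits into rerank documents that keep the id and one text field."""
    return [{"id": hit["_id"], text_field: hit["fields"][text_field]} for hit in hits]


def build_rerank_body(model: str, query: str, documents: list[dict], top_n: int,
                      rank_fields: list[str], parameters: Optional[dict] = None) -> dict:
    """JSON body for a standalone rerank request (key names are assumptions)."""
    body = {
        "model": model,
        "query": query,
        "documents": documents,
        "top_n": top_n,
        "rank_fields": rank_fields,
        "return_documents": True,  # ask for each document's text alongside its score
    }
    if parameters:
        body["parameters"] = parameters  # model-specific options such as truncate
    return body


hits = [  # made-up first-stage results
    {"_id": "rec1", "fields": {"chunk_text": "Apple's first product was the Apple I."}},
    {"_id": "rec2", "fields": {"chunk_text": "Apples are a popular fruit."}},
    {"_id": "rec3", "fields": {"chunk_text": "The iPhone is a line of smartphones."}},
]
body = build_rerank_body(
    model="bge-reranker-v2-m3",
    query="The tech company Apple is known for its innovative products like the iPhone.",
    documents=hits_to_documents(hits, "chunk_text"),
    top_n=2,
    rank_fields=["chunk_text"],
)
print(json.dumps(body, indent=2))
```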
Rerank results on the default field
To rerank search results, specify a supported reranking model and provide the documents, the query, and any model-specific parameters. By default, Pinecone expects the text to rerank to be in the documents.text field.
For example, a request can use the bge-reranker-v2-m3 reranking model to rerank the values of the documents.text field based on their relevance to the query "The tech company Apple is known for its innovative products like the iPhone."
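Since the default field is documents.text, a document that stores its text under another key will not be ranked on that text unless rank_fields says so. The following is a hypothetical client-side pre-flight check (our own helper, not a Pinecone API) that lists which documents lack the field the reranker will read; the documents are made up.

```python
from typing import Optional

DEFAULT_RANK_FIELD = "text"  # documents.text is the default, per the text above


def documents_missing_fields(documents: list[dict],
                             rank_fields: Optional[list[str]] = None) -> list[int]:
    """Indexes of documents that lack a field the reranker will read.

    Without rank_fields, only the default "text" field is checked.
    """
    fields = rank_fields or [DEFAULT_RANK_FIELD]
    return [i for i, doc in enumerate(documents) if any(f not in doc for f in fields)]


docs = [  # made-up documents
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness."},
    {"id": "vec2", "chunk_text": "Apple's iPhone line includes many models."},
]
print(documents_missing_fields(docs))                  # [1]
print(documents_missing_fields(docs, ["chunk_text"]))  # [0]
```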
With truncate set to "END", the input sequence (query + document) is truncated at the model's token limit (1024 tokens for bge-reranker-v2-m3). To return an error instead, set truncate to "NONE" or leave the parameter out.
Normalized between 0 and 1, the score represents the relevance of a passage to the query, with scores closer to 1 indicating higher relevance.
Rerank results on a custom field
To rerank results on a field other than documents.text, provide the rank_fields parameter to specify the fields on which to rerank.
The bge-reranker-v2-m3 and pinecone-rerank-v0 models support only a single rerank field. cohere-rerank-3.5 supports multiple rerank fields, ranked based on the order of the fields specified. For example, to rerank on the documents.my_field field, set rank_fields to ["my_field"].
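The per-model rules above can be checked before sending a request. This is a hypothetical validator built only from the limits stated in this section; None marks a model that accepts multiple fields, since no upper bound is given here.

```python
from typing import Optional

# Rerank-field limits as stated in this section; None means multiple fields are
# allowed (no upper bound is given here).
MAX_RANK_FIELDS: dict[str, Optional[int]] = {
    "bge-reranker-v2-m3": 1,
    "pinecone-rerank-v0": 1,
    "cohere-rerank-3.5": None,
}


def validate_rank_fields(model: str, rank_fields: list[str]) -> list[str]:
    """Raise if rank_fields is empty or names more fields than the model accepts."""
    if model not in MAX_RANK_FIELDS:
        raise KeyError(f"no rank_fields limit recorded for {model!r}")
    if not rank_fields:
        raise ValueError("rank_fields must name at least one field")
    limit = MAX_RANK_FIELDS[model]
    if limit is not None and len(rank_fields) > limit:
        raise ValueError(f"{model} supports only {limit} rerank field, got {len(rank_fields)}")
    return list(rank_fields)


print(validate_rank_fields("bge-reranker-v2-m3", ["my_field"]))
# For cohere-rerank-3.5 the order of the fields matters.
print(validate_rank_fields("cohere-rerank-3.5", ["title", "body"]))
```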
Reranking models
Pinecone hosts several reranking models, so it's easy to manage two-stage vector retrieval on a single platform. You can use a hosted model to rerank results as an integrated part of a query, or you can use a hosted model to rerank results as a standalone operation. The following reranking models are hosted by Pinecone.
To understand how cost is calculated for reranking, see Reranking cost. To get model details via the API, see List models and Describe a model.
cohere-rerank-3.5
cohere-rerank-3.5 is Cohere’s leading reranking model, balancing performance and latency for a wide range of enterprise search applications.
Details
- Modality: Text
- Max tokens per query and document pair: 40,000
- Max documents: 200
The cohere-rerank-3.5 model supports the following parameters:
| Parameter | Type | Required/Optional | Description | Default |
|---|---|---|---|---|
| max_chunks_per_doc | integer | Optional | Long documents are automatically truncated to the specified number of chunks. Accepted range: 1-3072. | |
| rank_fields | array of strings | Optional | The fields to use for reranking. The model reranks based on the order of the fields specified (e.g., ["field1", "field2", "field3"]). | ["text"] |
bge-reranker-v2-m3
bge-reranker-v2-m3 is a high-performance, multilingual reranking model that works well on messy data and short queries expected to return medium-length passages of text (1-2 paragraphs).
Details
- Modality: Text
- Max tokens per query and document pair: 1024
- Max documents: 100
The bge-reranker-v2-m3 model supports the following parameters:
| Parameter | Type | Required/Optional | Description | Default |
|---|---|---|---|---|
| truncate | string | Optional | How to handle inputs longer than those supported by the model. Accepted values: END or NONE. END truncates the input sequence at the input token limit. NONE returns an error when the input exceeds the input token limit. | NONE |
| rank_fields | array of strings | Optional | The field to use for reranking. The model supports only a single rerank field. | ["text"] |
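To illustrate what the two truncate values do, the sketch below applies them to a (query + document) pair against the 1024-token limit. Whitespace splitting is a stand-in assumption: the hosted model counts tokens with its own tokenizer, so real counts will differ. The function name and sample text are hypothetical.

```python
BGE_MAX_TOKENS = 1024  # max tokens per query and document pair for bge-reranker-v2-m3


def prepare_pair(query: str, document: str, max_tokens: int = BGE_MAX_TOKENS,
                 truncate: str = "NONE") -> list[str]:
    """Mimic the truncate parameter on a whitespace-tokenized (query + document) pair.

    Whitespace splitting is a stand-in; the hosted model uses its own tokenizer.
    The default "NONE" matches this model's default in the table above.
    """
    if truncate not in ("END", "NONE"):
        raise ValueError(f"truncate must be 'END' or 'NONE', got {truncate!r}")
    tokens = query.split() + document.split()
    if len(tokens) <= max_tokens:
        return tokens
    if truncate == "NONE":
        raise ValueError(f"pair has {len(tokens)} tokens; limit is {max_tokens}")
    return tokens[:max_tokens]  # END: drop everything past the limit


long_doc = "a very long review " * 400  # made-up text, 1600 whitespace tokens
print(len(prepare_pair("apple iphone", long_doc, truncate="END")))  # 1024
try:
    prepare_pair("apple iphone", long_doc)  # default NONE
except ValueError as err:
    print("rejected:", err)
```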
pinecone-rerank-v0
pinecone-rerank-v0 is a state-of-the-art reranking model that outperforms competitors on widely accepted benchmarks. It can handle chunks up to 512 tokens (1-2 paragraphs).
Details
- Modality: Text
- Max tokens per query and document pair: 512
- Max documents: 100
The pinecone-rerank-v0 model supports the following parameters:
| Parameter | Type | Required/Optional | Description | Default |
|---|---|---|---|---|
| truncate | string | Optional | How to handle inputs longer than those supported by the model. Accepted values: END or NONE. END truncates the input sequence at the input token limit. NONE returns an error when the input exceeds the input token limit. | END |
| rank_fields | array of strings | Optional | The field to use for reranking. The model supports only a single rerank field. | ["text"] |
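The three models differ in limits and in their truncate default, so the same request can behave differently depending on the model. A minimal sketch, with values copied from the model descriptions above; the class and helper names are hypothetical, and None marks a model with no truncate parameter listed here.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RerankLimits:
    max_tokens_per_pair: int
    max_documents: int
    default_truncate: Optional[str]  # None: no truncate parameter listed for the model


# Values copied from the model descriptions in this section.
LIMITS = {
    "cohere-rerank-3.5": RerankLimits(40_000, 200, None),
    "bge-reranker-v2-m3": RerankLimits(1024, 100, "NONE"),
    "pinecone-rerank-v0": RerankLimits(512, 100, "END"),
}


def effective_truncate(model: str, parameters: Optional[dict] = None) -> Optional[str]:
    """truncate behavior a request gets: the explicit value, else the model default."""
    return (parameters or {}).get("truncate", LIMITS[model].default_truncate)


def check_document_count(model: str, documents: list) -> None:
    """Raise before sending a request with more documents than the model accepts."""
    limit = LIMITS[model].max_documents
    if len(documents) > limit:
        raise ValueError(f"{model} accepts at most {limit} documents, got {len(documents)}")


print(effective_truncate("bge-reranker-v2-m3"))  # NONE
print(effective_truncate("pinecone-rerank-v0"))  # END
check_document_count("cohere-rerank-3.5", [{}] * 150)  # within the 200-document limit
```

Leaving truncate out therefore produces an error on an over-long input with bge-reranker-v2-m3 but silent truncation with pinecone-rerank-v0; setting it explicitly makes the behavior the same across both.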