Reranking is used as part of a two-stage vector retrieval process to improve the quality of results. You first query an index for a given number of relevant results, and then you send the query and results to a reranking model. The reranking model scores the results based on their semantic relevance to the query and returns a new, more accurate ranking. This approach is one of the simplest methods for improving quality in retrieval-augmented generation (RAG) pipelines.

Pinecone provides hosted reranking models so it’s easy to manage two-stage vector retrieval on a single platform. You can use a hosted model to rerank results as an integrated part of a query, or you can use a hosted model or external model to rerank results as a standalone operation.

Integrated reranking

To rerank initial results as an integrated part of a query, without any extra steps, use the search operation with the rerank parameter, specifying the hosted reranking model you want to use, the number of reranked results to return, and the fields to use for reranking, if different from those used in the main query.

For example, the following code searches for the 4 records most semantically related to a query text and then uses the hosted bge-reranker-v2-m3 model to rerank them and return only the 2 most relevant documents:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# To get the unique host for an index, 
# see https://docs.pinecone.io/guides/manage-data/target-an-index
index = pc.Index(host="INDEX_HOST")

ranked_results = index.search(
    namespace="example-namespace",
    # Initial retrieval: embed the query text and return the top 4 records
    query={
        "inputs": {"text": "Disease prevention"},
        "top_k": 4
    },
    # Rerank the initial results with a hosted model and keep only the top 2
    rerank={
        "model": "bge-reranker-v2-m3",
        "top_n": 2,
        "rank_fields": ["chunk_text"]
    },
    fields=["category", "chunk_text"]
)

print(ranked_results)

The response looks as follows. For each hit, the _score represents the relevance of a document to the query, normalized between 0 and 1, with scores closer to 1 indicating higher relevance.

{'result': {'hits': [{'_id': 'rec3',
                      '_score': 0.004399413242936134,
                      'fields': {'category': 'immune system',
                                 'chunk_text': 'Rich in vitamin C and other '
                                                'antioxidants, apples '
                                                'contribute to immune health '
                                                'and may reduce the risk of '
                                                'chronic diseases.'}},
                     {'_id': 'rec4',
                      '_score': 0.0029235430993139744,
                      'fields': {'category': 'endocrine system',
                                 'chunk_text': 'The high fiber content in '
                                                'apples can also help regulate '
                                                'blood sugar levels, making '
                                                'them a favorable snack for '
                                                'people with diabetes.'}}]},
 'usage': {'embed_total_tokens': 8, 'read_units': 6, 'rerank_units': 1}}
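
You can also read the hits programmatically. The following is a minimal sketch, assuming the response supports dictionary-style access matching the printed structure above:

# Iterate over the reranked hits; each hit exposes its ID, relevance
# score, and the requested fields.
for hit in ranked_results["result"]["hits"]:
    print(f"id: {hit['_id']} | score: {round(hit['_score'], 4)} | text: {hit['fields']['chunk_text']}")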

Standalone reranking

To rerank initial results as a standalone operation, use the rerank operation, specifying the hosted reranking model you want to use, the query and the initial results (documents), the number of reranked results to return, the fields to use for reranking, and any other model-specific parameters.

For example, the following code uses the hosted bge-reranker-v2-m3 model to rerank the documents based on the relevance of their chunk_text fields to the query and return only the 2 most relevant documents, along with their relevance scores:

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

ranked_results = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query="What is AAPL's outlook, considering both product launches and market conditions?",
    documents=[
        {"id": "vec2", "chunk_text": "Analysts suggest that AAPL'\''s upcoming Q4 product launch event might solidify its position in the premium smartphone market."},
        {"id": "vec3", "chunk_text": "AAPL'\''s strategic Q3 partnerships with semiconductor suppliers could mitigate component risks and stabilize iPhone production."},
        {"id": "vec1", "chunk_text": "AAPL reported a year-over-year revenue increase, expecting stronger Q3 demand for its flagship phones."},
    ],
    top_n=2,
    rank_fields=["chunk_text"],
    return_documents=True,
    parameters={
        "truncate": "END"
    }
)

print(ranked_results)

The response looks as follows. For each hit, the _score represents the relevance of a document to the query, normalized between 0 and 1, with scores closer to 1 indicating higher relevance.

RerankResult(
  model='bge-reranker-v2-m3',
  data=[{
    index=0,
    score=0.004166256,
    document={
        id='vec2',
        chunk_text="Analysts suggest that AAPL'''s upcoming Q4 product launch event might solidify its position in the premium smartphone market."
    }
  },{
    index=2,
    score=0.0011513996,
    document={
        id='vec1',
        chunk_text='AAPL reported a year-over-year revenue increase, expecting stronger Q3 demand for its flagship phones.'
    }
  }],
  usage={'rerank_units': 1}
)
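
In practice, standalone reranking is typically applied to the results of an initial query. The following is a minimal sketch of that flow, assuming an index with integrated embedding and a chunk_text field (the host, namespace, and field names are placeholders): it retrieves a larger candidate set, passes each hit's chunk_text to the rerank operation, and keeps the 2 most relevant documents.

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(host="INDEX_HOST")

query_text = "Disease prevention"

# Stage 1: retrieve an initial set of candidate records.
initial = index.search(
    namespace="example-namespace",
    query={"inputs": {"text": query_text}, "top_k": 10},
    fields=["chunk_text"]
)

# Stage 2: rerank the candidates as a standalone operation.
candidates = [
    {"id": hit["_id"], "chunk_text": hit["fields"]["chunk_text"]}
    for hit in initial["result"]["hits"]
]

reranked = pc.inference.rerank(
    model="bge-reranker-v2-m3",
    query=query_text,
    documents=candidates,
    top_n=2,
    rank_fields=["chunk_text"],
    return_documents=True
)

print(reranked)

Because the initial top_k is larger than the final top_n, the reranker has a broader pool of candidates to score, which is what improves the quality of the final results.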

Reranking models

Pinecone hosts several reranking models, so it’s easy to manage two-stage vector retrieval on a single platform. You can use any of these models for integrated or standalone reranking, as described above.

The following reranking models are hosted by Pinecone.

To understand how cost is calculated for reranking, see Understanding cost.