Overview

Cohere Rerank 4.0 Fast (cohere-rerank-4-fast) is the latest iteration in Cohere’s Rerank model series, succeeding Cohere Rerank v3.5. It improves relevance quality on queries that express explicit or implicit constraints, and it retains multilingual support for 100+ languages with strong performance in domains like finance and hospitality.
Cohere Rerank 4.0 is hosted on Azure AI under the Global Standard deployment type, and requests may be processed in regions outside the United States.
cohere-rerank-4-fast replaces cohere-rerank-3.5, which is deprecated. Starting August 1, 2026, requests to cohere-rerank-3.5 are automatically served by cohere-rerank-4-fast. Because the two models return different relevance scores, migrate your rerank requests and re-tune any hard-coded score thresholds against cohere-rerank-4-fast before the transition.
Reranking with cohere-rerank-4-fast is billed per rerank unit. One rerank unit covers a query plus up to 100 documents. Documents longer than about 500 tokens are split into ~500-token chunks, and each chunk counts toward that limit of 100, so a request with more than 100 chunks costs more than one unit. Most requests still count as a single unit. By contrast, cohere-rerank-3.5 billed one unit per request regardless of document length. To keep cost and latency predictable, use max_tokens_per_doc to truncate long documents. For details, see Understanding cost.
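The unit arithmetic above can be sketched in a few lines of Python. This is a rough estimate, not the billing implementation: the helper name estimate_rerank_units is hypothetical, "about 500 tokens" is treated as exactly 500, every document is assumed to count as at least one chunk, and truncation by max_tokens_per_doc is assumed to happen before chunking. Token counts must come from your own tokenizer.

```python
import math
from typing import Iterable, Optional


def estimate_rerank_units(
    doc_token_counts: Iterable[int],
    max_tokens_per_doc: Optional[int] = None,
    chunk_tokens: int = 500,     # assumption: "about 500 tokens" taken as exactly 500
    chunks_per_unit: int = 100,  # one rerank unit covers up to 100 documents/chunks
) -> int:
    """Roughly estimate the rerank units for one request (hypothetical helper)."""
    total_chunks = 0
    for tokens in doc_token_counts:
        # Assumption: truncation is applied before the document is chunked.
        if max_tokens_per_doc is not None:
            tokens = min(tokens, max_tokens_per_doc)
        # Assumption: every document counts as at least one chunk.
        total_chunks += max(1, math.ceil(tokens / chunk_tokens))
    return math.ceil(total_chunks / chunks_per_unit)


# Five short documents fit in a single unit.
print(estimate_rerank_units([120] * 5))
# Fifty 2,000-token documents become 200 chunks, which spans two units.
print(estimate_rerank_units([2000] * 50))
# Truncating each document to 500 tokens brings the same request back to one unit.
print(estimate_rerank_units([2000] * 50, max_tokens_per_doc=500))
```

Under these assumptions, truncation is the main lever for keeping a request of long documents within a single unit.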

Installation

pip install -U pinecone

Reranking

See rerank to learn more about reranking.

from pinecone import Pinecone

# Replace "API-KEY" with your Pinecone API key.
pc = Pinecone(api_key="API-KEY")

query = "Tell me about Apple's products"
results = pc.inference.rerank(
    model="cohere-rerank-4-fast",
    query=query,
    documents=[
        "Apple is a popular fruit known for its sweetness and crisp texture.",
        "Apple is known for its innovative products like the iPhone.",
        "Many people enjoy eating apples as a healthy snack.",
        "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces.",
        "An apple a day keeps the doctor away, as the saying goes.",
    ],
    top_n=3,
    return_documents=True
)

print(query)
# Print each returned document with its relevance score.
for r in results.data:
    print(r.score, r.document.text)
