Overview
Cohere Rerank 4.0 Fast (cohere-rerank-4-fast) is the latest iteration in Cohere’s Rerank model series, succeeding Cohere Rerank v3.5. It improves relevance quality on queries that express explicit or implicit constraints, and it retains multilingual support for 100+ languages with strong performance in domains like finance and hospitality.
Cohere Rerank 4.0 is hosted on Azure AI under the Global Standard deployment type, and requests may be processed in regions outside the United States.
cohere-rerank-4-fast replaces cohere-rerank-3.5, which is deprecated. Starting August 1, 2026, requests to cohere-rerank-3.5 are automatically served by cohere-rerank-4-fast. Because the two models return different relevance scores, migrate your rerank requests and re-tune any hard-coded score thresholds against cohere-rerank-4-fast before the transition.
Reranking with cohere-rerank-4-fast is billed per rerank unit. Most requests count as a single unit. One rerank unit covers a query plus up to 100 documents, and documents longer than about 500 tokens are split into ~500-token chunks that each count toward that 100, so a request with more than 100 chunks costs more than one unit. cohere-rerank-3.5 billed one unit per request. To keep cost and latency predictable, use max_tokens_per_doc to truncate long documents. For details, see Understanding cost.
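The billing rule above can be approximated with a short calculation. The following Python sketch is illustrative only and is not the service's billing logic. It assumes the chunk size is exactly 500 tokens (the text says "about 500"), that partial chunks and partial units round up, that query tokens do not affect chunking, and that every request costs at least one unit. The function name estimate_rerank_units is hypothetical; max_tokens_per_doc mirrors the request parameter named above.

```python
import math

CHUNK_TOKENS = 500   # assumption: "about 500 tokens" treated as exactly 500
DOCS_PER_UNIT = 100  # from the text: one unit covers up to 100 documents or chunks


def estimate_rerank_units(doc_token_counts, max_tokens_per_doc=None):
    """Rough estimate of rerank units for one request (hypothetical helper)."""
    total_chunks = 0
    for tokens in doc_token_counts:
        # Truncation happens before chunking, so it caps the chunk count per document.
        if max_tokens_per_doc is not None:
            tokens = min(tokens, max_tokens_per_doc)
        # Assumption: a partial chunk counts as a full chunk; every document is at least one.
        total_chunks += max(1, math.ceil(tokens / CHUNK_TOKENS))
    # Assumption: partial units round up, and a request costs at least one unit.
    return max(1, math.ceil(total_chunks / DOCS_PER_UNIT))


# 50 short documents fit in a single unit.
print(estimate_rerank_units([300] * 50))                            # 1
# 80 documents of 1,200 tokens become 240 chunks, which spans 3 units.
print(estimate_rerank_units([1200] * 80))                           # 3
# Truncating to 500 tokens keeps the same request at 80 chunks.
print(estimate_rerank_units([1200] * 80, max_tokens_per_doc=500))   # 1
```

Under these assumptions, truncating with max_tokens_per_doc trades some document context for a predictable unit count. Treat the result as a planning estimate and confirm actual charges against Understanding cost.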