Search types
- Full-text search
- Semantic search (dense-vector)
- Sparse-vector search (also called sparse-vector lexical search)
- Hybrid search
Choosing a search approach
Pinecone supports four retrieval approaches. They differ in the signal they rank on and the index shape they require.
Quick decision tree
Walk through these questions in order and pick the first match.
- Do your queries share specific tokens with the data? (Product names, error messages, source code, named entities, technical jargon, identifiers.) → Full-text search. BM25 ranks results that share tokens with the query; Lucene syntax adds boolean and phrase operators.
- Are your queries natural language where meaning matters more than exact wording? (Synonyms, paraphrases, conceptual similarity.) → Semantic search with a dense vector field.
- Do you produce a learned sparse-vector representation upstream of Pinecone? (For example, using `pinecone-sparse-english-v0` or your own sparse encoder.) → Sparse-vector lexical search.
- Do you need both semantic and keyword signals on the same data?
  - On a JSON-document workload → Full-text search with a multi-field schema. Declare a `dense_vector` field alongside one or more FTS-enabled `string` fields. A single search request ranks by one signal; combine signals by adding a text-match filter to a `dense_vector` query, or by running two searches and merging results client-side.
  - On a vector-only records workload → Hybrid search. Store a dense vector and a sparse vector on each record in a single index (vector API).
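The decision tree above can be sketched as a small function that checks the questions in the same order and returns the first match. This is only an illustration: the `Workload` class, its field names, and the returned labels are hypothetical and are not part of any Pinecone SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Workload:
    """Hypothetical description of a retrieval workload (not a Pinecone type)."""
    queries_share_tokens: bool = False        # product names, error messages, identifiers
    natural_language_queries: bool = False    # synonyms, paraphrases, conceptual similarity
    has_sparse_encoder: bool = False          # sparse vectors are produced upstream
    needs_semantic_and_keyword: bool = False  # both signals on the same data
    json_documents: bool = False              # JSON documents vs. vector-only records


def choose_search_approach(w: Workload) -> Optional[str]:
    """Return the first matching approach, checking questions in document order."""
    if w.queries_share_tokens:
        return "full-text search"
    if w.natural_language_queries:
        return "semantic search"
    if w.has_sparse_encoder:
        return "sparse-vector search"
    if w.needs_semantic_and_keyword:
        if w.json_documents:
            return "full-text search (multi-field schema)"
        return "hybrid search"
    return None  # no question matched; the tree gives no recommendation


print(choose_search_approach(Workload(natural_language_queries=True)))
```

Because the function returns on the first match, the order of the checks matters just as it does in the list.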
Approach details
A useful gradient: dense ranks on concept (semantic similarity), full-text search ranks on strict character-level token matching (BM25), and sparse-vector lexical search sits between them: token-aware, but with learned per-token weights and term expansion.
- Full-text search: recommended for keyword and phrase search over text content. You upsert typed JSON documents and rank with `score_by`: BM25 token matching on an FTS-enabled `string` field, Lucene query syntax (`query_string`), `dense_vector` similarity, or `sparse_vector` similarity. A single index with a document schema can mix all four field types, so it’s also the recommended single-index path when a workload needs more than one signal (BM25 + dense, BM25 + sparse, etc.).
- Semantic search (dense-vector): for queries where intent and meaning matter more than exact keyword matches (synonyms, paraphrases, conceptual similarity). Uses dense embeddings.
- Sparse-vector search (also called sparse-vector lexical search): recommended for workflows that use a learned sparse-vector model (for example, `pinecone-sparse-english-v0`) or where the application owns the sparse-vector representation directly. For general-purpose keyword and phrase retrieval over text, start with full-text search.
- Hybrid search: combines dense and sparse vectors in a single index (vector API) for vector-centric workflows that need both semantic and lexical signals. For document-centric workflows that combine keyword matching with vector ranking, the most common pattern is dense (or sparse) ranking restricted by a text-match filter on an FTS-enabled `string` field, for example semantic search across a corpus narrowed to documents containing an exact phrase. To weight BM25 and dense rankings against each other, run separate searches and merge the results client-side.
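One way to merge two result lists client-side is weighted reciprocal rank fusion (RRF). The sketch below is an assumption about how such a merge could look, not a method the Pinecone documentation prescribes. It works on plain lists of document IDs, so it makes no calls to any SDK; the function name, the weights, and the constant `k = 60` (a commonly used RRF default) are illustrative choices.

```python
from collections import defaultdict
from typing import Dict, List, Sequence


def merge_rankings(
    rankings: Dict[str, Sequence[str]],
    weights: Dict[str, float],
    k: int = 60,
    top_k: int = 10,
) -> List[str]:
    """Fuse several ranked ID lists with weighted reciprocal rank fusion.

    Each list contributes weight / (k + rank) to a document's score,
    where rank starts at 1 for the best hit in that list.
    """
    scores: Dict[str, float] = defaultdict(float)
    for name, ids in rankings.items():
        weight = weights.get(name, 1.0)  # lists without a weight count once
        for rank, doc_id in enumerate(ids, start=1):
            scores[doc_id] += weight / (k + rank)
    # Sort by descending score; break ties by ID so the output is deterministic.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [doc_id for doc_id, _ in ordered[:top_k]]


# Hypothetical IDs returned by two separate searches over the same data.
bm25_hits = ["a", "b", "c"]
dense_hits = ["c", "a", "d"]

merged = merge_rankings(
    {"bm25": bm25_hits, "dense": dense_hits},
    weights={"bm25": 1.0, "dense": 1.0},
)
print(merged)
```

RRF uses only rank positions, so it avoids comparing BM25 scores and vector similarity scores directly, which are on different scales. Raising one weight shifts the merged order toward that list.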
Optimization
Limits
| Metric | Limit |
|---|---|
| Max `top_k` value | 10,000 |
| Max result size | 4MB |
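A client can check the `top_k` limit before sending a query. The helper below is a minimal sketch; its name and the constant are hypothetical and not part of any Pinecone SDK. It does not cover the result-size limit, which depends on the data returned.

```python
MAX_TOP_K = 10_000  # from the limits table above


def validate_top_k(top_k: int) -> int:
    """Return top_k unchanged if it is within the documented limit, else raise."""
    if top_k < 1:
        raise ValueError("top_k must be at least 1")
    if top_k > MAX_TOP_K:
        raise ValueError(f"top_k must not exceed {MAX_TOP_K}, got {top_k}")
    return top_k


print(validate_top_k(100))
```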
Cost
- To understand how cost is calculated for queries, see Understanding cost.
- For up-to-date pricing information, see Pricing.