Documentation Index
Fetch the complete documentation index at: https://docs.pinecone.io/llms.txt
Use this file to discover all available pages before exploring further.
- You upsert data as JSON documents.
- You declare how each field should be indexed via a schema — as a `string` field with `full_text_search` enabled (BM25 scoring), a `dense_vector` field, or a `sparse_vector` field. The schema is for ranking fields only; metadata fields are not declared.
- Pinecone indexes each field's content according to the type of the field declared in the schema. Any other fields on the upserted documents are automatically stored and indexed for filtering — no schema declaration required.
- Text fields (`type: "string"` with a `full_text_search` config object — `{}` enables FTS with all defaults) — indexed for BM25 ranking and Lucene queries.
- Dense vector fields (`type: "dense_vector"`) — indexed for ANN similarity search.
- Sparse vector fields (`type: "sparse_vector"`) — indexed for sparse vector similarity search.
- Metadata fields — any undeclared fields; stored on the document, returned via `include_fields`, and automatically indexed for filtering — see Metadata fields.
Every search picks exactly one ranking signal. The score_by clause selects the scoring method for the request:

- `text` — BM25 token matching on a single FTS-enabled `string` field.
- `query_string` — Lucene query syntax across one or more FTS-enabled `string` fields, including cross-field boolean queries.
- `dense_vector` — vector similarity against a `dense_vector` field.
- `sparse_vector` — sparse-vector similarity against a `sparse_vector` field.

Any of these can be combined with a metadata filter — including the text-match operators `$match_phrase`, `$match_all`, and `$match_any` on FTS-enabled string fields, plus the standard logical and comparison operators (`$and`, `$or`, `$not`, `$exists`, etc.). The filter narrows what's eligible; the `score_by` ranks what remains. This is the most common hybrid pattern.
For example, on an index whose schema declares both a dense_vector field (review_embedding) and an FTS-enabled string field (review_text), this single request runs semantic search across the corpus but only over documents whose review_text contains the exact phrase “beautifully written”:
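A sketch of that request body, assuming the payload follows the documented search fields (`top_k`, `score_by`, `filter`, `include_fields`); the embedding values are truncated placeholders, not a real vector:

```python
# Hypothetical search payload: dense-vector ranking narrowed by a phrase filter.
# Field names (review_embedding, review_text) come from the schema described above.
search_request = {
    "top_k": 10,
    "score_by": [
        {
            "type": "dense_vector",
            "field": "review_embedding",
            "values": [0.12, -0.03, 0.44],  # placeholder query embedding
        }
    ],
    # Applied before scoring: only documents whose review_text contains
    # the exact phrase survive to be ranked by vector similarity.
    "filter": {"review_text": {"$match_phrase": "beautifully written"}},
    "include_fields": ["review_text"],
}
```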
Filters vs. scoring
Filters are deterministic — each document either matches or it doesn't — and they apply before scoring. Scoring methods (`text`/BM25, `query_string`/Lucene, `dense_vector`, `sparse_vector`) order whatever remains after filtering, and only the top `top_k` hits are returned (max 10,000).
When you’re combining text matching with vector ranking, start with the hard yes/no constraints as filters (including the text-match operators $match_phrase, $match_all, $match_any on FTS-enabled string fields), then pick a score_by method to rank whatever remains. Use BM25 (score_by text or query_string) when keyword and phrase ranking order matters, not just inclusion.
An index schema can declare `dense_vector` and `sparse_vector` fields, plus one or more `string` fields with `full_text_search` enabled. A single search request scores results with one ranking method at a time: dense vector, sparse vector, BM25 text, or Lucene query syntax. You can still combine vector ranking with full-text keyword matching in one request by using a text-match filter, such as `$match_phrase`, `$match_all`, or `$match_any`. The vector search ranks the matching documents; the full-text filter narrows the set of documents to search.

Schema definition

The schema is required at index creation and declares the fields that drive ranking or vector search. Filterable metadata is not declared in the schema — any field you upsert that is not declared in the schema is automatically stored and indexed for filtering.

Schema field types:

| Type | Purpose | Key options |
|---|---|---|
| `dense_vector` | ANN similarity search | `dimension` (required), `metric` (`cosine`, `dotproduct`, `euclidean`) |
| `sparse_vector` | Sparse-vector similarity search with values from a custom sparse encoder | — |
| `string` (text) | Full-text search with a nested `full_text_search` config object (`{}` enables with all defaults) | `language`, `stemming`, `stop_words` (all optional, under `full_text_search`) |
Declaring any other field type in the schema (a `string` field without `full_text_search`, or a `string_list`, `float`, or `boolean` field) is rejected at index creation with a 400 error. Metadata fields are auto-indexed at upsert time — see Metadata fields.

Field names must not start with `_` or `$`. The `_` prefix is reserved for system-managed fields (for example, `_id`, `_score`); `$` is reserved for filter operators. Field names are also limited to 64 bytes. Every document has a required `_id` field, which carries its unique identifier. A user metadata field named `score` is allowed — match scores are returned as `_score` to avoid collisions.
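Putting the schema rules together, a hypothetical create-index body might look like this. The field names are illustrative; each field definition follows the documented shape (a `type` discriminator, `dimension`/`metric` for dense vectors, a `full_text_search` config for text fields):

```python
# Hypothetical create-index payload (field names are examples, not required names).
create_index_request = {
    "name": "movie-reviews",
    "schema": {
        "fields": {
            "review_text": {"type": "string", "full_text_search": {}},  # {} = all FTS defaults
            "review_embedding": {"type": "dense_vector", "dimension": 768, "metric": "cosine"},
            "review_sparse": {"type": "sparse_vector"},  # no additional options
        }
    },
    "deletion_protection": "disabled",
}

# Field-name rules from above: no leading "_" or "$", at most 64 bytes.
fields = create_index_request["schema"]["fields"]
assert all(not name.startswith(("_", "$")) for name in fields)
assert all(len(name.encode()) <= 64 for name in fields)
```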
In public preview, indexes with document schemas do not support integrated inference fields such as `semantic_text`. To use dense or sparse vector ranking in an index with a document schema, declare a `dense_vector` or `sparse_vector` field and provide vector values at upsert time.

Coming from integrated embedding? If you upsert raw text today and rely on Pinecone to vectorize it, those workflows continue to be fully supported on existing indexes with dense or sparse vectors (records API). The two index shapes are independent — you can keep an integrated-embedding records index and stand up a separate document-schema index for full-text or multi-field workloads.

Text stored in a `string` field with `full_text_search` is not metadata and does not count toward the 40 KB metadata limit for records. Use these FTS-enabled string fields for searchable chunk text. In public preview, indexes with document schemas do not support combining integrated inference fields, such as `semantic_text` fields, with full-text-search fields. To combine semantic ranking with full-text search, declare a `dense_vector` field alongside one or more FTS-enabled string fields and provide dense vector values when you upsert documents. (`{}` enables FTS with all defaults; sub-fields like `language`, `stemming`, and `stop_words` are optional overrides.)
Metadata fields are ordinary document fields such as `category` (string), `tags` (array of strings), `year` (number), or `in_stock` (boolean). These fields are stored on the document, returned via `include_fields`, and automatically indexed for filtering. They do not need to be declared in the schema.
Metadata fields
Metadata fields are not declared in the schema. Any field you include on an upserted document that is not declared in the schema is treated as metadata: it is stored on the document, returned via `include_fields`, and automatically indexed for filtering with the standard operators (`$eq`, `$ne`, `$gt`, `$gte`, `$lt`, `$lte`, `$in`, `$nin`, `$exists`, `$and`, `$or`, `$not`).
Metadata field types are inferred from the values you upsert: strings, numbers (stored as floating point), booleans, and arrays of strings are all supported. You can mix metadata field types across documents in the same index.
API and SDK reference
Full-text search uses API version `2026-01.alpha`. All requests require the header `X-Pinecone-Api-Version: 2026-01.alpha`.
The endpoints below are split into control-plane operations (project-scoped, authenticated against api.pinecone.io) and data-plane operations (index-scoped, authenticated against the per-index INDEX_HOST.svc.<region>.pinecone.io host returned by DescribeIndex). The preview SDK reflects the same split: pc.preview.* for control-plane FTS operations and pc.preview.index(...).documents.* for data-plane document operations.
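As a minimal sketch, a raw HTTP call would carry the version header alongside the usual API-key header. The `Api-Key` header name is the standard Pinecone convention (an assumption here, not stated above); the version header value comes from the docs:

```python
# Header set for a raw HTTP call against the preview API.
API_VERSION = "2026-01.alpha"
headers = {
    "Api-Key": "YOUR_API_KEY",              # placeholder — use your project key
    "X-Pinecone-Api-Version": API_VERSION,  # required on every request
    "Content-Type": "application/json",
}
```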
Control plane operations
Control plane operations manage indexes and their configuration.
Create index (POST /indexes)
- `name` (string, optional) - Unique index name (lowercase alphanumeric and hyphens, 1-45 characters). Auto-generated if omitted.
- `deployment` (object, optional) - Deployment configuration. Defaults to `managed` on AWS `us-east-1` if omitted.
  - `deployment_type` (string) - `"managed"` for serverless, `"pod"` for pod-based, `"byoc"` for bring-your-own-cloud.
  - For `managed`: `cloud` (`"aws"` | `"gcp"` | `"azure"`), `region` (e.g., `"us-east-1"`).
- `schema` (object, required) - Schema definition. See Schema definition for all supported field types. Each field in `schema.fields` uses the `type` discriminator to select its configuration:
  - `dense_vector`: `dimension` (required), `metric` (required, one of `cosine`, `dotproduct`, `euclidean`).
  - `sparse_vector`: no additional options.
  - `string` (text): `full_text_search: { ... }` (object); optional sub-fields `language`, `stemming`, `stop_words`.
  - Any field may also include an optional `description` (string) — free-text documentation of what the field contains. It's stored on the schema and returned by describe-index, and is especially useful for agentic workflows where an LLM inspects the schema to decide how to query the index.
  - Other field types (`string` without `full_text_search`, `string_list`, `float`, `boolean`) are not allowed in the schema and are rejected at index creation. Metadata fields are auto-indexed for filtering at upsert time — see Metadata fields.
- `read_capacity` (object, optional) - Read capacity for serverless (managed) indexes:
  - `mode: "OnDemand"` — default; auto-scaled shared read capacity.
  - `mode: "Dedicated"` — provisioned read nodes. Requires a `dedicated` block with `node_type`, `scaling`, and (for `Manual` scaling) `manual: { shards, replicas }`.
- `deletion_protection` (string, optional) - `"enabled"` or `"disabled"` (default: `"disabled"`).
- `tags` (object, optional) - Key-value tags for the index.
- Field names must be unique within the schema.
- Field names must contain only alphanumeric characters and underscores, must not start with `_` (reserved for system-managed fields like `_id` and `_score`) or `$` (reserved for filter operators), and must be at most 64 bytes.
- The schema must contain at least one field.
In the response, each `full_text_search` block returns the full resolved analyzer config: the settable subset (`language`, `stemming`, `stop_words`) reflects what was passed at index creation (or its default when omitted), and `lowercase` and `max_token_length` are server-applied defaults that aren't settable from the request. All fields include `description` (`null` if not supplied at creation).

Wait for `status.ready: true` before performing data plane operations. For Dedicated read capacity, also wait for `read_capacity.status.state: "Ready"`.

Response fields:

- `id` (string) — Unique index ID.
- `name` (string) — Index name.
- `host` (string) — Per-index host URL for data-plane operations (`INDEX_HOST.svc.<region>.pinecone.io`).
- `status` (object) — Provisioning status.
  - `ready` (boolean) — Whether the index is ready for data-plane operations.
  - `state` (string) — Current state, e.g., `"Initializing"`, `"Ready"`.
- `deployment` (object) — Resolved deployment configuration.
  - `deployment_type` (string) — e.g., `"managed"`.
  - `cloud` (string) — Cloud provider.
  - `region` (string) — Region code.
  - `environment` (string) — Environment identifier assigned by the system.
- `schema` (object) — Resolved schema with server-applied defaults.
  - `version` (string) — Schema version, e.g., `"v1"`.
  - `fields` (object) — Map of field name → resolved field definition. See note above on `full_text_search` server-applied defaults.
- `read_capacity` (object) — Resolved read capacity configuration.
  - `mode` (string) — `"OnDemand"` or `"Dedicated"`.
  - `dedicated` (object, present when `mode: "Dedicated"`) — Dedicated read-node configuration: `node_type`, `scaling`, and (for `Manual` scaling) `manual: { shards, replicas }`.
  - `status` (object) — Read-capacity provisioning status.
    - `state` (string) — e.g., `"Migrating"`, `"Ready"`.
    - `current_shards` (integer or null, `Dedicated` only) — Current number of provisioned shards.
    - `current_replicas` (integer or null, `Dedicated` only) — Current number of provisioned replicas.
- `tags` (object or null) — Key-value tags, or `null` if none.
- `deletion_protection` (string) — `"enabled"` or `"disabled"`.
List indexes (GET /indexes)
Describe index (GET /indexes/{index_name})
Update index (PATCH /indexes/{index_name})
Only `deletion_protection` can be updated.
Delete index (DELETE /indexes/{index_name})
If `deletion_protection` is enabled, you must first disable it using the update endpoint.

Data plane operations
Data plane operations are scoped to a namespace — use `"__default__"` if you don't need partitioning. If your documents are in another namespace, search, fetch, and delete requests must target that namespace.
Upsert documents (POST /namespaces/{namespace}/documents/upsert)
If a document with the same `_id` exists, it is completely replaced. Documents are indexed asynchronously and may not be searchable immediately after upsert.

Request fields:

- `namespace` (string, required) - Namespace name (use `"__default__"` if not using namespaces).
- `documents` (array, required, 1-1000 items) - Array of documents to upsert. Each document is an object with:
  - `_id` (string, required) - Unique document ID. If a document with this `_id` already exists, it is replaced entirely. If multiple documents in the same batch share an `_id`, only the last one is stored.
  - Fields matching your schema. Additional fields are stored on the document and auto-indexed for filtering as metadata. Names starting with `_` or `$` are rejected.
- Each upsert request can contain up to 1000 documents and must be no larger than 2 MB.
- Each document can be no larger than 2 MB.
- Each `full_text_search` string field can be no larger than 100 KB and can contain up to 10,000 tokens.
- Each token can be no larger than 256 bytes before analyzer truncation.
- Metadata fields on a document (everything outside FTS-enabled `string` fields) are limited to 40 KB per document in total. This metadata limit does not apply to `full_text_search` text fields.
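A sketch of an upsert body within those limits, assuming illustrative schema fields `review_text` (FTS-enabled string) and `review_embedding` (dense vector); the undeclared fields ride along as filterable metadata:

```python
# Hypothetical upsert payload (field names are examples; the vector is a placeholder).
upsert_request = {
    "namespace": "__default__",
    "documents": [
        {
            "_id": "rev-001",                                     # required unique ID
            "review_text": "A beautifully written debut novel.",  # FTS string field
            "review_embedding": [0.12, -0.03, 0.44],              # placeholder dense vector
            "category": "fiction",  # undeclared -> stored as filterable metadata
            "year": 2024,           # numbers are stored as floating point
        }
    ],
}
assert 1 <= len(upsert_request["documents"]) <= 1000
```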
Response fields:

- `upserted_count` (integer) - Number of documents accepted for upsert.
Schema validation
Each item in the `documents` array is validated against your index schema. If any item fails validation, the entire request fails and nothing is upserted.

| Scenario | Result |
|---|---|
| Field value doesn’t match declared type (for schema-declared fields) | Error — request fails |
| Document or request exceeds a size or count limit | Error — request fails |
| Field not in schema | Stored on the document and auto-indexed for filtering as metadata |
| Field name starts with `_` or `$` | Error — request fails |
| Schema field missing from item | OK — schema fields are optional unless stated otherwise |
| Document missing `_id` | Error — request fails |
Search documents (POST /namespaces/{namespace}/documents/search)
Rank documents by BM25 token matching (`text`), Lucene query syntax (`query_string`), dense vector similarity (`dense_vector`), or sparse vector similarity (`sparse_vector`). Optionally filter by field values before scoring.

Request fields:

- `namespace` (string, required) - Namespace name (use `"__default__"` if not using namespaces).
- `include_fields` (array, optional) - List of field names to return in results. Defaults to `[]` if omitted (or `null`); each match then returns only `_id` and `_score` with no stored fields. Use `["*"]` to return all stored fields (including fields not declared in the schema). User metadata fields named `score` are returned alongside the system-owned `_score` match score.
- `score_by` (array, required) - Array of scoring methods. A single search request ranks by one scoring type. Multi-field BM25 is supported: pass several `text` clauses (one per field) or use a single `query_string` clause whose query targets multiple fields, and every contributing field weighs equally; there is no per-clause weight parameter. To combine BM25 ranking with `dense_vector` or `sparse_vector` ranking, restrict the dense (or sparse) search with a text-match filter on the lexical field (`$match_phrase`, `$match_all`, `$match_any`) or run separate searches and merge results client-side. Each item must be one of:
  - `type: "text"` — BM25 token matching on a single text field. Multi-word queries use OR-style matching (case-insensitive). Phrase constraints are not supported here; use `query_string` with quoted terms for exact-phrase ranking.
    - `field` (string, required) — Name of a text-searchable field.
    - `query` (string, required) — One or more words to search for.
  - `type: "query_string"` — Lucene query syntax. Supports boolean operators, phrase prefix matching, boosting, and cross-field queries. Cross-field queries are expressed inside the query string itself (e.g. `title:(alpha) OR body:(beta)`).
    - `query` (string, required) — A Lucene query string (see query syntax reference). You can target a single field with `field:(clause)` or combine fields with boolean operators, e.g. `title:(alpha) OR body:(beta)`.
    - `fields` (array of strings, optional) — Restrict the query to one or more text-searchable fields. When omitted, the query runs against all text-searchable fields in the index. A bare string (`"fields": "body"`) is accepted as shorthand for a one-element array. The legacy singular spelling `"field"` is also accepted as an alias.
  - `type: "dense_vector"` — Dense vector similarity ranking. Requires a `dense_vector` field in the schema.
    - `field` (string, required) — Name of the dense-vector field to score against.
    - `values` (array of floats, required) — Query vector.
  - `type: "sparse_vector"` — Sparse vector similarity ranking. Requires a `sparse_vector` field in the schema.
    - `field` (string, required) — Name of the sparse-vector field to score against.
    - `sparse_values` (object, required) — `{ "indices": [...], "values": [...] }`.
- `top_k` (integer, required) - Number of results to return (1-10000).
- `filter` (object, optional) - Filter conditions applied before scoring. Filter on any metadata field on your documents (auto-indexed at upsert time) or use the text match operators (`$match_phrase`, `$match_all`, `$match_any`) on FTS-enabled `string` fields. Supports the filter operators below.
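Pulling those parameters together, a sketch of a BM25 search narrowed by metadata and text-match filters (field names are illustrative):

```python
# Hypothetical search payload: one ranking signal (BM25 over review_text),
# with filters applied before scoring.
search_request = {
    "top_k": 5,
    "score_by": [
        {"type": "text", "field": "review_text", "query": "monetary policy impact"}
    ],
    "filter": {
        "$and": [
            {"year": {"$gte": 2024}},                            # metadata comparison
            {"review_text": {"$match_all": "federal reserve"}},  # text-match filter
        ]
    },
    "include_fields": ["*"],  # return all stored fields
}
```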
| Limit | Value | Description |
|---|---|---|
| Max `score_by` clauses | 100 | Maximum number of clauses in the `score_by` array |
| Max total `score_by` payload | 100 KB | Maximum encoded size of all `score_by` clauses combined |
| Max per-clause query size | 10 KB | Maximum size of the query string in a single text or query_string clause |
Filter operators
Filters are applied before the search runs. The search only considers documents that match the filter.

| Operator | Example | Description |
|---|---|---|
| `$eq` | `{"category": {"$eq": "tech"}}` | Equals |
| `$ne` | `{"category": {"$ne": "tech"}}` | Not equals |
| `$gt` | `{"year": {"$gt": 2023}}` | Greater than |
| `$gte` | `{"year": {"$gte": 2023}}` | Greater than or equal |
| `$lt` | `{"year": {"$lt": 2025}}` | Less than |
| `$lte` | `{"year": {"$lte": 2025}}` | Less than or equal |
| `$in` | `{"category": {"$in": ["a", "b"]}}` | In list |
| `$nin` | `{"category": {"$nin": ["a", "b"]}}` | Not in list |
| `$exists` | `{"category": {"$exists": true}}` | Field has a value (`true`) or is absent (`false`) |
| `$match_phrase` | `{"body": {"$match_phrase": "machine learning"}}` | Exact phrase match (contiguous tokens) on a text-searchable field. Compose with any `score_by` type |
| `$match_all` | `{"body": {"$match_all": "machine learning"}}` | All tokens present, in any order, on a text-searchable field |
| `$match_any` | `{"body": {"$match_any": "AI robotics"}}` | At least one token present, on a text-searchable field |
| `$and` | `{"$and": [{"category": {"$eq": "tech"}}, {"year": {"$gte": 2024}}]}` | Logical AND of the listed clauses |
| `$or` | `{"$or": [{"category": {"$eq": "tech"}}, {"category": {"$eq": "ai"}}]}` | Logical OR of the listed clauses |
| `$not` | `{"$not": {"category": {"$eq": "archive"}}}` | Negation of the wrapped clause |
Multiple conditions at the top level of the `filter` object are combined with implicit AND semantics. Use `$and`, `$or`, and `$not` to build explicit compound conditions (they can nest).

The text match operators (`$match_phrase`, `$match_all`, `$match_any`) share a few rules:

- Where they apply. Fields declared with a `full_text_search` config object.
- Tokenization. They reuse the field's configured tokenizer and stemmer — a token that matches in BM25 scoring will match in a text match filter.
- Value limit. Each operator accepts at most 128 tokens in its value.
- Lucene-style operators. Phrase slop (`"phrase"~N`), term boosting (`^N`), and phrase prefix (`"phrase pre"*`) are not parsed — values are literal text and match semantics come from the operator name. To use those operators, score with `query_string`.
- Composition. They compose freely with metadata operators under `$and`, `$or`, and `$not` at any nesting level.
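For instance, a sketch of a compound filter that nests text-match and metadata operators (field names hypothetical):

```python
# Text-match operators composed with metadata operators at multiple nesting levels.
compound_filter = {
    "$and": [
        {"category": {"$eq": "tech"}},  # metadata equality
        {
            "$or": [                    # nested boolean group of text-match filters
                {"body": {"$match_phrase": "machine learning"}},
                {"body": {"$match_any": "AI robotics"}},
            ]
        },
        {"$not": {"category": {"$eq": "archive"}}},  # negation
    ]
}
```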
More examples
Token matching with filter: a `query_string` search can filter to documents whose `body` contains both "federal" and "reserve", then rank those candidates by BM25 score against "monetary policy impact".

Phrase filter with negation: a `$match_phrase` filter can be wrapped in `$not` to exclude documents containing an exact phrase.

Response fields:

- `matches` (array) - Ranked matches, most relevant first.
  - `_id` (string) - Document ID.
  - `_score` (float) - Relevance score (higher is better). The leading underscore prevents collision with user-defined metadata fields named `score`.
  - Plus any fields requested via `include_fields`.
- `namespace` (string) - Namespace searched.
- `usage` (object) - `read_units` consumed.
Fetch documents (POST /namespaces/{namespace}/documents/fetch)
Fetch retrieves documents by ID and does not accept a `filter` parameter. To retrieve documents matching a metadata expression, use POST /namespaces/{namespace}/documents/search with a filter instead.

Request fields:

- `ids` (array of strings, required, 1-1000 items) - Document IDs to fetch. Must contain at least one ID; an empty array returns a 400 error.
- `include_fields` (array of strings, optional) - Field names to include. If omitted, all fields are returned.
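A minimal fetch body, with illustrative IDs and field names:

```python
# Hypothetical fetch payload: 1-1000 IDs; an empty list would be a 400 error.
fetch_request = {
    "ids": ["rev-001", "rev-002"],
    "include_fields": ["review_text"],  # omit to return all fields
}
```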
Response fields:

- `documents` (object) - Map of document ID to the returned fields (including `_id`).
- `namespace` (string) - Namespace fetched from.
- `usage` (object) - `read_units` consumed.
Delete documents (POST /namespaces/{namespace}/documents/delete)
Provide either `ids` or `delete_all`. Delete does not accept a `filter` parameter — to delete documents matching a metadata expression, fetch their IDs via POST /namespaces/{namespace}/documents/search first, then pass them to delete.

Request fields:

- `ids` (array of strings, 1-1000 items) - Document IDs to delete.
- `delete_all` (boolean) - If `true`, delete all documents in the namespace.
Python SDK
Installation
Full-text search is available in the standard `pinecone` Python SDK under the `pc.preview.*` namespace, which gates the alpha API surface. Make sure you have a recent version of the SDK installed.
The SDK exposes `pc.preview.*` for control-plane operations and `pc.preview.index(...).documents.*` for data-plane document operations. The preview namespace makes the alpha status explicit and isolates FTS APIs from the GA `pc.indexes.*` and `pc.index(...)` namespaces used by the vector API.

Control plane
Instantiate the client
Create index (on-demand read capacity)
Create index (dedicated read capacity)
Describe index
List indexes
Check whether an index exists
Update index configuration
Use `configure` to update mutable settings on an existing index (for example, deletion protection or index tags). Schema changes are not supported in public preview.
Delete index
Data plane
Build a data plane client
Upsert documents
Search — token match (type: text)
Search — Lucene query string (type: query_string)
Search — dense vector ranking with phrase-match filter
Fetch documents
Delete documents
Delete accepts `ids` (or `delete_all`) — it does not accept a filter. To delete documents matching a metadata expression, search first to get IDs, then pass them to delete.

Tokens and analyzers
The word "token" appears in every scoring method, but it means different things in each. Knowing what counts as a token in your chosen method is essential to writing queries that match what you expect.

FTS tokens (type: "text", type: "query_string", and $match_* filters)
When you declare a field with full_text_search: { ... }, Pinecone runs the field’s text through an analyzer pipeline at index time and at query time. Both type: "text" and type: "query_string" use the same pipeline, and the text-match filter operators ($match_phrase, $match_all, $match_any) reuse it as well — so a token that scores in BM25 will match in a filter on the same field.
The pipeline (in order):

1. Split the text on whitespace and punctuation. Hyphenated words become multiple tokens (`state-of-the-art` → `state`, `of`, `the`, `art`).
2. Lowercase every token. Lowercasing is server-applied and cannot be overridden.
3. Stem each token to its root form, if `stemming` is enabled on the field. The stemmer is selected by the field's `language` setting (`models` → `model`, `running` → `run`).
4. Drop stop words (common words like `the`, `and`), if `stop_words: true` is set on the field. Not all languages have built-in stop word lists; see the Language table for details.
5. Cap each token at 40 characters. This cap is server-applied and cannot be overridden.
For example, with the `english` analyzer, `stemming: true`, and `stop_words: false`, the input `"State-of-the-Art Models"` becomes the tokens `state`, `of`, `the`, `art`, `model`. Those are the tokens BM25 scores against, and the tokens a `$match_phrase: "art models"` filter will look for.
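The pipeline above can be approximated in a few lines. This sketch is illustrative only: the real analyzer uses language-specific (Snowball-style) stemmers, while the suffix-stripping here only handles simple plurals:

```python
import re

STOP_WORDS = {"the", "and", "of", "a", "an"}  # tiny illustrative subset

def analyze(text, stemming=True, stop_words=False, max_len=40):
    """Rough approximation of the documented FTS analyzer pipeline."""
    # 1. Split on whitespace and punctuation — hyphenated words become multiple tokens.
    tokens = re.findall(r"[A-Za-z0-9]+", text)
    # 2. Lowercase every token (server-applied, not overridable).
    tokens = [t.lower() for t in tokens]
    # 3. Stem (here: naive plural stripping standing in for a real stemmer).
    if stemming:
        tokens = [t[:-1] if t.endswith("s") and len(t) > 3 else t for t in tokens]
    # 4. Optionally drop stop words.
    if stop_words:
        tokens = [t for t in tokens if t not in STOP_WORDS]
    # 5. Cap each token at 40 characters (server-applied).
    return [t[:max_len] for t in tokens]

print(analyze("State-of-the-Art Models"))  # ['state', 'of', 'the', 'art', 'model']
```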
Dense-vector tokens (type: "dense_vector")
Dense embedding models have their own internal tokenizer — usually a subword scheme like BPE, WordPiece, or SentencePiece — that breaks text into pieces the model was trained on. Those tokens are private to the model. You never query them directly: a dense search compares the full embedding of a query against the full embedding of a document. The same string can therefore behave very differently in type: "text" (which sees the FTS analyzer tokens above) and type: "dense_vector" (which sees a single high-dimensional vector). The $match_* filter operators do not apply to dense-vector fields.
Sparse-vector tokens (type: "sparse_vector")
Sparse encoders also tokenize internally, and the tokenization depends on the encoder. Pinecone’s hosted pinecone-sparse-english-v0 produces learned per-token weights and expands to related terms that don’t appear in the source text. Encoder tokens are not interchangeable with FTS analyzer tokens, and $match_* filters do not apply to sparse-vector fields.
Practical implication
If your application stores the same source text in an FTS-enabled `string` field and also encodes it into a `dense_vector` or `sparse_vector` field, the three representations are tokenized independently: the FTS analyzer for the string field, and each model's internal tokenizer for the vector fields. Identical query strings will therefore retrieve different documents under different `score_by` types, and `$match_*` filters can only narrow on the FTS-analyzer tokens of FTS-enabled string fields.
Query syntax reference
Full-text search supports two text-based query types with different capabilities:

| Feature | type: "text" | type: "query_string" |
|---|---|---|
| Purpose | Simple token search on one field | Lucene query syntax |
| Field parameter | Required (exactly one field) | Optional (restricts to listed text-searchable fields) |
| Multi-word behavior | Token match, OR across terms (BM25) | OR by default; use AND, quotes, etc. for other logic |
| Boolean operators | Not supported (treated as words) | AND, OR, NOT, +, - |
| Phrase prefix | Not supported | "phrase pre"* (last term as prefix) |
| Phrase matching | Not supported in score_by (use query_string or $match_phrase filter) | Wrap in quotes: "exact phrase" |
| Phrase slop | Not supported | "phrase"~N |
| Boosting | Not supported | term^N |
| Regex | Not supported | field:/pattern.*/ |
| Stemming | Supported (when enabled) | Supported (when enabled) |
| Case sensitivity | Case-insensitive | Case-insensitive |
Token matching (type: "text")
With type: "text", the query string is run through the field’s analyzer pipeline (see Tokens and analyzers) and each resulting term contributes to the BM25 score. Multiple terms use OR semantics: documents can match if they contain any of the terms; documents that match more terms or stronger term statistics typically rank higher. Matching is case-insensitive. Exact phrase constraints (adjacent words in order) belong in type: "query_string" using quotes, or in a $match_phrase filter.
| Query | Matches | Does not match |
|---|---|---|
| `machine learning` | "Machine learning is great" (has "machine") | "Vector databases only" (neither term) |
| `machine learning` | "We use learning and machine" (both terms present, any order) | "Vector databases only" (neither term) |
| `machine` | "Machine learning is great" | "Vector databases only" (no "machine") |
- Single term (`machine`): Matches documents containing that term. Case-insensitive.
- Multiple terms (`machine learning`): Each term is searched independently with OR-style matching and combined BM25 scoring — not as a single adjacent phrase.
- No operator support: Characters like `AND`, `OR`, `NOT`, `*`, `~`, `^`, `+`, `-`, and quotes are treated as literal text.
Lucene query syntax (type: "query_string")
With type: "query_string", you write Lucene query syntax, with operator support. Field names are embedded in the query itself (e.g., content:(term)) and can combine multiple fields with boolean operators.
| Operator | Syntax | Example | Description |
|---|---|---|---|
| Term | field:(word) | body:(computers) | Match documents containing term |
| Multiple terms | field:(a b) | body:(machine learning) | OR by default — matches either term |
| Phrase | field:("words") | body:("machine learning") | Exact phrase match (adjacent, in order) |
| AND | AND | body:(a AND b) | Both terms required |
| OR | OR | body:(a OR b) | Either term matches (same as default) |
| NOT | NOT | body:(a NOT b) | Exclude second term |
| Required | +term | body:(+database search) | Term must be present |
| Excluded | -term | body:(database -deprecated) | Term must not be present |
| Grouping | (expr) | body:((a OR b) AND c) | Control precedence |
| Phrase slop | "phrase"~N | body:("fast search"~2) | Allow up to N words between phrase terms |
| Boost | term^N | body:(machine^3 learning) | Multiply term’s relevance score by N |
| Phrase prefix | "phrase pre"* | body:("james w"*) | Last term in phrase matched as prefix |
| Regex | field:/pattern.*/ | body:/comput.*/ | Match documents by regular expression on a field |
| Cross-field | fieldA:(…) OR fieldB:(…) | title:(quantum) OR body:(machine) | Combine clauses across text-searchable fields |
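Combining several of the operators from the table above, a sketch of a `query_string` clause (field names illustrative):

```python
# Hypothetical score_by array: cross-field boolean query with a quoted phrase
# and an excluded term, all inside the query string itself.
score_by = [
    {
        "type": "query_string",
        "query": 'title:(quantum) OR body:("machine learning" AND -deprecated)',
    }
]
```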
Terms and default OR behavior
Phrases
For token matching instead of exact-phrase ranking, use `type: "text"` with `query: "machine learning"`, which uses token OR matching on the field. For phrase matching as a filter (e.g., composed with dense-vector ranking), use `{"body": {"$match_phrase": "machine learning"}}` in the filter block.

Phrase terms are matched against the field's analyzed tokens. If stemming is enabled on the field, the phrase terms stem too — e.g., `"running fast"` matches `running fast` and `runs fast`.
Boolean operators (AND, OR, NOT)
Use `AND`, `OR`, and `NOT` for explicit boolean logic.
Required and excluded terms (+, -)
Use `+` to require a term and `-` to exclude a term.
Phrase proximity (slop)
Term boosting
Multiply a term's relevance score with `term^N`.
Phrase prefix
Append `*` to a quoted phrase to treat the last term as a prefix. The phrase must contain at least two terms. `"new yor"*` can match `new york`, but `"new yo"*` might not if `york` is not among the first 50 expanded terms for `yo`.
Regex
A query like `body:/comput.*/` matches documents whose `body` field contains a token matching the regex `comput.*` (e.g., "computer", "computing", "computation"). Regex patterns are matched against individual analyzed tokens, not the raw field text. Regex is only available with `type: "query_string"`; it is not supported with `type: "text"`.
Cross-field queries
`query_string` can target multiple fields in the same expression. Omit the `fields` array in `score_by` to run against all text-searchable fields, or list specific fields to restrict the scope. For example, `title:(quantum) OR body:(machine learning)` matches documents whose `title` contains "quantum", documents whose `body` contains "machine" or "learning", or both — with BM25 scoring combining across fields.

Stemming
Stemming reduces words to their root form so that morphological variants match each other. For example, with stemming enabled, a query for "run" also matches documents containing "running" or "runs". Stemming is opt-in and disabled by default. To enable it, set `stemming: true` on a text-searchable field when creating the index. The stemming algorithm is determined by the field's `language` setting.
Example: enabling stemming with French
Once stemming is enabled, stemmed matching applies to both `type: "text"` and `type: "query_string"` queries on the field.
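A hypothetical schema fragment enabling French stemming on one text-searchable field. The key names follow this page's prose (`type: "string"` with a `full_text_search` config object carrying `stemming` and `language`); the exact wire format is an assumption, not the authoritative API shape:

```python
# Illustrative create-index schema fragment: one FTS-enabled string field
# analyzed with French tokenization and stemming rules.
schema = {
    "review_text": {
        "type": "string",
        "full_text_search": {
            "stemming": True,
            "language": "fr",  # short code; the full name "french" also works
        },
    }
}
```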
Language
The `language` parameter controls tokenization and stemming behavior for a text-searchable field. It determines how text is analyzed during indexing and search: how words are split into tokens and, when stemming is enabled, which language-specific rules are used to reduce words to their root forms.
The default language is `"en"` (English). You can specify a language using either its short code or full name (e.g., `"fr"` or `"french"`).
Supported languages:
| Code | Full name | Stop words |
|---|---|---|
| ar | arabic | No |
| da | danish | Yes |
| de | german | Yes |
| el | greek | No |
| en | english | Yes |
| es | spanish | Yes |
| fi | finnish | Yes |
| fr | french | Yes |
| hu | hungarian | Yes |
| it | italian | Yes |
| nl | dutch | Yes |
| no | norwegian | Yes |
| pt | portuguese | Yes |
| ro | romanian | No |
| ru | russian | Yes |
| sv | swedish | Yes |
| ta | tamil | No |
| tr | turkish | No |
Troubleshooting
Document not appearing in search results
- Check indexing latency: new documents may take up to 1 minute to become searchable; schemas with multiple indexed fields may take slightly longer.
- Verify the upsert response shows the expected `upserted_count`.
- Confirm you're searching the same namespace where you upserted.
- With `type: "text"`, multi-word queries use token OR matching; documents need not contain the full phrase. Try a single-term query first to confirm the document is searchable.
- If using filters, ensure the document's field values match your filter conditions. Metadata fields are auto-indexed at upsert time, so any field present on a document can be filtered on; filtering on a field that no document contains returns no results.
Unexpected search results
- `type: "text"` uses OR across terms. `machine learning` matches documents that contain "machine", "learning", or both (BM25 ranking). For an exact phrase, use `type: "query_string"` with `body:("machine learning")` or a `$match_phrase` filter.
- `type: "query_string"` defaults to OR for unquoted terms. `body:(machine learning)` matches documents containing either term. Use `AND` or `+` for required terms.
- Operators like `AND`, `OR`, `NOT`, `*`, `~`, and `^` only work with `type: "query_string"`. With `type: "text"`, they are treated as literal words.
Query syntax errors
Query syntax errors occur only with `type: "query_string"`. With `type: "text"`, any input is valid as a literal string to be tokenized.

- Unmatched quotes (`"machine learning`): Close all quotes.
- Empty query: Provide at least one search term.
- Invalid boolean syntax (`AND machine`): Operators need terms on both sides.
- Unbalanced parentheses: Match all opening and closing parentheses.
- Unknown field name: Field names in the query must match text-searchable fields in the schema.
API errors
- `401 Unauthorized`: Check the `Api-Key` header.
- `400 Bad Request`: Check JSON syntax and required fields. Examples: a `fields` array with more than one element for `text`/`dense_vector`/`sparse_vector`; a missing mutually-exclusive field for Fetch/Delete.
- `404 Not Found`: Verify the index name and host URL.
- Missing API version: Add `X-Pinecone-Api-Version: 2026-01.alpha`.
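A minimal header set covering the auth and versioning errors above. The API key value is a placeholder; the header names and the required API version come from this page:

```python
# Headers every request needs during the public preview.
headers = {
    "Api-Key": "YOUR_API_KEY",                    # placeholder: your project key
    "Content-Type": "application/json",
    "X-Pinecone-Api-Version": "2026-01.alpha",    # required on all requests
}
```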
Upsert errors
- Type mismatch: Ensure values match declared schema types.
- Invalid `_id`: Every document must have a non-empty `_id` string.
- Reserved names: Field names cannot start with `_` (reserved for system-managed fields like `_id` and `_score`) or `$` (reserved for filter operators), and must be at most 64 bytes.
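A small client-side validator mirroring the upsert rules above. This is an illustrative pre-flight check, not part of the SDK:

```python
def validate_document(doc: dict) -> list:
    """Return a list of problems that would make an upsert fail."""
    problems = []
    _id = doc.get("_id")
    if not isinstance(_id, str) or not _id:
        problems.append("every document needs a non-empty _id string")
    for name in doc:
        # _id itself is the one allowed underscore-prefixed field.
        if name != "_id" and (name.startswith("_") or name.startswith("$")):
            problems.append(f"field name {name!r} uses a reserved prefix")
        if len(name.encode("utf-8")) > 64:
            problems.append(f"field name {name!r} exceeds 64 bytes")
    return problems
```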
Slow search performance
- Reduce query complexity: Boolean operators and large phrase slop are more expensive than simple term queries.
- Simplify filters: Filters are applied before scoring, so broad filters increase the search space.
- For latency-sensitive workloads, use `read_capacity.mode: "Dedicated"` to get predictable latency.
Common request-shape pitfalls
- Sparse-vector `score_by` clauses use `sparse_values`, not `values`. The `values` key is for `dense_vector`. A sparse clause needs the full object: `"sparse_values": { "indices": [...], "values": [...] }`.
- Every `score_by` clause must include `type`. It is the discriminator that selects the scoring method (`text`, `query_string`, `dense_vector`, `sparse_vector`). Omitting it returns a 400.
- Every document must have a non-empty `_id` string. There is no default; the upsert request fails if any document in the batch is missing `_id` or has an empty value.
- Wait for `status.ready: true` before searching. A newly created index can briefly return empty results. For `Dedicated` read capacity, also wait for `read_capacity.status.state: "Ready"`.
- The match-score response field is `_score`, not `score`. A user metadata field named `score` is allowed and is returned alongside the system-owned `_score`.
- Namespace is part of the URL path. Use `__default__` (the literal string) if you don't need partitioning. An empty path segment is rejected.
- `dense_vector` queries use `values`, not `query`. Only `text` and `query_string` clauses use `query` (a string). `dense_vector` and `sparse_vector` use `values` (a float array) and `sparse_values` (an `{indices, values}` object) respectively.
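Putting the shape rules together, here is one correctly formed clause per scoring type. Field names and vector values are illustrative:

```python
# One well-formed score_by clause per scoring type. Every clause carries
# `type`; text/query_string use `query`, dense uses `values`, sparse uses
# the full `sparse_values` object.
clauses = [
    {"type": "text", "fields": ["review_text"], "query": "quiet keyboard"},
    {"type": "query_string", "query": 'review_text:("quiet keyboard")'},
    {"type": "dense_vector", "fields": ["review_embedding"],
     "values": [0.1, 0.2, 0.3]},
    {"type": "sparse_vector", "fields": ["review_sparse"],
     "sparse_values": {"indices": [10, 42], "values": [0.5, 1.2]}},
]
```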
Public preview
Full-text search is in public preview under API version `2026-01.alpha`. The feature is ready for production evaluation; APIs may continue to evolve before general availability.
Requirements & limitations
- All requests require `X-Pinecone-Api-Version: 2026-01.alpha`.
- The REST API, Python SDK (`pinecone`, `pc.preview.*` namespace for FTS control plane), and Pinecone console are the supported entry points for public preview.
- Endpoint compatibility: indexes with document schemas use the `/namespaces/{namespace}/documents/*` endpoints; dense, sparse, and integrated-inference indexes continue to use `/vectors/*` (and `/records/*` for integrated inference). The two endpoint families are index-type-specific and don't cross over.
- Supported deployment modes: managed (serverless) with `read_capacity.mode` of `OnDemand` or `Dedicated`. Changing an index from dedicated read capacity back to on-demand is not supported; to move back, create a new on-demand index and reingest your data.
- Schemas declare ranking fields only: text fields (`string` with `full_text_search`), `dense_vector`, and `sparse_vector`. Text-only, text + dense vector, and combined dense + sparse + text schemas are all supported in a single index. Metadata-only field declarations (`string` without `full_text_search`, `string_list`, `float`, `boolean`) are rejected at index creation; metadata is auto-indexed at upsert time.
- Schema and document limits: a schema can contain up to 100 `full_text_search` string fields; each `full_text_search` string field can be up to 100 KB and 10,000 tokens; tokens can be up to 256 bytes before analyzer truncation; each document can be up to 2 MB; each upsert request can contain up to 1000 documents and 2 MB.
- Metadata size: metadata fields on a document (everything outside FTS-enabled `string` fields) are limited to 40 KB per document in total. This limit does not apply to `full_text_search` text fields.
- Vector-field cardinality: a schema can declare up to 100 `string` fields with `full_text_search` enabled, but at most one `dense_vector` field and at most one `sparse_vector` field per index.
- Field-name policy: schema and metadata field names must not start with `_` (reserved for system-managed fields like `_id` and `_score`) or `$` (reserved for filter operators), and are limited to 64 bytes.
- The match-score response field is `_score` (renamed from `score` so that user metadata named `score` can coexist with the system-owned match score in the flat response payload).
- A single search request ranks by one scoring type. Multi-field BM25 is supported: pass multiple `text` clauses (one per field) or a single `query_string` clause that targets several fields; every contributing field weighs equally in `2026-01.alpha`, and there is no per-clause weight parameter. To combine BM25 ranking with `dense_vector` or `sparse_vector` ranking, restrict the dense (or sparse) search with a text-match filter (`$match_phrase`, `$match_all`, `$match_any`) on the lexical field, or run separate searches and merge the results client-side.
- Newly upserted documents are indexed asynchronously and may not be searchable immediately.
- No partial / per-field updates: `POST /namespaces/{namespace}/documents/upsert` always replaces the entire document for a given `_id`. There is no `PATCH` endpoint and no field-level merge in `2026-01.alpha`. To update a single field, fetch the document by ID (`POST /namespaces/{namespace}/documents/fetch`), modify the field client-side, and upsert the full document back under the same `_id`. Field-level merge is on the roadmap for a post-public-preview release.
- Schemas are fixed at index creation. Adding, removing, or retyping fields after creation is not yet supported. Existing pre-public-preview indexes cannot be backfilled with a schema; to use FTS, dense + FTS, or any document API query in `2026-01.alpha`, create a new index with the desired schema and reindex documents.
- Metadata is auto-indexed: any field on an upserted document that is not declared in the schema is automatically indexed for filtering. Track metadata field names and types in your application, since Pinecone infers the type from the values you upsert.
- Bulk import (S3 import job) is not yet supported for indexes with document schemas; load documents through `POST /namespaces/{namespace}/documents/upsert`.
- Maximum results per query: `top_k` is capped at 10,000. Full-text search is optimized for ranked retrieval; for aggregation- or count-style queries (e.g., "how many documents contain term X"), faceting is on the roadmap for a future release.
- Indexes cannot be created in CMEK-enabled projects.
- Backup and restore are not yet supported.
- `describe_index_stats` and namespace management endpoints (`POST /namespaces`, `GET /namespaces`, `GET /namespaces/{namespace}`, `DELETE /namespaces/{namespace}`) are not yet supported on indexes with document schemas. Namespaces on these indexes are still auto-created on first upsert.
- Fuzzy matching is not yet supported.
- Single-term prefix wildcards (`auto*`) are not supported; use phrase prefix (`"word auto"*`) instead.
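Since there are no partial updates, the fetch-modify-upsert pattern described in the limitations above can be sketched as follows. The endpoint paths come from this page; the host, the fetch request key (`ids`), and the upsert request key (`documents`) are assumptions for illustration:

```python
# Sketch of the read-modify-write update pattern (no PATCH in 2026-01.alpha).
host = "https://my-index-abc123.svc.example.pinecone.io"  # placeholder host
ns = "__default__"

fetch_url = f"{host}/namespaces/{ns}/documents/fetch"
upsert_url = f"{host}/namespaces/{ns}/documents/upsert"

# 1) Fetch the current document by ID (request/response shapes are assumed).
fetch_body = {"ids": ["doc-1"]}

# 2) Modify one field client-side on the fetched document. The dict below
#    stands in for the fetch response.
fetched = {"_id": "doc-1", "review_text": "solid build", "stars": 4}
updated = {**fetched, "stars": 5}

# 3) Upsert the FULL document back under the same _id; the server replaces
#    the whole document, so every field must be present.
upsert_body = {"documents": [updated]}
```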