type: "text") on a single field, and Lucene query syntax (type: "query_string") for boolean operators, exact phrases, phrase prefix matching, boosting, and more.
For type: "text", multiple terms use OR-style matching with BM25 scoring.
Results are ranked by relevance using the BM25 algorithm.
Schema definition
Full-text search indexes require an explicit schema. The schema tells Pinecone which fields contain text you want to search, that is, fields with full_text_searchable: true. During early access, only these full-text searchable fields may be declared at index creation.
content field (a full create-index request also includes read_capacity and other top-level fields):
- string with full_text_searchable: true - Text data matched by keyword queries. This is the only type of field allowed in the schema during early access.
category, year) in your documents at upsert time; they are stored and filterable, but they are not part of the index schema. For example, you can search the content field for “machine learning” and filter on category = "technology" using fields you sent in the document.
API
Full-text search uses API version 2026-01.alpha. All requests require the header X-Pinecone-API-Version: 2026-01.alpha.
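As a minimal sketch of the header requirement, the helper below builds the headers sent with each request. The function name build_headers is hypothetical; the Api-Key header name is taken from the troubleshooting notes on this page, and the Content-Type header is an assumption for JSON request bodies.

```python
API_VERSION = "2026-01.alpha"


def build_headers(api_key: str) -> dict:
    """Return the headers to send with every full-text search request."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Api-Key": api_key,
        # Required on all requests for this early access API version.
        "X-Pinecone-API-Version": API_VERSION,
        # Assumption: request bodies are JSON.
        "Content-Type": "application/json",
    }
```

Keeping the version string in one constant makes it easier to update when the API version changes.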
Control plane operations
Control plane operations are used to manage indexes and their configuration.
Create index (POST /indexes)
b1 shard and replica.
- name (string, required) - Unique index name (lowercase alphanumeric and hyphens)
- deployment (object, required) - Deployment configuration
  - deployment_type (string, required) - Must be "managed" for serverless
  - cloud (string, required) - Cloud provider: "aws" or "gcp"
  - region (string, required) - Region code (e.g., "us-east-1")
- schema (object, required) - Schema definition
  - fields (object, required) - Map of field names to field definitions. During early access, the schema may only contain full-text searchable fields:
    - Full-text searchable field - Enables full-text search
      - type: "string" (required)
      - full_text_searchable: true (required)
      - language (string, optional) - Language for tokenization and, when stemming is enabled, for stemming (default: "en"). Accepts short codes or full names (e.g., "fr" or "french"). See Language for the full list of supported languages.
      - stemming (boolean, optional) - Whether to enable language-based stemming (default: false). See Stemming.
    - Full-text searchable fields support an optional description for documenting what the field contains. This is especially useful for agentic workflows where an LLM inspects the schema to understand how to query the index.
- read_capacity (object, required) - Read capacity configuration
  - mode (string, required) - Must be "Dedicated" for full-text search indexes
  - dedicated (object, required) - Dedicated read node configuration
    - node_type (string, required) - Node type (e.g., "b1")
    - scaling (string, required) - Scaling mode: "Manual"
    - manual (object, required) - Manual scaling configuration
      - shards (integer, required) - Number of shards (minimum 1)
      - replicas (integer, required) - Number of replicas (minimum 1)
- deletion_protection (string, optional) - "enabled" or "disabled" (default: "disabled")
- tags (object, optional) - Key-value tags for the index
- During early access, the schema may only contain fields with full_text_searchable: true (full-text searchable fields).
- Field names must be unique within the schema
- Field names must contain only alphanumeric characters and underscores
- The schema must contain at least one field
- Only one field can have full_text_searchable: true (multiple text fields not yet supported)
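The parameters and schema rules above can be sketched as a request body builder. This is an illustration under stated assumptions: the function name build_create_index_body is hypothetical, the nesting of manual under dedicated is inferred from the parameter descriptions, and the checks are client-side conveniences, not the server's validation.

```python
import re

INDEX_NAME_RE = re.compile(r"^[a-z0-9-]+$")     # lowercase alphanumeric and hyphens
FIELD_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")  # alphanumeric and underscores


def build_create_index_body(name: str, text_field: str,
                            cloud: str = "aws", region: str = "us-east-1",
                            language: str = "en", stemming: bool = False) -> dict:
    """Build a POST /indexes body with a single full-text searchable field."""
    if not INDEX_NAME_RE.match(name):
        raise ValueError("index name must be lowercase alphanumeric and hyphens")
    if not FIELD_NAME_RE.match(text_field):
        raise ValueError("field names must be alphanumeric and underscores")
    return {
        "name": name,
        "deployment": {"deployment_type": "managed", "cloud": cloud, "region": region},
        "schema": {
            "fields": {
                # Early access: exactly one full-text searchable field.
                text_field: {
                    "type": "string",
                    "full_text_searchable": True,
                    "language": language,
                    "stemming": stemming,
                }
            }
        },
        "read_capacity": {
            "mode": "Dedicated",
            # Assumption: manual scaling settings nest under "dedicated".
            "dedicated": {
                "node_type": "b1",
                "scaling": "Manual",
                "manual": {"shards": 1, "replicas": 1},
            },
        },
        "deletion_protection": "disabled",
    }
```

For example, build_create_index_body("articles", "content") produces a body that can be serialized with json.dumps and sent with the required version header.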
- id (string) - Unique index ID
- name (string) - Index name
- host (string) - Index host URL for data plane operations
- status (object) - Index status
  - ready (boolean) - Whether the index is ready for operations
  - state (string) - Current state: "Initializing", "Ready", etc.
- deployment (object) - Deployment configuration
  - deployment_type (string) - Deployment type (e.g., "managed")
  - cloud (string) - Cloud provider
  - region (string) - Region code
  - environment (string) - Environment identifier assigned by the system
- schema (object) - Schema definition
  - version (string) - Schema version (e.g., "v1")
  - fields (object) - Field definitions with server-applied defaults. Full-text searchable fields include additional properties: language, stemming (defaults to false; set to true at creation to enable), lowercase, max_term_len. All fields include description (null if not set).
- read_capacity (object) - Read capacity configuration
  - mode (string) - Read capacity mode (e.g., "Dedicated")
  - dedicated (object) - Dedicated read node configuration
    - node_type (string) - Node type (e.g., "b1")
    - scaling (string) - Scaling mode (e.g., "Manual")
    - manual (object) - Manual scaling configuration
      - shards (integer) - Number of shards
      - replicas (integer) - Number of replicas
  - status (object) - Current status of read capacity provisioning
    - state (string) - Provisioning state (e.g., "Migrating", "Ready")
    - current_shards (integer or null) - Current number of shards
    - current_replicas (integer or null) - Current number of replicas
- tags (object or null) - Key-value tags, or null if none set
- deletion_protection (string) - Deletion protection status
status.ready: true before performing data plane operations.
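One way to wait for status.ready is a small polling loop. The sketch below is hypothetical: wait_until_ready takes any callable that returns the parsed describe-index response, so the HTTP call itself is left to the caller, and the timeout and polling interval are arbitrary defaults.

```python
import time
from typing import Callable


def wait_until_ready(describe: Callable[[], dict],
                     timeout_s: float = 300.0,
                     poll_s: float = 5.0,
                     sleep: Callable[[float], None] = time.sleep) -> dict:
    """Poll describe() until the response reports status.ready == True."""
    deadline = time.monotonic() + timeout_s
    while True:
        description = describe()
        if description.get("status", {}).get("ready") is True:
            return description
        if time.monotonic() >= deadline:
            raise TimeoutError("index did not become ready before the timeout")
        sleep(poll_s)
```

Passing sleep as a parameter keeps the loop easy to exercise in tests without real delays.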
List indexes (GET /indexes)
Describe index (GET /indexes/{index_name})
- index_name (string, required) - Name of the index
Update index (PATCH /indexes/{index_name})
deletion_protection can be updated.
- index_name (string, required) - Name of the index
- deletion_protection (string, optional) - "enabled" or "disabled"
Delete index (DELETE /indexes/{index_name})
deletion_protection is enabled, you must first disable it using the update endpoint.
- index_name (string, required) - Name of the index
Data plane operations
"__default__" if you don’t need partitioning.
Upsert documents (POST /namespaces/{namespace}/documents/upsert)
_id exists, it is completely replaced. Documents become searchable within approximately one minute.
- namespace (string, required) - Namespace name (use "__default__" if not using namespaces)
- documents (array, required) - Array of documents to upsert. Each document is an object with:
  - _id (string, required) - Unique document ID. If a document with this _id already exists, it is replaced entirely. If multiple documents in the same batch share an _id, only the last one is stored.
  - Fields matching your schema (the full_text_searchable field must be present)

- upserted_count (integer) - Number of documents upserted
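The batch rules above (a required non-empty _id, a required text field, and last-one-wins for repeated _id values) can be checked client-side before sending. The helper below is a sketch, not the server's validation logic; prepare_upsert_body is a hypothetical name, and the body shape assumes documents is the only top-level key.

```python
def prepare_upsert_body(documents: list, text_field: str) -> dict:
    """Validate documents and collapse repeated _id values (last one wins)."""
    by_id = {}
    for doc in documents:
        doc_id = doc.get("_id")
        if not isinstance(doc_id, str) or not doc_id:
            raise ValueError("every document needs a non-empty string _id")
        if not isinstance(doc.get(text_field), str):
            raise ValueError(f"document {doc_id!r} is missing text field {text_field!r}")
        # Mirror the documented behavior: a later document with the same
        # _id replaces the earlier one entirely.
        by_id[doc_id] = doc
    return {"documents": list(by_id.values())}
```

Deduplicating locally makes the expected upserted_count easier to reason about, since only one document per _id is stored.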
Schema validation
Each item in the documents array is validated against your index schema. If any item fails validation, the entire request fails and nothing is upserted.
| Scenario | Result |
|---|---|
| Field value doesn’t match declared type | Error - request fails |
| Field not in schema | Stored and filterable, but not added to the schema |
| Schema field missing from item | OK - fields are optional |
| Text-searchable field is missing | Error - request fails |
| Text contains Unicode or special characters | OK - fully supported |
- "Document with id 'doc-1': boolean field 'in_stock' must be a boolean"
- "Each document must have at least one indexable field"
Search documents (POST /namespaces/{namespace}/documents/search)
type: "text") or Lucene query syntax (type: "query_string"). Optionally filter by field values before scoring. Results are ranked by BM25 relevance score.
- namespace (string, required) - Namespace name (use "__default__" if not using namespaces)
- include_fields (array, required) - List of field names to return in results
- score_by (array, required) - Array of scoring methods. Each item must be one of:
  - type: "text" - Simple token-based matching on a specific field. The query is split into terms; documents are scored with BM25 using OR semantics across terms (similar to Elasticsearch/OpenSearch match). Phrase constraints are not supported; use type: "query_string" with quoted terms for exact-phrase ranking. Case-insensitive.
    - field (string, required) - Name of the searchable field.
    - query (string, required) - One or more words to search for. Multiple words are separate terms, not a single exact phrase. Query syntax operators (AND, OR, NOT, etc.) are not supported; they are treated as literal words.
  - type: "query_string" - Lucene query syntax. Supports boolean operators, phrase prefix matching, boosting, and more. Do not specify field; instead, target fields within the query itself.
    - query (string, required) - A Lucene query string in the form <field_name>:(<query clause>) (see query syntax reference).
- top_k (integer, required) - Number of results to return (1-10000)
- filter (object, optional) - Filter conditions on filterable fields (applied before search)
type: "text" and type: "query_string":
- Use type: "text" when you want straightforward keyword search on one field without Lucene syntax: just the field name and a plain string. Multi-word queries match documents containing any of the terms (BM25 ranks by relevance).
- Use type: "query_string" when you need an exact phrase in the ranking ("like this"), boolean operators, phrase prefix matching, boosting, slop, or field-specific clauses. See the query syntax reference for the full list of operators.
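To make the difference between the two scoring types concrete, the sketch below builds a search request body for each. The function names are hypothetical, and the body layout is inferred from the parameter list above (score_by as an array of objects, with field present only for type: "text").

```python
from typing import Optional


def _search_body(score_by: dict, top_k: int, include_fields: list,
                 filter_: Optional[dict]) -> dict:
    if not 1 <= top_k <= 10000:
        raise ValueError("top_k must be between 1 and 10000")
    body = {"score_by": [score_by], "top_k": top_k, "include_fields": include_fields}
    if filter_ is not None:
        body["filter"] = filter_  # applied before scoring
    return body


def text_search_body(field: str, query: str, top_k: int, include_fields: list,
                     filter_: Optional[dict] = None) -> dict:
    """Plain keyword search on one field; operators are treated as literal words."""
    return _search_body({"type": "text", "field": field, "query": query},
                        top_k, include_fields, filter_)


def query_string_search_body(query: str, top_k: int, include_fields: list,
                             filter_: Optional[dict] = None) -> dict:
    """Lucene-syntax search; the field name is part of the query itself."""
    return _search_body({"type": "query_string", "query": query},
                        top_k, include_fields, filter_)
```

For instance, text_search_body("content", "machine learning", 10, ["content"]) ranks documents containing either term, while query_string_search_body('content:("machine learning")', 10, ["content"]) asks for the exact phrase.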
| Operator | Example | Description |
|---|---|---|
| $eq | {"category": {"$eq": "tech"}} | Equals |
| $ne | {"category": {"$ne": "tech"}} | Not equals |
| $gt | {"year": {"$gt": 2023}} | Greater than |
| $gte | {"year": {"$gte": 2023}} | Greater than or equal |
| $lt | {"year": {"$lt": 2025}} | Less than |
| $lte | {"year": {"$lte": 2025}} | Less than or equal |
| $in | {"category": {"$in": ["a", "b"]}} | In list |
| $nin | {"category": {"$nin": ["a", "b"]}} | Not in list |
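The operator semantics in the table can be illustrated with a small local evaluator. This is only a sketch for reasoning about filters, not how the service evaluates them. Two behaviors are assumptions: conditions on several fields are combined with AND, and a document missing the filtered field never matches.

```python
import operator

_OPERATORS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches_filter(document: dict, filter_: dict) -> bool:
    """Return True if the document satisfies every condition in filter_."""
    for field, condition in filter_.items():
        if field not in document:
            return False  # assumption: missing fields never match
        for op, operand in condition.items():
            if not _OPERATORS[op](document[field], operand):
                return False
    return True
```

Under these assumptions, a document with category "technology" and year 2024 matches {"category": {"$eq": "technology"}, "year": {"$gte": 2024}}.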
content scores for the terms “machine” and “learning” (OR-style token match with BM25), filtered to technology articles from 2024 onward. Documents that contain both terms typically rank higher than those with only one.
Boolean query with query_string:
content field contains both “machine” and “learning” (in any order, not necessarily adjacent).
Boosting with query_string:
^, AND, and OR only work with type: "query_string".
OR query (default behavior) with query_string:
type: "query_string", multiple terms default to OR: this matches documents containing “quick”, “brown”, or “fox” (or any combination). Documents matching more terms rank higher.
Example response
Status: 200 OK
- matches (array) - Array of matching documents
  - id (string) - Document ID
  - score (float) - BM25 relevance score (higher is better)
  - {field} (any) - Requested fields from the document
- namespace (string) - Namespace searched
- usage (object) - Usage information
  - read_units (integer) - Read units consumed
Python SDK
Installation
To use full-text search with Pinecone’s Python SDK, you need to install the oakcylinder package, an early access version of the SDK that is in active development.
Control plane
Instantiate the client
Create index
Describe index
Delete index
Data plane
Build a data plane client
Upsert documents
Search documents: token match (type: text)
Search documents: Lucene query string (type: query_string)
Query syntax reference
Full-text search supports two query types with different capabilities:
| Feature | type: "text" | type: "query_string" |
|---|---|---|
| Purpose | Simple token search on one field | Lucene query syntax |
| field parameter | Required | Not allowed (field names go in query) |
| Multi-word behavior | Token match, OR across terms (BM25) | OR by default; use AND, quotes, etc. for other logic |
| Boolean operators | Not supported (treated as words) | AND, OR, NOT, +, - |
| Phrase prefix | Not supported | "phrase pre"* (last term as prefix) |
| Phrase matching | Not supported in score_by (use query_string) | Wrap in quotes: "exact phrase" |
| Phrase slop | Not supported | "phrase"~N |
| Boosting | Not supported | term^N |
| Stemming | Supported (when enabled) | Supported (when enabled) |
| Case sensitivity | Case-insensitive | Case-insensitive |
Token matching (type: "text")
With type: "text", the query string is tokenized. Each term contributes to the BM25 score. Multiple terms use OR semantics: documents can match if they contain any of the terms; documents that match more terms or stronger term statistics typically rank higher. Matching is case-insensitive. Exact phrase constraints (adjacent words in order) belong in type: "query_string" using quotes, not in type: "text".
| Query | Matches | Does not match |
|---|---|---|
| machine learning | “Machine learning is great” (has “machine”) | “Vector databases only” (neither term) |
| machine learning | “We use learning and machine” (both terms present, any order) | “Vector databases only” (neither term) |
| machine | “Machine learning is great” | “Vector databases only” (no “machine”) |
- Single term (machine): Matches documents containing that term. Case-insensitive.
- Multiple terms (machine learning): Each term is searched independently with OR-style matching and combined BM25 scoring, not as a single adjacent phrase.
- Tokenization: Text is split into tokens on whitespace and punctuation. Hyphenated words become multiple tokens (e.g., “state-of-the-art” yields state, of, the, art).
- No stop words: Common words like “the”, “a”, “of”, and “is” are not removed during indexing or search. All tokens are indexed and searchable.
- No operator support: Characters like AND, OR, NOT, *, ~, ^, +, -, and quotes are treated as literal text, not operators. For example, machine AND learning is tokenized as literal words, not boolean AND.
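The tokenization and OR-matching rules above can be approximated locally. The sketch below is not the service's analyzer: it lowercases text and splits on anything that is not a letter or digit, which reproduces the documented examples but may differ on edge cases. It also omits BM25 scoring and stemming, so it answers only whether a document matches, not how it ranks.

```python
import re


def tokenize(text: str) -> list:
    """Lowercase and split on whitespace and punctuation (approximation)."""
    return re.findall(r"[^\W_]+", text.lower())


def matches_any_term(query: str, document_text: str) -> bool:
    """OR semantics: the document matches if it shares any token with the query."""
    return bool(set(tokenize(query)) & set(tokenize(document_text)))


print(tokenize("state-of-the-art"))      # ['state', 'of', 'the', 'art']
print(tokenize("machine AND learning"))  # ['machine', 'and', 'learning']
```

Note how AND becomes the ordinary token and, which is why boolean operators have no effect with type: "text".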
type: "query_string" instead.
Lucene query syntax (type: "query_string")
With type: "query_string", you write Lucene query syntax, with operator support. Field names are embedded in the query itself (e.g., content:(term)).
| Operator | Syntax | Example | Description |
|---|---|---|---|
| Term | field:(word) | content:(computers) | Match documents containing term |
| Multiple terms | field:(a b) | content:(machine learning) | OR by default — matches either term |
| Phrase | field:("words") | content:("machine learning") | Exact phrase match (adjacent, in order) |
| AND | AND | content:(a AND b) | Both terms required |
| OR | OR | content:(a OR b) | Either term matches (same as default) |
| NOT | NOT | content:(a NOT b) | Exclude second term |
| Required | +term | content:(+database search) | Term must be present |
| Excluded | -term | content:(database -deprecated) | Term must not be present |
| Grouping | (expr) | content:((a OR b) AND c) | Control precedence |
| Phrase slop | "phrase"~N | content:("fast search"~2) | Allow up to N words between phrase terms |
| Boost | term^N | content:(machine^3 learning) | Multiply term’s relevance score by N |
| Phrase prefix | "phrase pre"* | content:("james w"*) | Last term in phrase matched as prefix |
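Because query_string queries are plain strings, small helpers can reduce quoting mistakes when composing them. The functions below are hypothetical conveniences that only format strings according to the syntax in the table; they do not escape special characters beyond rejecting embedded quotes.

```python
from typing import Optional


def field_query(field: str, clause: str) -> str:
    """Wrap a clause in the field:(...) form."""
    return f"{field}:({clause})"


def phrase(text: str, slop: Optional[int] = None, prefix: bool = False) -> str:
    """Quote a phrase, optionally adding slop (~N) or a trailing prefix (*)."""
    if '"' in text:
        raise ValueError("phrase text must not contain double quotes")
    quoted = f'"{text}"'
    if prefix:
        if len(text.split()) < 2:
            raise ValueError("phrase prefix needs at least two terms")
        return quoted + "*"
    if slop is not None:
        return f"{quoted}~{slop}"
    return quoted


def boost(term: str, factor: int) -> str:
    """Multiply a term's relevance score by factor."""
    return f"{term}^{factor}"


print(field_query("content", phrase("fast search", slop=2)))
# content:("fast search"~2)
print(field_query("content", f"{boost('machine', 3)} learning"))
# content:(machine^3 learning)
```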
Terms and default OR behavior
Phrases
type: "text" with query: "machine learning", which uses token OR matching on the field. For OR-style matching in query_string, use unquoted terms: content:(machine learning).
Boolean operators (AND, OR, NOT)
AND, OR, and NOT for explicit boolean logic.
Required and excluded terms (+, -)
+ to require a term and - to exclude a term.
Phrase proximity (slop)
Term boosting
^N.
Phrase prefix
* to a quoted phrase to treat the last term as a prefix. The phrase must contain at least two terms.
auto* are not supported. The phrase must contain at least two terms.
Combining operators
Stemming
Stemming reduces words to their root form so that morphological variants match each other. For example, with stemming enabled, a query for “run” also matches documents containing “running” or “runs”. Stemming is opt-in and disabled by default. To enable it, set stemming: true on a full-text searchable field when creating the index. The stemming algorithm is determined by the field’s language setting.
Example: enabling stemming with French
type: "text" and type: "query_string" queries on the field.
With stemming disabled (default):
| Query | Matches | Does not match |
|---|---|---|
| run | “run” | “running”, “runs”, “ran” |
| machines | “machines” | “machine” |
With stemming enabled:
| Query | Matches | Does not match |
|---|---|---|
| run | “run”, “running”, “runs” | “ran” (irregular form) |
| machines | “machines”, “machine” | “database” |
Language
The language parameter controls tokenization and stemming behavior for a full-text searchable field. It determines how text is analyzed during indexing and search: how words are split into tokens and, when stemming is enabled, which language-specific rules are used to reduce words to their root forms.
The default language is "en" (English). You can specify a language using either its short code or full name (e.g., "fr" or "french"). See the Stemming section for an example using "language": "french".
Supported languages:
| Code | Full name |
|---|---|
| ar | arabic |
| da | danish |
| de | german |
| el | greek |
| en | english |
| es | spanish |
| fi | finnish |
| fr | french |
| hu | hungarian |
| it | italian |
| nl | dutch |
| no | norwegian |
| pt | portuguese |
| ro | romanian |
| ru | russian |
| sv | swedish |
| ta | tamil |
| tr | turkish |
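A client-side check against the table above can catch an unsupported language before the create-index call. The sketch below is illustrative: build_text_field is a hypothetical helper, matching is exact (whether the API accepts other casings is not stated here), and the value is passed through unchanged because both short codes and full names are accepted.

```python
from typing import Optional

SUPPORTED_LANGUAGES = {
    "ar": "arabic", "da": "danish", "de": "german", "el": "greek",
    "en": "english", "es": "spanish", "fi": "finnish", "fr": "french",
    "hu": "hungarian", "it": "italian", "nl": "dutch", "no": "norwegian",
    "pt": "portuguese", "ro": "romanian", "ru": "russian", "sv": "swedish",
    "ta": "tamil", "tr": "turkish",
}


def is_supported_language(value: str) -> bool:
    """True if value is a supported short code or full name."""
    return value in SUPPORTED_LANGUAGES or value in SUPPORTED_LANGUAGES.values()


def build_text_field(language: str = "en", stemming: bool = False,
                     description: Optional[str] = None) -> dict:
    """Build the schema definition for a full-text searchable field."""
    if not is_supported_language(language):
        raise ValueError(f"unsupported language: {language!r}")
    field = {
        "type": "string",
        "full_text_searchable": True,
        "language": language,
        "stemming": stemming,
    }
    if description is not None:
        field["description"] = description
    return field
```

For example, build_text_field("french", stemming=True) yields a field definition that enables French stemming.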
Troubleshooting
Document not appearing in search results
- Check indexing latency: New documents may take up to 1 minute to become searchable.
- Verify the upsert response shows the expected upserted_count.
- Confirm you’re searching the same namespace where you upserted.
- With type: "text", multi-word queries use token OR matching; documents need not contain the full phrase. Try a single-term query first to confirm the document is searchable. Terms are case-insensitive and must match exactly after tokenization unless stemming is enabled.
- If using filters, ensure the document’s field values match your filter conditions.
Unexpected search results
- type: "text" uses OR across terms. machine learning matches documents that contain “machine”, “learning”, or both (BM25 ranking). For an exact phrase in the ranking, use type: "query_string" with content:("machine learning").
- type: "query_string" defaults to OR for unquoted terms. content:(machine learning) matches documents containing either term. Use AND or + for required terms.
- Operators like AND, OR, NOT, *, ~, and ^ only work with type: "query_string". With type: "text", they are treated as literal words.
- Searches match exact terms (after tokenization and lowercasing) unless stemming is enabled on the field. With stemming, morphological variants of a word will match.
- If using ^N boosting (in query_string mode), higher-boosted terms significantly affect ranking.
Query syntax errors
type: "query_string". With type: "text", any input is valid as a literal string to be tokenized (no Lucene parsing).
- Unmatched quotes ("machine learning): Close all quotes.
- Empty query: Provide at least one search term.
- Invalid boolean syntax (AND machine): Operators need terms on both sides.
- Unbalanced parentheses: Match all opening and closing parens.
- Unknown field name: Field names in the query must match full_text_searchable fields in the schema.
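Some of the problems listed above can be caught before a request is sent. The checker below is a rough sketch that covers only empty queries, unmatched quotes, and unbalanced parentheses. It does not handle escaped characters or validate operators and field names, so a query it accepts may still be rejected by the API.

```python
def check_query_string(query: str) -> list:
    """Return a list of problems found in a query_string query (empty if none)."""
    problems = []
    if not query.strip():
        problems.append("empty query")
    if query.count('"') % 2:
        problems.append("unmatched quote")

    depth = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            # Parentheses inside quoted phrases are ignored.
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    break  # closing paren with no matching opener
    if depth != 0:
        problems.append("unbalanced parentheses")
    return problems
```

A well-formed query such as content:((a OR b) AND c) returns an empty list.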
API errors
- 401 Unauthorized: Check the Api-Key header.
- 400 Bad Request: Check JSON syntax and required fields.
- 404 Not Found: Verify the index name and host URL.
- Missing API version: Add X-Pinecone-API-Version: 2026-01.alpha.
Upsert errors
- Type mismatch: Ensure values match declared schema types.
- Missing text content: The text-searchable field must be present in the document.
- Invalid _id: Every document must have a non-empty _id string.
Slow search performance
- Reduce query complexity: Boolean operators are more expensive than simple term queries.
- Simplify filters: Filters are applied before search, so broad filters increase the search space.
- Check document count and size: Larger datasets may have higher latency.
Early access
Full-text search is in early access under API version 2026-01.alpha. The feature is functional and ready for evaluation, but APIs may evolve based on feedback before general availability.
Requirements & limitations
- All requests require X-Pinecone-API-Version: 2026-01.alpha
- REST API & Python SDK only
- FTS requires a dedicated index created with the 2026-01.alpha API
- During early access, full-text search indexes must be created using dedicated read nodes (read_capacity.mode: "Dedicated"), using a single b1 node
- Max document size: ~500 KB
- Insert-to-searchable latency: < 1 minute
- One text-searchable field per index
- No document fetch or delete endpoints yet
- No partial updates (upsert replaces the entire document)
- Text search only in this API version (2026-01.alpha)
- Hybrid search and text pre-filtering not yet available
- Indexes cannot be created in CMEK-enabled projects
- Backup and restore not supported
- Fuzzy matching and regex search not yet supported
- Single-term prefix wildcards (auto*) not supported; use phrase prefix ("word auto"*) instead