type: "text") for matching exact phrases on a field, and Lucene query syntax (type: "query_string") supporting boolean operators, phrase prefix matching, boosting, and more. Results are ranked by relevance using the BM25 algorithm.
Schema definition
Full-text search indexes require an explicit schema that declares each field’s type and behavior: which fields are searchable, which are filterable, and what data types to expect. When you create the index, the schema tells Pinecone:
- Which fields contain text you want to search (full_text_searchable: true)
- Which fields can be used as filters (filterable: true)
- What data types to expect (for validation)
For example, a schema might define a full-text searchable content field and a filterable category field.
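A minimal sketch of such a schema, written as a Python dict that could be serialized into the request body. The nesting under a `fields` key follows the parameter list in the Create index section; the field names `content` and `category` come from the text, and everything else is illustrative.

```python
import json

# Hypothetical schema: one text-searchable field and one filterable field.
schema = {
    "fields": {
        "content": {"type": "string", "full_text_searchable": True},
        "category": {"type": "string", "filterable": True},
    }
}

# Serialize to JSON for use in a request body.
print(json.dumps(schema, indent=2))
```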
- string - Text data. Can be full_text_searchable (matched by keyword queries) or filterable (narrowed by exact-value conditions like $eq, $in).
- float - Numeric data. Can be filterable.
- boolean - True/false values. Can be filterable.
With a schema like this, you can search the content field for “machine learning” while filtering on category = "technology".
API
Full-text search uses API version 2026-01.alpha. All requests require the header X-Pinecone-API-Version: 2026-01.alpha.
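As a small illustration, the required headers could be assembled like this. The `Api-Key` header name comes from the troubleshooting notes in this document; the `Content-Type` header and the helper name `build_headers` are assumptions for the sketch.

```python
API_VERSION = "2026-01.alpha"  # version string quoted in this section


def build_headers(api_key: str) -> dict[str, str]:
    """Return HTTP headers for a full-text search request (illustrative helper)."""
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    return {
        "Api-Key": api_key,
        "X-Pinecone-API-Version": API_VERSION,
        "Content-Type": "application/json",  # assumed, since bodies are JSON
    }
```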
Control plane operations
Control plane operations are used to manage indexes and their configuration.
Create index (POST /indexes)
b1 shard and replica.
- name (string, required) - Unique index name (lowercase alphanumeric and hyphens)
- deployment (object, required) - Deployment configuration
  - deployment_type (string, required) - Must be "managed" for serverless
  - cloud (string, required) - Cloud provider: "aws" or "gcp"
  - region (string, required) - Region code (e.g., "us-east-1")
- schema (object, required) - Schema definition
  - fields (object, required) - Map of field names to field definitions. Each field can be:
    - Text search field - Enables full-text search
      - type: "string" (required)
      - full_text_searchable: true (required)
      - language (string, optional) - Language for tokenization and, when stemming is enabled, for stemming (default: "en"). Accepts short codes or full names (e.g., "fr" or "french"). See Language for the full list of supported languages.
      - stemming (boolean, optional) - Whether to enable language-based stemming (default: false). See Stemming.
    - Filterable field - Can be used in query filters
      - type (required) - "string", "float", or "boolean"
      - filterable: true (required)
    - All fields support an optional description for documenting what the field contains. This is especially useful for agentic workflows where an LLM inspects the schema to understand how to query the index.
- read_capacity (object, required) - Read capacity configuration.
  - mode (string, required) - Must be "Dedicated" for full-text search indexes
  - dedicated (object, required) - Dedicated read node configuration
    - node_type (string, required) - Node type (e.g., "b1")
    - scaling (string, required) - Scaling mode: "Manual"
    - manual (object, required) - Manual scaling configuration
      - shards (integer, required) - Number of shards (minimum 1)
      - replicas (integer, required) - Number of replicas (minimum 1)
- deletion_protection (string, optional) - "enabled" or "disabled" (default: "disabled")
- tags (object, optional) - Key-value tags for the index
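The parameters above might be assembled into a request body along these lines. This is a sketch: the nesting is inferred from the parameter list, and the function name `create_index_body` and its defaults are hypothetical.

```python
import re


def create_index_body(name: str, fields: dict, cloud: str = "aws",
                      region: str = "us-east-1", shards: int = 1,
                      replicas: int = 1) -> dict:
    """Build a JSON-serializable body for POST /indexes (illustrative only)."""
    if not re.fullmatch(r"[a-z0-9-]+", name):
        raise ValueError("name must be lowercase alphanumeric and hyphens")
    if shards < 1 or replicas < 1:
        raise ValueError("shards and replicas must be at least 1")
    return {
        "name": name,
        "deployment": {
            "deployment_type": "managed",
            "cloud": cloud,
            "region": region,
        },
        "schema": {"fields": fields},
        "read_capacity": {
            "mode": "Dedicated",
            "dedicated": {
                "node_type": "b1",
                "scaling": "Manual",
                "manual": {"shards": shards, "replicas": replicas},
            },
        },
        "deletion_protection": "disabled",
    }


body = create_index_body(
    "articles-fts",  # hypothetical index name
    {"content": {"type": "string", "full_text_searchable": True}},
)
```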
- String fields must declare full_text_searchable or filterable, but not both
- Field names must be unique within the schema
- Field names must contain only alphanumeric characters and underscores
- The schema must contain at least one field
- Only one field can have full_text_searchable: true (multiple text fields not yet supported)
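These rules can be checked client-side before sending a create request. The following is a rough sketch of such a check, not the server's actual validation; the function name and error wording are made up for illustration.

```python
import re

ALLOWED_TYPES = {"string", "float", "boolean"}


def validate_schema_fields(fields: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the schema passes."""
    if not fields:
        return ["schema must contain at least one field"]
    errors = []
    searchable = []
    # Dict keys are unique by construction, which covers the uniqueness rule.
    for name, spec in fields.items():
        if not re.fullmatch(r"[A-Za-z0-9_]+", name):
            errors.append(f"{name}: name may contain only alphanumeric characters and underscores")
        if spec.get("type") not in ALLOWED_TYPES:
            errors.append(f"{name}: type must be one of {sorted(ALLOWED_TYPES)}")
        is_searchable = spec.get("full_text_searchable") is True
        is_filterable = spec.get("filterable") is True
        if is_searchable:
            searchable.append(name)
        # A string field needs exactly one of the two flags.
        if spec.get("type") == "string" and is_searchable == is_filterable:
            errors.append(f"{name}: string fields must declare full_text_searchable or filterable, but not both")
    if len(searchable) > 1:
        errors.append("only one field can have full_text_searchable: true")
    return errors
```

Returning a list of messages rather than raising on the first problem lets a caller report every violation at once.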
- id (string) - Unique index ID
- name (string) - Index name
- host (string) - Index host URL for data plane operations
- status (object) - Index status
  - ready (boolean) - Whether the index is ready for operations
  - state (string) - Current state: "Initializing", "Ready", etc.
- deployment (object) - Deployment configuration
  - deployment_type (string) - Deployment type (e.g., "managed")
  - cloud (string) - Cloud provider
  - region (string) - Region code
  - environment (string) - Environment identifier assigned by the system
- schema (object) - Schema definition
  - version (string) - Schema version (e.g., "v1")
  - fields (object) - Field definitions with server-applied defaults. Full-text searchable fields include additional properties: language, stemming (defaults to false; set to true at creation to enable), lowercase, max_term_len. All fields include description (null if not set).
- read_capacity (object) - Read capacity configuration
  - mode (string) - Read capacity mode (e.g., "Dedicated")
  - dedicated (object) - Dedicated read node configuration
    - node_type (string) - Node type (e.g., "b1")
    - scaling (string) - Scaling mode (e.g., "Manual")
    - manual (object) - Manual scaling configuration
      - shards (integer) - Number of shards
      - replicas (integer) - Number of replicas
  - status (object) - Current status of read capacity provisioning
    - state (string) - Provisioning state (e.g., "Migrating", "Ready")
    - current_shards (integer or null) - Current number of shards
    - current_replicas (integer or null) - Current number of replicas
- tags (object or null) - Key-value tags, or null if none set
- deletion_protection (string) - Deletion protection status
Wait for status.ready: true before performing data plane operations.
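One way to wait for readiness is to poll the describe endpoint until `status.ready` is true. The sketch below takes the describe call as a function argument so it makes no assumptions about the HTTP client or SDK; `wait_until_ready`, the timeout, and the polling interval are all hypothetical choices.

```python
import time
from typing import Callable


def wait_until_ready(describe: Callable[[], dict], timeout_s: float = 300.0,
                     poll_s: float = 5.0, sleep=time.sleep) -> dict:
    """Poll describe() until the index reports status.ready == True."""
    deadline = time.monotonic() + timeout_s
    while True:
        index = describe()  # expected to return the describe-index response as a dict
        if index.get("status", {}).get("ready") is True:
            return index
        if time.monotonic() >= deadline:
            raise TimeoutError("index did not become ready in time")
        sleep(poll_s)
```

In practice `describe` would wrap a GET /indexes/{index_name} request; injecting it keeps the loop easy to test.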
List indexes (GET /indexes)
Describe index (GET /indexes/{index_name})
- index_name (string, required) - Name of the index
Update index (PATCH /indexes/{index_name})
Only deletion_protection can be updated.
- index_name (string, required) - Name of the index
- deletion_protection (string, optional) - "enabled" or "disabled"
Delete index (DELETE /indexes/{index_name})
If deletion_protection is enabled, you must first disable it using the update endpoint.
- index_name (string, required) - Name of the index
Data plane operations
Use "__default__" as the namespace if you don’t need partitioning.
Upsert documents (POST /namespaces/{namespace}/documents/upsert)
If a document with the same _id exists, it is completely replaced. Documents become searchable within approximately one minute.
- namespace (string, required) - Namespace name (use "__default__" if not using namespaces)
- documents (array, required) - Array of documents to upsert. Each document is an object with:
  - _id (string, required) - Unique document ID. If a document with this _id already exists, it is replaced entirely. If multiple documents in the same batch share an _id, only the last one is stored.
  - Fields matching your schema (the full_text_searchable field must be present)
- upserted_count (integer) - Number of documents upserted
Schema validation
Each item in the documents array is validated against your index schema. If any item fails validation, the entire request fails and nothing is upserted.
| Scenario | Result |
|---|---|
| Field value doesn’t match declared type | Error - request fails |
| Field not in schema | Stored and filterable, but not added to the schema |
| Schema field missing from item | OK - fields are optional |
| Text-searchable field is missing | Error - request fails |
| Text contains Unicode or special characters | OK - fully supported |
- "Document with id 'doc-1': boolean field 'in_stock' must be a boolean"
- "Each document must have at least one indexable field"
Search documents (POST /namespaces/{namespace}/documents/search)
Search using simple phrase matching (type: "text") or Lucene query syntax (type: "query_string"). Optionally filter by field values before scoring. Results are ranked by BM25 relevance score.
- namespace (string, required) - Namespace name (use "__default__" if not using namespaces)
- include_fields (array, required) - List of field names to return in results
- score_by (array, required) - Array of scoring methods. Each item must be one of:
  - type: "text" - Simple phrase matching on a specific field. Your query is matched as an exact phrase: all terms must appear adjacent and in the same order. Case-insensitive.
    - field (string, required) - Name of the searchable field.
    - query (string, required) - The phrase to search for. Multiple words are treated as a phrase, not individual terms. Query syntax operators (AND, OR, NOT, etc.) are not supported; they are treated as literal words.
  - type: "query_string" - Lucene query syntax. Supports boolean operators, phrase prefix matching, boosting, and more. Do not specify field; instead, target fields within the query itself.
    - query (string, required) - A Lucene query string in the form <field_name>:(<query clause>) (see query syntax reference).
- top_k (integer, required) - Number of results to return (1-10000)
- filter (object, optional) - Filter conditions on filterable fields (applied before search)
Choosing between type: "text" and type: "query_string":
- Use type: "text" when you know the exact phrase to look for. It’s the simpler option: just provide the phrase and the field name.
- Use type: "query_string" when you need boolean operators, phrase prefix matching, boosting, or OR logic between terms. See the query syntax reference for the full list of operators.
| Operator | Example | Description |
|---|---|---|
| $eq | {"category": {"$eq": "tech"}} | Equals |
| $ne | {"category": {"$ne": "tech"}} | Not equals |
| $gt | {"year": {"$gt": 2023}} | Greater than |
| $gte | {"year": {"$gte": 2023}} | Greater than or equal |
| $lt | {"year": {"$lt": 2025}} | Less than |
| $lte | {"year": {"$lte": 2025}} | Less than or equal |
| $in | {"category": {"$in": ["a", "b"]}} | In list |
| $nin | {"category": {"$nin": ["a", "b"]}} | Not in list |
Example: a search of the content field, filtered to technology articles from 2024 onward.
Boolean query with query_string: a query such as content:(machine AND learning) matches documents where the content field contains both “machine” and “learning” (in any order, not necessarily adjacent).
Boosting with query_string: operators such as ^, AND, and OR only work with type: "query_string".
OR query (default behavior) with query_string: with type: "query_string", multiple terms default to OR, so content:(quick brown fox) matches documents containing “quick”, “brown”, or “fox” (or any combination). Documents matching more terms rank higher.
Example response
Status: 200 OK
- matches (array) - Array of matching documents
  - id (string) - Document ID
  - score (float) - BM25 relevance score (higher is better)
  - {field} (any) - Requested fields from the document
- namespace (string) - Namespace searched
- usage (object) - Usage information
  - read_units (integer) - Read units consumed
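To make the request shape concrete, here is a sketch that builds search bodies for both query types from the parameters documented in this section. The helper names are hypothetical, and combining two fields in one `filter` object is an assumption rather than something this document confirms.

```python
import json
from typing import Optional


def _search_body(score_item: dict, include_fields: list, top_k: int,
                 filter_: Optional[dict]) -> dict:
    if not 1 <= top_k <= 10000:
        raise ValueError("top_k must be between 1 and 10000")
    body = {"include_fields": list(include_fields), "score_by": [score_item], "top_k": top_k}
    if filter_ is not None:
        body["filter"] = filter_
    return body


def text_search_body(field, phrase, include_fields, top_k=10, filter_=None) -> dict:
    """Phrase search: the whole query is matched as one phrase on `field`."""
    return _search_body({"type": "text", "field": field, "query": phrase},
                        include_fields, top_k, filter_)


def query_string_search_body(query, include_fields, top_k=10, filter_=None) -> dict:
    """Lucene-style search: no `field` key, the field name lives in the query."""
    return _search_body({"type": "query_string", "query": query},
                        include_fields, top_k, filter_)


phrase_body = text_search_body(
    "content", "machine learning", ["content", "category"], top_k=5,
    filter_={"category": {"$eq": "technology"}, "year": {"$gte": 2024}},  # assumed combination
)
boolean_body = query_string_search_body("content:(machine AND learning)", ["content"])
print(json.dumps(phrase_body, indent=2))
```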
Python SDK
Installation
To use full-text search with Pinecone’s Python SDK, you need to install the oakcylinder package, an early access version of the SDK that is in active development.
Control plane
Instantiate the client
Create index
Describe index
Delete index
Data plane
Build a data plane client
Upsert documents
Search documents — simple phrase (type: text)
Search documents — Lucene query string (type: query_string)
Query syntax reference
Full-text search supports two query types with different capabilities:
| Feature | type: "text" | type: "query_string" |
|---|---|---|
| Purpose | Simple phrase matching | Lucene query syntax |
| field parameter | Required | Not allowed (field names go in query) |
| Multi-word behavior | Phrase (adjacent, in order) | OR by default |
| Boolean operators | Not supported (treated as words) | AND, OR, NOT, +, - |
| Phrase prefix | Not supported | "phrase pre"* (last term as prefix) |
| Phrase matching | Automatic (entire query is a phrase) | Wrap in quotes: "exact phrase" |
| Phrase slop | Not supported | "phrase"~N |
| Boosting | Not supported | term^N |
| Stemming | Supported (when enabled) | Supported (when enabled) |
| Case sensitivity | Case-insensitive | Case-insensitive |
Simple phrase matching (type: "text")
With type: "text", your entire query is treated as a phrase. All terms must appear adjacent and in the same order in the document. Matching is case-insensitive.
| Query | Matches | Does not match |
|---|---|---|
| machine learning | “Machine learning is great” | “Machine and learning separately” (words not adjacent) |
| machine learning | “We use machine learning daily” | “Learning machine” (wrong order) |
| machine | “Machine learning is great” | “Vector databases only” |
- Single term (machine): Matches any document containing the term. Case-insensitive.
- Multiple terms (machine learning): Matched as a phrase: all terms must appear adjacent and in order. This is not an OR query.
- Tokenization: Text is split into tokens on whitespace and punctuation. This means punctuation between words does not prevent a phrase match: a document containing “state-of-the-art” is tokenized as ["state", "of", "the", "art"], and a phrase query for state of the art matches it because the tokens are adjacent and in the correct order.
- No stop words: Common words like “the”, “a”, “of”, and “is” are not removed during indexing or search. All tokens are indexed and searchable. This means phrase queries are position-sensitive: "state art" does not match “state-of-the-art” because “of” and “the” sit between “state” and “art”. To exclude specific words or require non-adjacent terms, use type: "query_string" with operators like NOT, -, or AND (e.g., content:(state AND art)).
- No operator support: Operators and characters like AND, OR, NOT, *, ~, ^, +, -, and quotes are treated as literal text, not operators. For example, machine AND learning searches for the three-word phrase “machine and learning”.
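The behavior described in this list can be approximated locally with a toy tokenizer and an adjacency check. This is only an illustration of the documented rules (lowercasing, splitting on whitespace and punctuation, no stop-word removal, no stemming); it is not Pinecone's analyzer, and the function names are hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a word character."""
    return re.findall(r"\w+", text.lower())


def phrase_matches(query: str, document: str) -> bool:
    """True if the query tokens appear adjacent and in order in the document."""
    q, d = tokenize(query), tokenize(document)
    if not q:
        return False
    n = len(q)
    return any(d[i:i + n] == q for i in range(len(d) - n + 1))


print(tokenize("state-of-the-art"))                              # ['state', 'of', 'the', 'art']
print(phrase_matches("state of the art", "A state-of-the-art model"))  # True
print(phrase_matches("state art", "A state-of-the-art model"))          # False
print(phrase_matches("machine learning", "Learning machine"))           # False
```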
If you need operators, use type: "query_string" instead.
Lucene query syntax (type: "query_string")
With type: "query_string", you write Lucene query syntax, with operator support. Field names are embedded in the query itself (e.g., content:(term)).
| Operator | Syntax | Example | Description |
|---|---|---|---|
| Term | field:(word) | content:(computers) | Match documents containing term |
| Multiple terms | field:(a b) | content:(machine learning) | OR by default — matches either term |
| Phrase | field:("words") | content:("machine learning") | Exact phrase match (adjacent, in order) |
| AND | AND | content:(a AND b) | Both terms required |
| OR | OR | content:(a OR b) | Either term matches (same as default) |
| NOT | NOT | content:(a NOT b) | Exclude second term |
| Required | +term | content:(+database search) | Term must be present |
| Excluded | -term | content:(database -deprecated) | Term must not be present |
| Grouping | (expr) | content:((a OR b) AND c) | Control precedence |
| Phrase slop | "phrase"~N | content:("fast search"~2) | Allow up to N words between phrase terms |
| Boost | term^N | content:(machine^3 learning) | Multiply term’s relevance score by N |
| Phrase prefix | "phrase pre"* | content:("james w"*) | Last term in phrase matched as prefix |
Terms and default OR behavior
Phrases
Wrap terms in quotes for an exact phrase match. For example, content:("machine learning") matches the same way type: "text" does with query: "machine learning".
Boolean operators (AND, OR, NOT)
Use AND, OR, and NOT for explicit boolean logic.
Required and excluded terms (+, -)
Use + to require a term and - to exclude a term.
Phrase proximity (slop)
Term boosting
Multiply a term’s relevance score by appending ^N.
Phrase prefix
Append * to a quoted phrase to treat the last term as a prefix. The phrase must contain at least two terms. Single-term prefix wildcards like auto* are not supported.
Combining operators
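As a sketch of how the operators from the table might be combined, the helpers below assemble a query string from small pieces. The helper names are hypothetical, and the example assumes the operators compose as they do in standard Lucene syntax; escaping of embedded quotes is not covered in this document, so the sketch rejects them.

```python
from typing import Optional


def phrase(text: str, slop: Optional[int] = None, prefix: bool = False) -> str:
    """Quote a phrase, optionally adding ~N slop or a trailing * prefix marker."""
    if '"' in text:
        raise ValueError("embedded quotes are not handled in this sketch")
    if slop is not None and prefix:
        raise ValueError("use either slop or prefix, not both")
    if prefix and len(text.split()) < 2:
        raise ValueError("phrase prefix needs at least two terms")
    quoted = f'"{text}"'
    if slop is not None:
        return f"{quoted}~{slop}"
    return f"{quoted}*" if prefix else quoted


def boost(term: str, factor: int) -> str:
    return f"{term}^{factor}"


def required(term: str) -> str:
    return f"+{term}"


def excluded(term: str) -> str:
    return f"-{term}"


def on_field(field: str, *clauses: str) -> str:
    """Wrap space-separated clauses as <field>:(<clauses>)."""
    return f"{field}:({' '.join(clauses)})"


query = on_field(
    "content",
    required(phrase("machine learning")),
    boost("database", 2),
    excluded("deprecated"),
)
print(query)  # content:(+"machine learning" database^2 -deprecated)
```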
Stemming
Stemming reduces words to their root form so that morphological variants match each other. For example, with stemming enabled, a query for “run” also matches documents containing “running” or “runs”. Stemming is opt-in and disabled by default. To enable it, set stemming: true on a text search field when creating the index. The stemming algorithm is determined by the field’s language setting.
Example: enabling stemming with French
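A field definition along these lines, shown as a Python dict, would enable French stemming. The field name `content` is illustrative; the `language` and `stemming` keys follow the Create index parameter list.

```python
# Hypothetical schema: French tokenization and stemming on the searchable field.
french_schema = {
    "fields": {
        "content": {
            "type": "string",
            "full_text_searchable": True,
            "language": "french",  # the short code "fr" is also accepted
            "stemming": True,      # opt-in; defaults to false
        }
    }
}
```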
Stemming applies to both type: "text" and type: "query_string" queries on the field.
With stemming disabled (default):
| Query | Matches | Does not match |
|---|---|---|
| run | “run” | “running”, “runs”, “ran” |
| machines | “machines” | “machine” |
With stemming enabled:
| Query | Matches | Does not match |
|---|---|---|
| run | “run”, “running”, “runs” | “ran” (irregular form) |
| machines | “machines”, “machine” | “database” |
Language
The language parameter controls tokenization and stemming behavior for a text search field. It determines how text is analyzed during indexing and search: how words are split into tokens and, when stemming is enabled, which language-specific rules are used to reduce words to their root forms.
The default language is "en" (English). You can specify a language using either its short code or full name (e.g., "fr" or "french"). See the Stemming section for an example using "language": "french".
Supported languages:
| Code | Full name |
|---|---|
| ar | arabic |
| da | danish |
| de | german |
| el | greek |
| en | english |
| es | spanish |
| fi | finnish |
| fr | french |
| hu | hungarian |
| it | italian |
| nl | dutch |
| no | norwegian |
| pt | portuguese |
| ro | romanian |
| ru | russian |
| sv | swedish |
| ta | tamil |
| tr | turkish |
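Since either form is accepted, client code may want to normalize to one of them before building a schema. A small sketch using the table above follows; the helper name is hypothetical, and the lowercasing is a client-side convenience, not a statement about how the API treats case.

```python
LANGUAGES = {
    "ar": "arabic", "da": "danish", "de": "german", "el": "greek",
    "en": "english", "es": "spanish", "fi": "finnish", "fr": "french",
    "hu": "hungarian", "it": "italian", "nl": "dutch", "no": "norwegian",
    "pt": "portuguese", "ro": "romanian", "ru": "russian", "sv": "swedish",
    "ta": "tamil", "tr": "turkish",
}


def normalize_language(value: str = "en") -> str:
    """Map a short code or full name from the table to its short code."""
    key = value.strip().lower()
    if key in LANGUAGES:
        return key
    for code, name in LANGUAGES.items():
        if name == key:
            return code
    raise ValueError(f"unsupported language: {value!r}")
```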
Troubleshooting
Document not appearing in search results
- Check indexing latency: New documents may take up to 1 minute to become searchable.
- Verify the upsert response shows the expected upserted_count.
- Confirm you’re searching the same namespace where you upserted.
- With type: "text", multi-word queries are phrase searches (adjacent, in order). Try a single-term query first to confirm the document is searchable. Terms are case-insensitive and must match exactly after tokenization unless stemming is enabled.
- If using filters, ensure the document’s field values match your filter conditions.
Unexpected search results
- type: "text" treats multi-word queries as phrases. machine learning only matches documents with those words adjacent and in order. If you want documents containing “machine” OR “learning”, use type: "query_string" with content:(machine learning).
- type: "query_string" defaults to OR. content:(machine learning) matches documents containing either term. Use AND or + for required terms.
- Operators like AND, OR, NOT, *, ~, and ^ only work with type: "query_string". With type: "text", they are treated as literal words.
- Searches match exact terms (after tokenization and lowercasing) unless stemming is enabled on the field. With stemming, morphological variants of a word will match.
- If using ^N boosting (in query_string mode), higher-boosted terms significantly affect ranking.
Query syntax errors
Syntax errors apply only to type: "query_string". With type: "text", any input is valid since it’s treated as a literal phrase.
- Unmatched quotes ("machine learning): Close all quotes.
- Empty query: Provide at least one search term.
- Invalid boolean syntax (AND machine): Operators need terms on both sides.
- Unbalanced parentheses: Match all opening and closing parens.
- Unknown field name: Field names in the query must match full_text_searchable fields in the schema.
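Some of these problems can be caught before a request is sent. The sketch below checks only for empty queries, unmatched quotes, and unbalanced parentheses; it is a rough client-side pre-check with a hypothetical function name, not a reimplementation of the server's parser.

```python
def check_query_string(query: str) -> list[str]:
    """Return a list of obvious syntax problems in a query_string query."""
    if not query.strip():
        return ["empty query"]
    problems = []
    if query.count('"') % 2 != 0:
        problems.append("unmatched quote")
    depth = 0
    in_quotes = False
    for ch in query:
        if ch == '"':
            in_quotes = not in_quotes
        elif not in_quotes:
            # Only count parentheses that sit outside quoted phrases.
            if ch == "(":
                depth += 1
            elif ch == ")":
                depth -= 1
                if depth < 0:
                    break
    if depth != 0:
        problems.append("unbalanced parentheses")
    return problems
```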
API errors
- 401 Unauthorized: Check the Api-Key header.
- 400 Bad Request: Check JSON syntax and required fields.
- 404 Not Found: Verify the index name and host URL.
- Missing API version: Add X-Pinecone-API-Version: 2026-01.alpha.
Upsert errors
- Type mismatch: Ensure values match declared schema types.
- Missing text content: The text-searchable field must be present in the document.
- Invalid _id: Every document must have a non-empty _id string.
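The three checks in this list can be approximated client-side before upserting. The following is a sketch under stated assumptions: error wording is invented, integers are accepted for float fields (an assumption), and fields that are not in the schema are ignored because the Schema validation table says they are stored rather than rejected.

```python
TYPE_CHECKS = {
    "string": lambda v: isinstance(v, str),
    # Assumption: integers are acceptable for float fields; booleans are not.
    "float": lambda v: isinstance(v, (int, float)) and not isinstance(v, bool),
    "boolean": lambda v: isinstance(v, bool),
}


def validate_document(doc: dict, fields: dict) -> list[str]:
    """Return validation problems for one document against schema fields."""
    doc_id = doc.get("_id")
    if not isinstance(doc_id, str) or not doc_id:
        return ["every document must have a non-empty string _id"]
    errors = []
    for name, spec in fields.items():
        if name not in doc:
            if spec.get("full_text_searchable"):
                errors.append(f"{doc_id}: text-searchable field '{name}' is missing")
            continue  # other schema fields are optional
        if not TYPE_CHECKS[spec["type"]](doc[name]):
            errors.append(f"{doc_id}: field '{name}' must be a {spec['type']}")
    return errors


def validate_batch(docs: list, fields: dict) -> list[str]:
    """Collect problems for the whole batch; any problem means the request would fail."""
    return [err for doc in docs for err in validate_document(doc, fields)]
```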
Slow search performance
- Reduce query complexity: Boolean operators are more expensive than simple term queries.
- Simplify filters: Filters are applied before search, so broad filters increase the search space.
- Check document count and size: Larger datasets may have higher latency.
Early access
Full-text search is in early access under API version 2026-01.alpha. The feature is functional and ready for evaluation, but APIs may evolve based on feedback before general availability.
Requirements & limitations
- All requests require X-Pinecone-API-Version: 2026-01.alpha
- REST API & Python SDK only
- FTS requires a dedicated index created with the 2026-01.alpha API
- During early access, full-text search indexes must be created using dedicated read nodes (read_capacity.mode: "Dedicated"), using a single b1 node
- Max document size: ~500 KB
- Insert-to-searchable latency: < 1 minute
- One text-searchable field per index
- No document fetch or delete endpoints yet
- No partial updates (upsert replaces the entire document)
- Text search only in this API version (2026-01.alpha)
- Hybrid search and text pre-filtering not yet available
- Indexes cannot be created in CMEK-enabled projects
- Backup and restore not supported
- Fuzzy matching and regex search not yet supported
- Single-term prefix wildcards (auto*) not supported; use phrase prefix ("word auto"*) instead