Full-text search is in early access. APIs may change, and some features are not yet available.
Full-text search (FTS) enables keyword-based search over text documents in Pinecone. It offers two query types: simple phrase matching (type: "text") for matching exact phrases on a field, and Lucene query syntax (type: "query_string") supporting boolean operators, phrase prefix matching, boosting, and more. Results are ranked by relevance using the BM25 algorithm.

Schema definition

Full-text search indexes require an explicit schema that declares each field’s type and behavior. When you create the index, the schema tells Pinecone:
  • Which fields contain text you want to search (full_text_searchable: true)
  • Which fields can be used as filters (filterable: true)
  • What data types to expect (for validation)
In an upsert, fields not defined in the schema are stored with the record and are filterable, but they will not appear in the schema.
Here’s an example schema with a searchable content field and a filterable category field:
{
  "name": "articles",
  "deployment": { 
    "deployment_type": "managed", 
    "cloud": "aws", 
    "region": "us-east-1" 
  },
  "schema": {
    "fields": {
      "content": { 
        "type": "string", 
        "full_text_searchable": true,
        "description": "The main body text of the article"
      },
      "category": { 
        "type": "string", 
        "filterable": true,
        "description": "The article's topic category"
      }
    }
  },
  // ...
}
Field types:
  • string - Text data. Can be full_text_searchable (matched by keyword queries) or filterable (narrowed by exact-value conditions like $eq, $in).
  • float - Numeric data. Can be filterable.
  • boolean - True/false values. Can be filterable.
Searchable and filterable fields serve different roles: searchable fields determine which documents match your query and how they’re ranked, while filterable fields narrow the result set before the search runs. Use them together. For example, you can search a content field for “machine learning” while filtering on category = "technology".
Schema migration is not yet supported. Once an index is created, you cannot add, remove, or modify fields. Plan your schema carefully.

API

Full-text search uses API version 2026-01.alpha. All requests require the header X-Pinecone-API-Version: 2026-01.alpha.

Control plane operations

Control plane operations are used to manage indexes and their configuration.
Creates a new text search index with the specified schema. The index initializes asynchronously; poll the describe endpoint to know when it’s ready for data operations.
During early access, full-text search indexes must be created with dedicated read nodes, using a single b1 shard and replica.
After creation, wait for both status.ready: true and read_capacity.status.state: "Ready" before searching. The index status may show ready before the dedicated read nodes finish provisioning. Searches made while read_capacity.status.state is still "Migrating" will return empty results.
Example request
curl -X POST "https://api.pinecone.io/indexes" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "Content-Type: application/json" \
  -H "X-Pinecone-API-Version: 2026-01.alpha" \
  -d '{
    "name": "articles",
    "deployment": {
      "deployment_type": "managed",
      "cloud": "aws",
      "region": "us-east-1"
    },
    "schema": {
      "fields": {
        "content": {
          "type": "string",
          "full_text_searchable": true,
          "language": "en"
        },
        "category": { "type": "string", "filterable": true },
        "year": { "type": "float", "filterable": true },
        "rating": { "type": "float", "filterable": true },
        "published": { "type": "boolean", "filterable": true }
      }
    },
    "read_capacity": {
      "mode": "Dedicated",
      "dedicated": {
        "node_type": "b1",
        "scaling": "Manual",
        "manual": { "shards": 1, "replicas": 1 }
      }
    },
    "deletion_protection": "disabled"
  }'
Request parameters:
  • name (string, required) - Unique index name (lowercase alphanumeric and hyphens)
  • deployment (object, required) - Deployment configuration
    • deployment_type (string, required) - Must be "managed" for serverless
    • cloud (string, required) - Cloud provider: "aws" or "gcp"
    • region (string, required) - Region code (e.g., "us-east-1")
  • schema (object, required) - Schema definition
    • fields (object, required) - Map of field names to field definitions. Each field can be:
      • Text search field - Enables full-text search
        • type: "string" (required)
        • full_text_searchable: true (required)
        • language (string, optional) - Language for tokenization and, when stemming is enabled, for stemming (default: "en"). Accepts short codes or full names (e.g., "fr" or "french"). See Language for the full list of supported languages.
        • stemming (boolean, optional) - Whether to enable language-based stemming (default: false). See Stemming.
      • Filterable field - Can be used in query filters
        • type (required) - "string", "float", or "boolean"
        • filterable: true (required)
      • All fields support an optional description for documenting what the field contains. This is especially useful for agentic workflows where an LLM inspects the schema to understand how to query the index.
  • read_capacity (object, required) - Read capacity configuration.
    • mode (string, required) - Must be "Dedicated" for full-text search indexes
    • dedicated (object, required) - Dedicated read node configuration
      • node_type (string, required) - Node type (e.g., "b1")
      • scaling (string, required) - Scaling mode: "Manual"
      • manual (object, required) - Manual scaling configuration
        • shards (integer, required) - Number of shards (minimum 1)
        • replicas (integer, required) - Number of replicas (minimum 1)
  • deletion_protection (string, optional) - "enabled" or "disabled" (default: "disabled")
  • tags (object, optional) - Key-value tags for the index
Schema constraints:
  • String fields must declare full_text_searchable or filterable, but not both
  • Field names must be unique within the schema
  • Field names must contain only alphanumeric characters and underscores
  • The schema must contain at least one field
  • Only one field can have full_text_searchable: true (multiple text fields not yet supported)
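The constraints above can be checked on the client before sending a create request. The following is a minimal sketch, not part of the Pinecone SDK: the function name check_schema and its error messages are hypothetical, and the service remains the authority on what is accepted.

```python
import re

# Field names may contain only alphanumeric characters and underscores.
FIELD_NAME_RE = re.compile(r"^[A-Za-z0-9_]+$")
ALLOWED_TYPES = {"string", "float", "boolean"}


def check_schema(schema: dict) -> list[str]:
    """Return a list of constraint violations (empty if none were found)."""
    errors = []
    # A dict cannot hold duplicate keys, so field-name uniqueness is implicit here.
    fields = schema.get("fields", {})
    if not fields:
        errors.append("schema must contain at least one field")
    searchable = []
    for name, spec in fields.items():
        if not FIELD_NAME_RE.match(name):
            errors.append(f"{name!r}: only alphanumeric characters and underscores are allowed")
        field_type = spec.get("type")
        if field_type not in ALLOWED_TYPES:
            errors.append(f"{name!r}: unknown type {field_type!r}")
        is_searchable = spec.get("full_text_searchable") is True
        is_filterable = spec.get("filterable") is True
        if is_searchable:
            searchable.append(name)
        # String fields must declare exactly one of the two roles.
        if field_type == "string" and is_searchable == is_filterable:
            errors.append(f"{name!r}: declare full_text_searchable or filterable, but not both")
    if len(searchable) > 1:
        errors.append("only one field can have full_text_searchable: true")
    return errors
```

Calling check_schema on the "schema" object from the example request returns an empty list; a field named "a-b" that sets both flags produces two violations.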
Example response
Status: 202 Accepted
{
  "id": "e51ea4e1-2dda-4607-94dc-9054b1fa8492",
  "name": "articles",
  "host": "articles-jweaq8m.svc.aped-4627-b74a.pinecone.io",
  "status": {
    "ready": false,
    "state": "Initializing"
  },
  "deployment": {
    "deployment_type": "managed",
    "cloud": "aws",
    "region": "us-east-1",
    "environment": "aped-4627-b74a"
  },
  "schema": {
    "version": "v1",
    "fields": {
      "content": {
        "type": "string",
        "description": null,
        "full_text_searchable": true,
        "language": "en",
        "stemming": false,
        "lowercase": true,
        "max_term_len": 40
      },
      "category": { "type": "string", "description": null, "filterable": true },
      "year": { "type": "float", "description": null, "filterable": true },
      "rating": { "type": "float", "description": null, "filterable": true },
      "published": { "type": "boolean", "description": null, "filterable": true }
    }
  },
  "read_capacity": {
    "mode": "Dedicated",
    "dedicated": {
      "node_type": "b1",
      "scaling": "Manual",
      "manual": { "shards": 1, "replicas": 1 }
    },
    "status": {
      "state": "Migrating",
      "current_shards": null,
      "current_replicas": null
    }
  },
  "tags": null,
  "deletion_protection": "disabled"
}
Response fields:
  • id (string) - Unique index ID
  • name (string) - Index name
  • host (string) - Index host URL for data plane operations
  • status (object) - Index status
    • ready (boolean) - Whether the index is ready for operations
    • state (string) - Current state: "Initializing", "Ready", etc.
  • deployment (object) - Deployment configuration
    • deployment_type (string) - Deployment type (e.g., "managed")
    • cloud (string) - Cloud provider
    • region (string) - Region code
    • environment (string) - Environment identifier assigned by the system
  • schema (object) - Schema definition
    • version (string) - Schema version (e.g., "v1")
    • fields (object) - Field definitions with server-applied defaults. Full-text searchable fields include additional properties: language, stemming (defaults to false; set to true at creation to enable), lowercase, max_term_len. All fields include description (null if not set).
  • read_capacity (object) - Read capacity configuration
    • mode (string) - Read capacity mode (e.g., "Dedicated")
    • dedicated (object) - Dedicated read node configuration
      • node_type (string) - Node type (e.g., "b1")
      • scaling (string) - Scaling mode (e.g., "Manual")
      • manual (object) - Manual scaling configuration
        • shards (integer) - Number of shards
        • replicas (integer) - Number of replicas
    • status (object) - Current status of read capacity provisioning
      • state (string) - Provisioning state (e.g., "Migrating", "Ready")
      • current_shards (integer or null) - Current number of shards
      • current_replicas (integer or null) - Current number of replicas
  • tags (object or null) - Key-value tags, or null if none set
  • deletion_protection (string) - Deletion protection status
Wait for status.ready: true before performing data plane operations.
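Because the index can report status.ready: true before the dedicated read nodes finish provisioning, a polling loop for search readiness should check both conditions. The sketch below assumes you supply a describe callable that returns the describe-index response as a dict; the helper names, timeout, and interval are illustrative choices, not documented values.

```python
import time
from typing import Callable


def is_ready_for_search(description: dict) -> bool:
    """True only when the index and its read capacity both report ready."""
    index_ready = description.get("status", {}).get("ready") is True
    read_status = description.get("read_capacity", {}).get("status") or {}
    return index_ready and read_status.get("state") == "Ready"


def wait_until_ready(
    describe: Callable[[], dict],
    timeout_s: float = 600.0,   # assumed upper bound, tune for your deployment
    interval_s: float = 5.0,    # assumed polling interval
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll describe() until the index is searchable or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        description = describe()
        if is_ready_for_search(description):
            return description
        if time.monotonic() >= deadline:
            raise TimeoutError("index did not become ready for search in time")
        sleep(interval_s)
```

Passing sleep as a parameter keeps the loop easy to exercise in tests without real delays.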
Returns all indexes in the project, including their current status and configuration.
Example request
curl -X GET "https://api.pinecone.io/indexes" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "X-Pinecone-API-Version: 2026-01.alpha"
Example response
Status: 200 OK
{
  "indexes": [
    {
      "id": "e51ea4e1-2dda-4607-94dc-9054b1fa8492",
      "name": "articles",
      "host": "articles-jweaq8m.svc.aped-4627-b74a.pinecone.io",
      "status": {
        "ready": true,
        "state": "Ready"
      },
      "deployment": {
        "deployment_type": "managed",
        "region": "us-east-1",
        "cloud": "aws",
        "environment": "aped-4627-b74a"
      },
      "read_capacity": {
        "mode": "Dedicated",
        "dedicated": {
          "node_type": "b1",
          "scaling": "Manual",
          "manual": {
            "shards": 1,
            "replicas": 1
          }
        },
        "status": {
          "state": "Ready",
          "current_shards": 1,
          "current_replicas": 1
        }
      },
      "schema": {
        "version": "v1",
        "fields": {
          "published": {
            "type": "boolean",
            "description": null,
            "filterable": true
          },
          "content": {
            "type": "string",
            "description": null,
            "full_text_searchable": true,
            "language": "en",
            "stemming": false,
            "lowercase": true,
            "max_term_len": 40
          },
          "year": {
            "type": "float",
            "description": null,
            "filterable": true
          },
          "rating": {
            "type": "float",
            "description": null,
            "filterable": true
          },
          "category": {
            "type": "string",
            "description": null,
            "filterable": true
          }
        }
      },
      "tags": null,
      "deletion_protection": "disabled"
    },
    // More indexes...
  ]
}
Returns an array of index objects, each with the same structure as the create index response (see above).
Returns detailed information about a specific index, including its schema, status, and host URL.
Example request
curl -X GET "https://api.pinecone.io/indexes/articles" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "X-Pinecone-API-Version: 2026-01.alpha"
Path parameters:
  • index_name (string, required) - Name of the index
Example response
Status: 200 OK
Returns the same structure as the create index response.
Updates index configuration. Currently, only deletion_protection can be updated.
Example request
curl -X PATCH "https://api.pinecone.io/indexes/articles" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "Content-Type: application/json" \
  -H "X-Pinecone-API-Version: 2026-01.alpha" \
  -d '{
    "deletion_protection": "enabled"
  }'
Path parameters:
  • index_name (string, required) - Name of the index
Body parameters:
  • deletion_protection (string, optional) - "enabled" or "disabled"
Example response
Status: 200 OK
Returns the updated index configuration (same structure as create index response).
Permanently deletes an index and all its data. This action cannot be undone. If deletion_protection is enabled, you must first disable it using the update endpoint.
Example request
curl -X DELETE "https://api.pinecone.io/indexes/articles" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "X-Pinecone-API-Version: 2026-01.alpha"
Path parameters:
  • index_name (string, required) - Name of the index
Example response
Status: 202 Accepted (empty body)

Data plane operations

Data plane operations include a namespace in the URL path. Namespaces partition documents within an index: they’re auto-created on first upsert and completely isolated from each other. Use "__default__" if you don’t need partitioning.
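The URL shape and required headers for data plane requests can be assembled with two small helpers. This is a sketch only: the function names are hypothetical, and the host value must come from your own index description.

```python
API_VERSION = "2026-01.alpha"


def documents_url(host: str, operation: str, namespace: str = "__default__") -> str:
    """Build the data plane URL for a document operation in a namespace."""
    if operation not in {"upsert", "search"}:
        raise ValueError(f"unsupported operation: {operation!r}")
    return f"https://{host}/namespaces/{namespace}/documents/{operation}"


def request_headers(api_key: str) -> dict:
    """Headers that full-text search requests with a JSON body need."""
    return {
        "Api-Key": api_key,
        "Content-Type": "application/json",
        "X-Pinecone-API-Version": API_VERSION,
    }
```

For example, documents_url("articles-abc123.svc.us-east-1.pinecone.io", "search") yields the search URL used in the curl examples in this section.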
Inserts or updates documents. If a document with the same _id exists, it is completely replaced. Documents become searchable within approximately one minute.
Example request
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/upsert" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "Content-Type: application/json" \
  -H "X-Pinecone-API-Version: 2026-01.alpha" \
  -d '{
    "documents": [
      {
        "_id": "doc1",
        "content": "Machine learning models are revolutionizing natural language processing",
        "category": "technology",
        "year": 2024
      },
      {
        "_id": "doc2",
        "content": "Vector databases enable fast similarity search across embeddings",
        "category": "technology",
        "year": 2023
      },
      {
        "_id": "doc3",
        "content": "Quantum computers leverage superposition for faster computation",
        "category": "science",
        "year": 2024
      }
    ]
  }'
Path parameters:
  • namespace (string, required) - Namespace name (use "__default__" if not using namespaces)
Body parameters:
  • documents (array, required) - Array of documents to upsert. Each document is an object with:
    • _id (string, required) - Unique document ID. If a document with this _id already exists, it is replaced entirely. If multiple documents in the same batch share an _id, only the last one is stored.
    • Fields matching your schema (the full_text_searchable field must be present)
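Since only the last document with a given _id in a batch is stored, it can help to collapse duplicates on the client so that upserted_count matches what you expect. A minimal sketch, with a hypothetical helper name:

```python
def dedupe_by_id(documents: list[dict]) -> list[dict]:
    """Keep one document per _id, letting later entries overwrite earlier ones."""
    latest: dict[str, dict] = {}
    for doc in documents:
        # Re-assigning an existing key keeps its original position
        # but replaces the stored value with the later document.
        latest[doc["_id"]] = doc
    return list(latest.values())
```

This mirrors the "last one wins" rule described above; it does not change how the service itself handles duplicates.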
Example response
Status: 200 OK
{
  "upserted_count": 3
}
Response fields:
  • upserted_count (integer) - Number of documents upserted

Schema validation

Each item in the documents array is validated against your index schema. If any item fails validation, the entire request fails and nothing is upserted.
| Scenario | Result |
| --- | --- |
| Field value doesn’t match declared type | Error - request fails |
| Field not in schema | Stored and filterable, but not added to the schema |
| Schema field missing from item | OK - fields are optional |
| Text-searchable field is missing | Error - request fails |
| Text contains Unicode or special characters | OK - fully supported |
Example errors:
  • "Document with id 'doc-1': boolean field 'in_stock' must be a boolean"
  • "Each document must have at least one indexable field"
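Because one invalid item fails the whole request, a client-side pre-check can catch problems before a batch is sent. The sketch below approximates the rules in this section; the function name and messages are hypothetical, and it assumes integers are acceptable for float fields, as the upsert example (year: 2024) suggests.

```python
def _type_ok(declared: str, value) -> bool:
    """Check a value against a declared schema type."""
    if declared == "string":
        return isinstance(value, str)
    if declared == "boolean":
        return isinstance(value, bool)
    if declared == "float":
        # bool is a subclass of int in Python, so exclude it explicitly.
        return isinstance(value, (int, float)) and not isinstance(value, bool)
    return False


def validate_documents(documents: list[dict], schema_fields: dict) -> list[str]:
    """Return validation problems for a batch (empty list if none were found)."""
    errors = []
    searchable = [n for n, s in schema_fields.items() if s.get("full_text_searchable")]
    for doc in documents:
        doc_id = doc.get("_id")
        if not isinstance(doc_id, str):
            errors.append("document is missing a string _id")
            continue
        for name in searchable:
            if name not in doc:
                errors.append(f"{doc_id}: text-searchable field {name!r} is missing")
        for name, value in doc.items():
            spec = schema_fields.get(name)
            if name == "_id" or spec is None:
                continue  # fields outside the schema are stored as-is
            if not _type_ok(spec["type"], value):
                errors.append(f"{doc_id}: field {name!r} must be of type {spec['type']}")
    return errors
```

An empty result does not guarantee that the service accepts the batch; it only rules out the failure cases listed in the table above.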
Searches documents using simple phrase matching (type: "text") or Lucene query syntax (type: "query_string"). Optionally filter by field values before scoring. Results are ranked by BM25 relevance score.
Example request
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "Content-Type: application/json" \
  -H "X-Pinecone-API-Version: 2026-01.alpha" \
  -d '{
    "include_fields": ["content", "category", "year"],
    "score_by": [{
      "type": "text",
      "field": "content",
      "query": "machine learning"
    }],
    "top_k": 10
  }'
Path parameters:
  • namespace (string, required) - Namespace name (use "__default__" if not using namespaces)
Body parameters:
  • include_fields (array, required) - List of field names to return in results
  • score_by (array, required) - Array of scoring methods. Each item must be one of:
    • type: "text" - Simple phrase matching on a specific field. Your query is matched as an exact phrase: all terms must appear adjacent and in the same order. Case-insensitive.
      • field (string, required) - Name of the searchable field.
      • query (string, required) - The phrase to search for. Multiple words are treated as a phrase, not individual terms. Query syntax operators (AND, OR, NOT, etc.) are not supported; they are treated as literal words.
    • type: "query_string" - Lucene query syntax. Supports boolean operators, phrase prefix matching, boosting, and more. Do not specify field; instead, target fields within the query itself.
      • query (string, required) - A Lucene query string in the form <field_name>:(<query clause>) (see query syntax reference).
  • top_k (integer, required) - Number of results to return (1-10000)
  • filter (object, optional) - Filter conditions on filterable fields (applied before search)
Choosing between type: "text" and type: "query_string":
  • Use type: "text" when you know the exact phrase to look for. It’s the simpler option — just provide the phrase and the field name.
  • Use type: "query_string" when you need boolean operators, phrase prefix matching, boosting, or OR logic between terms. See the query syntax reference for the full list of operators.
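The body parameters above can be assembled and sanity-checked with a small builder. This is a sketch with hypothetical helper names; it enforces only the rules stated in this section (top_k range, field required for "text", field not allowed for "query_string").

```python
from typing import Optional


def text_clause(field: str, query: str) -> dict:
    """Simple phrase matching on one searchable field."""
    return {"type": "text", "field": field, "query": query}


def query_string_clause(query: str) -> dict:
    """Lucene query syntax; the target field goes inside the query itself."""
    return {"type": "query_string", "query": query}


def build_search_body(
    score_by: list[dict],
    include_fields: list[str],
    top_k: int = 10,
    filter_: Optional[dict] = None,
) -> dict:
    """Assemble a search request body, rejecting obviously invalid input."""
    if not 1 <= top_k <= 10000:
        raise ValueError("top_k must be between 1 and 10000")
    for clause in score_by:
        if clause["type"] == "text" and "field" not in clause:
            raise ValueError('type "text" requires a field')
        if clause["type"] == "query_string" and "field" in clause:
            raise ValueError('type "query_string" must not specify field')
    body = {"include_fields": list(include_fields), "score_by": list(score_by), "top_k": top_k}
    if filter_ is not None:
        body["filter"] = filter_
    return body
```

For example, build_search_body([text_clause("content", "machine learning")], ["content", "category", "year"]) produces the same JSON structure as the example request.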
Filter operators:
Filters are applied before the search runs. The search only considers documents that match the filter.
| Operator | Example | Description |
| --- | --- | --- |
| $eq | {"category": {"$eq": "tech"}} | Equals |
| $ne | {"category": {"$ne": "tech"}} | Not equals |
| $gt | {"year": {"$gt": 2023}} | Greater than |
| $gte | {"year": {"$gte": 2023}} | Greater than or equal |
| $lt | {"year": {"$lt": 2025}} | Less than |
| $lte | {"year": {"$lte": 2025}} | Less than or equal |
| $in | {"category": {"$in": ["a", "b"]}} | In list |
| $nin | {"category": {"$nin": ["a", "b"]}} | Not in list |
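To make the operator semantics concrete, here is a small local evaluator that applies a filter object to a single document. It is an illustration only: it assumes conditions on several fields are combined with AND, and it treats a document that lacks the filtered field as a non-match, which the source does not specify.

```python
import operator

# Map each filter operator to a Python comparison.
_OPS = {
    "$eq": operator.eq,
    "$ne": operator.ne,
    "$gt": operator.gt,
    "$gte": operator.ge,
    "$lt": operator.lt,
    "$lte": operator.le,
    "$in": lambda value, options: value in options,
    "$nin": lambda value, options: value not in options,
}


def matches_filter(doc: dict, filter_: dict) -> bool:
    """Return True if the document satisfies every condition in the filter."""
    for field, conditions in filter_.items():
        if field not in doc:
            return False  # assumption: missing fields never match
        for op, operand in conditions.items():
            if not _OPS[op](doc[field], operand):
                return False
    return True
```

With doc = {"category": "technology", "year": 2024}, the filter {"category": {"$eq": "technology"}, "year": {"$gte": 2024}} matches, while {"year": {"$lt": 2024}} does not.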
More examples:
Simple phrase matching with filter:
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "Content-Type: application/json" \
  -H "X-Pinecone-API-Version: 2026-01.alpha" \
  -d '{
    "include_fields": ["content", "category", "year"],
    "filter": {
      "category": { "$eq": "technology" },
      "year": { "$gte": 2024 }
    },
    "score_by": [{
      "type": "text",
      "field": "content",
      "query": "machine learning"
    }],
    "top_k": 10
  }'
This matches documents containing the exact phrase “machine learning” (adjacent, in order) within the content field, filtered to technology articles from 2024 onward.
Boolean query with query_string:
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "Content-Type: application/json" \
  -H "X-Pinecone-API-Version: 2026-01.alpha" \
  -d '{
    "include_fields": ["content", "category"],
    "score_by": [{
      "type": "query_string",
      "query": "content:(machine AND learning)"
    }],
    "top_k": 10
  }'
This matches documents where the content field contains both “machine” and “learning” (in any order, not necessarily adjacent).
Boosting with query_string:
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "Content-Type: application/json" \
  -H "X-Pinecone-API-Version: 2026-01.alpha" \
  -d '{
    "include_fields": ["content", "category"],
    "score_by": [{
      "type": "query_string",
      "query": "content:(\"natural language processing\"^2 machine learning)"
    }],
    "top_k": 10
  }'
This boosts the phrase “natural language processing” by 2x and also matches “machine” or “learning” (with default OR). Note that operators like ^, AND, and OR only work with type: "query_string".
OR query (default behavior) with query_string:
curl -X POST "https://articles-abc123.svc.us-east-1.pinecone.io/namespaces/__default__/documents/search" \
  -H "Api-Key: {{YOUR_API_KEY}}" \
  -H "Content-Type: application/json" \
  -H "X-Pinecone-API-Version: 2026-01.alpha" \
  -d '{
    "include_fields": ["content", "category"],
    "score_by": [{
      "type": "query_string",
      "query": "content:(quick brown fox)"
    }],
    "top_k": 10
  }'
With type: "query_string", multiple terms default to OR: this matches documents containing “quick”, “brown”, or “fox” (or any combination). Documents matching more terms rank higher.
Example response
Status: 200 OK
{
  "matches": [
    {
      "id": "doc1",
      "score": 0.8234,
      "content": "Machine learning models are revolutionizing natural language processing",
      "category": "technology",
      "year": 2024
    }
  ],
  "namespace": "__default__",
  "usage": { "read_units": 1 }
}
Response fields:
  • matches (array) - Array of matching documents
    • id (string) - Document ID
    • score (float) - BM25 relevance score (higher is better)
    • {field} (any) - Requested fields from the document
  • namespace (string) - Namespace searched
  • usage (object) - Usage information
    • read_units (integer) - Read units consumed
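To give a feel for how a BM25 score behaves, the sketch below scores a few in-memory documents against an OR-style query. The source states only that results are ranked by BM25; the parameters k1 = 1.2 and b = 0.75, the IDF variant, and the whitespace-and-punctuation tokenizer are common textbook choices assumed here, not confirmed Pinecone settings, so the numbers will not match real scores.

```python
import math
import re
from collections import Counter


def tokenize(text: str) -> list[str]:
    """Lowercase and split on anything that is not a word character."""
    return re.findall(r"\w+", text.lower())


def bm25_scores(query: str, documents: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Score each document in a non-empty collection against the query terms."""
    docs = [tokenize(d) for d in documents]
    n_docs = len(docs)
    avg_len = sum(len(d) for d in docs) / n_docs
    terms = set(tokenize(query))
    # Number of documents that contain each query term.
    doc_freq = {t: sum(1 for d in docs if t in d) for t in terms}
    scores = []
    for tokens in docs:
        counts = Counter(tokens)
        score = 0.0
        for term in terms:
            tf = counts[term]
            if tf == 0:
                continue
            # Rarer terms get a larger inverse document frequency.
            idf = math.log(1 + (n_docs - doc_freq[term] + 0.5) / (doc_freq[term] + 0.5))
            # Term frequency saturates and is normalized by document length.
            length_norm = tf + k1 * (1 - b + b * len(tokens) / avg_len)
            score += idf * tf * (k1 + 1) / length_norm
        scores.append(score)
    return scores
```

Scoring the query "quick brown fox" against "quick brown fox", "quick red car", and "slow green turtle" ranks the documents in that order, with the last one scoring zero, which is consistent with documents matching more terms ranking higher.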

Python SDK

SDK support for full-text search is currently a work in progress. The API and SDK interfaces may change before general availability. The examples below show current SDK usage for the operations described in the API section. For requirements and limitations, see Early access.
For a runnable end-to-end example, see this Google Colab notebook, which demonstrates upserting and searching a sample Wikipedia dataset.

Installation

To use full-text search with Pinecone’s Python SDK, you need to install the oakcylinder package — an early access version of the SDK that is in active development.
  • The SDK’s API may change before these features are merged into the main pinecone package.
  • Do not install oakcylinder alongside pinecone in the same Python environment, as namespace conflicts will cause unpredictable behavior.
pip install oakcylinder

Control plane

import os
from pinecone import Pinecone

pc = Pinecone(
  api_key=os.environ.get('PINECONE_API_KEY')
)
Create an index with a document schema and read capacity (required for early access to full-text search).
from pinecone import SchemaBuilder

# The schema builder is an optional util to help with constructing 
# your schema dict in the correct shape
schema = (
    SchemaBuilder()
      .add_string_field(
        name="content", 
        full_text_searchable=True, 
        language="en"
      )
      .add_string_field(name="category", filterable=True)
      .add_integer_field(name="year", filterable=True)
      .build()
)

index_model = pc.indexes.create(
    name="articles",
    schema=schema,
    read_capacity={
        "mode": "Dedicated",
        "dedicated": {
            "node_type": "b1",
            "scaling": "Manual",
            "manual": {
              "shards": 1, 
              "replicas": 1
            },
        },
    },
)

# Use the host from the response for data plane operations
host = index_model.host

# Describe the index to check its status and schema
index_model = pc.indexes.describe(name="articles")
print(index_model.status, index_model.schema)

# Delete the index (only when you no longer need it)
pc.indexes.delete(name="articles")

Data plane

Build a data plane client from the index host (or by name):
index = pc.index(host=index_model.host)
NAMESPACE = 'example-namespace'

docs = [
    {"_id": "doc1", "content": "Machine learning models are revolutionizing natural language processing", "category": "technology", "year": 2024},
    {"_id": "doc2", "content": "Vector databases enable fast similarity search across embeddings", "category": "technology", "year": 2023},
    {"_id": "doc3", "content": "Quantum computers leverage superposition for faster computation", "category": "science", "year": 2024},
    # ... more documents
]

index.documents.batch_upsert(
    namespace=NAMESPACE,
    documents=docs,
    batch_size=50,
    max_workers=4,
    show_progress=True,
)

# Search with simple phrase matching (type: "text")
NAMESPACE = 'example-namespace'

response = index.documents.search(
    namespace=NAMESPACE,
    top_k=10,
    score_by=[{"type": "text", "field": "content", "query": "machine learning"}],
    include_fields=["content", "category", "year"],
)
for match in response.matches:
    print(match.id, match.score, getattr(match, "content", ""))

# Search with Lucene query syntax (type: "query_string")
NAMESPACE = 'example-namespace'

response = index.documents.search(
    namespace=NAMESPACE,
    top_k=10,
    score_by=[{"type": "query_string", "query": "content:(machine AND learning)"}],
    include_fields=["content", "category", "year"],
)

Query syntax reference

Full-text search supports two query types with different capabilities:
| Feature | type: "text" | type: "query_string" |
| --- | --- | --- |
| Purpose | Simple phrase matching | Lucene query syntax |
| field parameter | Required | Not allowed (field names go in query) |
| Multi-word behavior | Phrase (adjacent, in order) | OR by default |
| Boolean operators | Not supported (treated as words) | AND, OR, NOT, +, - |
| Phrase prefix | Not supported | "phrase pre"* (last term as prefix) |
| Phrase matching | Automatic (entire query is a phrase) | Wrap in quotes: "exact phrase" |
| Phrase slop | Not supported | "phrase"~N |
| Boosting | Not supported | term^N |
| Stemming | Supported (when enabled) | Supported (when enabled) |
| Case sensitivity | Case-insensitive | Case-insensitive |

Simple phrase matching (type: "text")

With type: "text", your entire query is treated as a phrase. All terms must appear adjacent and in the same order in the document. Matching is case-insensitive.
| Query | Matches | Does not match |
| --- | --- | --- |
| machine learning | “Machine learning is great” | “Machine and learning separately” (words not adjacent) |
| machine learning | “We use machine learning daily” | “Learning machine” (wrong order) |
| machine | “Machine learning is great” | “Vector databases only” |
Key behaviors:
  • Single term (machine): Matches any document containing the term. Case-insensitive.
  • Multiple terms (machine learning): Matched as a phrase — all terms must appear adjacent and in order. This is not an OR query.
  • Tokenization: Text is split into tokens on whitespace and punctuation. This means punctuation between words does not prevent a phrase match: a document containing “state-of-the-art” is tokenized as ["state", "of", "the", "art"], and a phrase query for state of the art matches it because the tokens are adjacent and in the correct order.
  • No stop words: Common words like “the”, “a”, “of”, and “is” are not removed during indexing or search. All tokens are indexed and searchable. This means phrase queries are position-sensitive: "state art" does not match “state-of-the-art” because “of” and “the” sit between “state” and “art”. To exclude specific words or require non-adjacent terms, use type: "query_string" with operators like NOT, -, or AND (e.g., content:(state AND art)).
  • No operator support: Characters like AND, OR, NOT, *, ~, ^, +, -, and quotes are treated as literal text, not operators. For example, machine AND learning searches for the three-word phrase “machine and learning”.
If you need boolean logic, phrase prefix matching, boosting, or any other query operators, use type: "query_string" instead.
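The tokenization and phrase-matching behavior described above can be approximated locally. The sketch below is a simplified model, not Pinecone's analyzer: it lowercases text, splits on any non-word character, and checks whether the query tokens appear as a contiguous run in the document tokens.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace and punctuation; no stop words are removed."""
    return re.findall(r"\w+", text.lower())


def phrase_match(query: str, document: str) -> bool:
    """True if the query tokens appear adjacent and in order in the document."""
    q, d = tokenize(query), tokenize(document)
    if not q:
        return False
    # Slide a window of len(q) tokens across the document.
    return any(d[i:i + len(q)] == q for i in range(len(d) - len(q) + 1))
```

Under this model, tokenize("state-of-the-art") gives ["state", "of", "the", "art"], so phrase_match("state of the art", "A state-of-the-art model") is True while phrase_match("state art", "A state-of-the-art model") is False. Operators are just words here: phrase_match("machine AND learning", ...) looks for the three-token phrase "machine and learning".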

Lucene query syntax (type: "query_string")

With type: "query_string", you write Lucene query syntax, with operator support. Field names are embedded in the query itself (e.g., content:(term)).
| Operator | Syntax | Example | Description |
| --- | --- | --- | --- |
| Term | field:(word) | content:(computers) | Match documents containing term |
| Multiple terms | field:(a b) | content:(machine learning) | OR by default: matches either term |
| Phrase | field:("words") | content:("machine learning") | Exact phrase match (adjacent, in order) |
| AND | AND | content:(a AND b) | Both terms required |
| OR | OR | content:(a OR b) | Either term matches (same as default) |
| NOT | NOT | content:(a NOT b) | Exclude second term |
| Required | +term | content:(+database search) | Term must be present |
| Excluded | -term | content:(database -deprecated) | Term must not be present |
| Grouping | (expr) | content:((a OR b) AND c) | Control precedence |
| Phrase slop | "phrase"~N | content:("fast search"~2) | Allow up to N words between phrase terms |
| Boost | term^N | content:(machine^3 learning) | Multiply term’s relevance score by N |
| Phrase prefix | "phrase pre"* | content:("james w"*) | Last term in phrase matched as prefix |
A term is a single word. Multiple space-separated terms use OR logic by default.
content:(machine learning)
Matches documents containing “machine” OR “learning” (or both). Documents with both terms rank higher.
Wrap multiple words in quotes to match them as an exact sequence.
content:("machine learning")
Matches only documents containing the exact phrase “machine learning” with the words adjacent. This is equivalent to what type: "text" does with query: "machine learning".
Use AND, OR, and NOT for explicit boolean logic.
content:(machine AND learning)        # Both terms required (any order)
content:(machine OR learning)         # Either term (same as default)
content:(machine NOT learning)        # "machine" but not "learning"
Precedence: AND binds tighter than OR. Use parentheses to control order:
content:((database OR storage) AND distributed)
Use + to require a term and - to exclude a term.
content:(+database distributed)       # MUST contain "database", "distributed" optional
content:(database -deprecated)        # Contains "database", must NOT contain "deprecated"
content:(+vector +search -legacy)     # MUST have "vector" AND "search", must NOT have "legacy"
This is useful when you want some terms to be mandatory filters while others boost relevance.
Allow words in a phrase to appear within N positions of each other.
content:("machine learning"~3)
Matches “machine learning”, “machine deep learning”, or “machine-assisted learning” (words within 3 positions).
Increase the importance of specific terms in ranking using ^N.
content:(machine^3 learning)          # "machine" weighted 3x more than "learning"
content:("neural network"^2 deep)     # Phrase boosted 2x
Documents with boosted terms rank higher when those terms appear.
Append * to a quoted phrase to treat the last term as a prefix. The phrase must contain at least two terms.
content:("james w"*)                  # Matches "james webb", "james watson", "james wilde"
content:("machine lea"*)              # Matches "machine learning", "machine learns"
All terms before the last must match exactly and adjacently; only the final term is treated as a prefix. This is useful for autocomplete or search-as-you-type scenarios.
Single-term prefix wildcards like auto* are not supported. The phrase must contain at least two terms.
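Phrase prefix matching can be modeled as an exact, adjacent match on every term except the last, which only needs to be a prefix of the corresponding document token. The sketch below is a local approximation with a simplified tokenizer, not the service's implementation; the function name is hypothetical.

```python
import re


def tokenize(text: str) -> list[str]:
    """Lowercase and split on whitespace and punctuation."""
    return re.findall(r"\w+", text.lower())


def phrase_prefix_match(phrase: str, document: str) -> bool:
    """Model of "phrase pre"*: exact leading terms, prefix match on the last term."""
    terms = tokenize(phrase)
    if len(terms) < 2:
        raise ValueError("the phrase must contain at least two terms")
    *exact, prefix = terms
    tokens = tokenize(document)
    n = len(terms)
    for i in range(len(tokens) - n + 1):
        window = tokens[i:i + n]
        if window[:-1] == exact and window[-1].startswith(prefix):
            return True
    return False
```

For instance, phrase_prefix_match("james w", "The James Webb telescope") is True, while a single-term input such as "auto" raises ValueError, mirroring the two-term requirement.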
Operators can be combined for complex queries:
content:(+database (distributed OR replicated) -deprecated)
Requires “database”, boosts results containing “distributed” or “replicated”, excludes “deprecated”.
content:("machine learning"^2 AND (tensorflow OR pytorch) -keras)
Boost exact phrase “machine learning”, require a framework, exclude keras.

Stemming

Stemming reduces words to their root form so that morphological variants match each other. For example, with stemming enabled, a query for “run” also matches documents containing “running” or “runs”. Stemming is opt-in and disabled by default. To enable it, set stemming: true on a text search field when creating the index. The stemming algorithm is determined by the field’s language setting.
Example: enabling stemming with French
{
  "schema": {
    "fields": {
      "content": {
        "type": "string",
        "full_text_searchable": true,
        "stemming": true,
        "language": "french"
      }
    }
  }
}
With this configuration, French stemming rules are applied during both indexing and search. A query for “manger” would match documents containing “mangeons”, “mangé”, or “mangeait”. Stemming applies to both type: "text" and type: "query_string" queries on the field.
With stemming disabled (default):
Query        Matches                     Does not match
“run”        “run”                       “running”, “runs”, “ran”
“machines”   “machines”                  “machine”
With stemming enabled:
Query        Matches                     Does not match
“run”        “run”, “running”, “runs”    “ran” (irregular form)
“machines”   “machines”, “machine”       “database”
Stemming uses algorithmic suffix analysis, so irregular forms (e.g., “ran” for “run”) may not match. Only regular morphological variants (e.g., “running”, “runs”) are reliably stemmed.
Stemming is set at index creation and cannot be changed afterward. If you need to enable or disable stemming, you must create a new index.

Language

The language parameter controls tokenization and stemming behavior for a text search field. It determines how text is analyzed during indexing and search: how words are split into tokens and, when stemming is enabled, which language-specific rules are used to reduce words to their root forms.
The default language is "en" (English). You can specify a language using either its short code or full name (e.g., "fr" or "french"). See the Stemming section for an example using "language": "french".
Supported languages:
Code   Full name
ar     arabic
da     danish
de     german
el     greek
en     english
es     spanish
fi     finnish
fr     french
hu     hungarian
it     italian
nl     dutch
no     norwegian
pt     portuguese
ro     romanian
ru     russian
sv     swedish
ta     tamil
tr     turkish
Language is set at index creation and cannot be changed afterward.
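Because language and stemming cannot be changed after index creation, it may help to validate them before sending the create request. The sketch below builds a field definition in the shape used by the schema examples in this guide. The helper name text_field_definition is hypothetical, and normalizing full names to short codes is a design choice made here, not a requirement of the API.

```python
# Codes and names copied from the supported languages table above.
SUPPORTED_LANGUAGES = {
    "ar": "arabic", "da": "danish", "de": "german", "el": "greek",
    "en": "english", "es": "spanish", "fi": "finnish", "fr": "french",
    "hu": "hungarian", "it": "italian", "nl": "dutch", "no": "norwegian",
    "pt": "portuguese", "ro": "romanian", "ru": "russian", "sv": "swedish",
    "ta": "tamil", "tr": "turkish",
}


def text_field_definition(language: str = "en", stemming: bool = False) -> dict:
    """Return a schema entry for a searchable field (hypothetical helper)."""
    key = language.strip().lower()
    codes_by_name = {name: code for code, name in SUPPORTED_LANGUAGES.items()}
    if key in SUPPORTED_LANGUAGES:
        code = key
    elif key in codes_by_name:
        code = codes_by_name[key]
    else:
        raise ValueError(f"unsupported language: {language!r}")
    return {
        "type": "string",
        "full_text_searchable": True,
        "stemming": stemming,
        "language": code,
    }
```

For example, text_field_definition("french", stemming=True) yields the same settings as the French stemming example, with the language written as "fr".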

Troubleshooting

  • Check indexing latency: New documents may take up to 1 minute to become searchable.
  • Verify the upsert response shows the expected upserted_count.
  • Confirm you’re searching the same namespace where you upserted.
  • With type: "text", multi-word queries are phrase searches (adjacent, in order). Try a single-term query first to confirm the document is searchable. Terms are case-insensitive and must match exactly after tokenization unless stemming is enabled.
  • If using filters, ensure the document’s field values match your filter conditions.
  • type: "text" treats multi-word queries as phrases: the query machine learning matches only documents with those words adjacent and in order. To match documents containing “machine” OR “learning”, use type: "query_string" with content:(machine learning).
  • type: "query_string" defaults to OR. content:(machine learning) matches documents containing either term. Use AND or + for required terms.
  • Operators like AND, OR, NOT, *, ~, and ^ only work with type: "query_string". With type: "text", they are treated as literal words.
  • Searches match exact terms (after tokenization and lowercasing) unless stemming is enabled on the field. With stemming, morphological variants of a word will match.
  • If using ^N boosting (in query_string mode), higher-boosted terms significantly affect ranking.
Query syntax errors only apply to type: "query_string". With type: "text", any input is valid since it’s treated as a literal phrase.
  • Unmatched quotes ("machine learning): Close all quotes.
  • Empty query: Provide at least one search term.
  • Invalid boolean syntax (AND machine): Operators need terms on both sides.
  • Unbalanced parentheses: Match all opening and closing parens.
  • Unknown field name: Field names in the query must match full_text_searchable fields in the schema.
  • 401 Unauthorized: Check the Api-Key header.
  • 400 Bad Request: Check JSON syntax and required fields.
  • 404 Not Found: Verify the index name and host URL.
  • Missing API version: Add X-Pinecone-API-Version: 2026-01.alpha.
  • Type mismatch: Ensure values match declared schema types.
  • Missing text content: The text-searchable field must be present in the document.
  • Invalid _id: Every document must have a non-empty _id string.
  • Reduce query complexity: Boolean operators are more expensive than simple term queries.
  • Simplify filters: Filters are applied before search, so broad filters increase the search space.
  • Check document count and size: Larger datasets may have higher latency.
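Several of the query syntax errors listed above (empty query, unmatched quotes, unbalanced parentheses, a boolean operator missing an operand) can be caught client-side before a request is sent. The following is a rough sketch under simplifying assumptions: check_query_string is a hypothetical helper, it does not handle escaped characters, and it does not reproduce the server's actual validation.

```python
import re
from typing import List


def check_query_string(query: str) -> List[str]:
    """Return a list of likely syntax problems (hypothetical helper)."""
    if not query.strip():
        return ["empty query"]
    if query.count('"') % 2 != 0:
        # Later checks are unreliable once a quote is left open.
        return ["unmatched quote"]

    problems = []
    # Remove phrase contents so text inside quotes is not inspected.
    stripped = re.sub(r'"[^"]*"', '""', query)

    depth = 0
    for ch in stripped:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                break
    if depth != 0:
        problems.append("unbalanced parentheses")

    # AND/OR at the start or end of the query or of a group lacks an operand.
    dangling = r"(?:^|\()\s*(?:AND|OR)\b|\b(?:AND|OR)\s*(?:\)|$)"
    if re.search(dangling, stripped):
        problems.append("boolean operator missing an operand")
    return problems
```

An empty list means none of these checks fired; the server may still reject the query for other reasons, such as an unknown field name.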

Early access

Full-text search is in early access under API version 2026-01.alpha. The feature is functional and ready for evaluation, but APIs may evolve based on feedback before general availability.
Requirements & limitations
  • All requests require X-Pinecone-API-Version: 2026-01.alpha
  • REST API & Python SDK only
  • FTS requires a dedicated index created with the 2026-01.alpha API
  • During early access, full-text search indexes must be created using dedicated read nodes (read_capacity.mode: "Dedicated"), using a single b1 node
  • Max document size: ~500 KB
  • Insert-to-searchable latency: < 1 minute
  • One text-searchable field per index
  • No document fetch or delete endpoints yet
  • No partial updates (upsert replaces the entire document)
  • Text search only in this API version (2026-01.alpha)
  • Hybrid search and text pre-filtering not yet available
  • Indexes cannot be created in CMEK-enabled projects
  • Backup and restore not supported
  • Fuzzy matching and regex search not yet supported
  • Single-term prefix wildcards (auto*) not supported; use phrase prefix ("word auto"*) instead
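Some of these limits, together with the upsert errors listed under Troubleshooting, can be checked locally before a document is sent. The sketch below is an assumption-heavy illustration: check_document is a hypothetical helper, "~500 KB" is read as 500,000 bytes of UTF-8 encoded JSON, and how the service actually measures document size is not stated in this guide.

```python
import json
from typing import List

# Assumption: "~500 KB" interpreted as 500,000 bytes of serialized JSON.
MAX_DOCUMENT_BYTES = 500_000


def check_document(doc: dict, text_field: str) -> List[str]:
    """Return problems likely to cause an upsert error (hypothetical helper)."""
    problems = []

    doc_id = doc.get("_id")
    if not isinstance(doc_id, str) or not doc_id:
        problems.append("_id must be a non-empty string")

    text = doc.get(text_field)
    if not isinstance(text, str) or not text.strip():
        problems.append(f"text field {text_field!r} is missing or empty")

    size = len(json.dumps(doc, ensure_ascii=False).encode("utf-8"))
    if size > MAX_DOCUMENT_BYTES:
        problems.append(f"document is {size} bytes, over the assumed limit")
    return problems
```

Because upserts replace the entire document, running a check like this on the full record before each upsert may help avoid accidentally dropping the searchable field.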
Using text and vector search together
Until hybrid search is available, you can use both by maintaining separate indexes: create an FTS index for keyword search, keep your vector index for semantic search, and merge results in your application.
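How to merge the two result lists is left to the application. One common option is reciprocal rank fusion (RRF), which combines rankings without needing the BM25 and vector scores to be on the same scale. The sketch below is a generic RRF implementation, not a Pinecone feature: the function name, the k=60 constant, and the input format (ordered lists of document IDs) are assumptions.

```python
from typing import Dict, List, Sequence


def reciprocal_rank_fusion(rankings: Sequence[Sequence[str]], k: int = 60) -> List[str]:
    """Merge ranked ID lists with reciprocal rank fusion (generic sketch)."""
    scores: Dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            # Each list contributes 1 / (k + rank) for the documents it contains.
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))


keyword_hits = ["a", "b", "c"]  # IDs from the FTS index, best match first
vector_hits = ["b", "d", "a"]   # IDs from the vector index, best match first
merged = reciprocal_rank_fusion([keyword_hits, vector_hits])
```

Documents that appear in both lists accumulate score from each, so they tend to rise above documents found by only one index.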

Pricing

Pricing will be announced before general availability.