Pinecone quick reference for agents

Official docs: https://docs.pinecone.io/ - complete API reference, advanced features, and detailed guides.
This guide covers critical gotchas, best practices, and common patterns specific to this project. For anything not covered here, consult the official Pinecone documentation.

⚠️ Critical: Installation & SDK

ALWAYS use the current SDK:
pip install pinecone          # ✅ Correct (current SDK)
pip install pinecone-client   # ❌ WRONG (deprecated, old API)
Current API (2025):
from pinecone import Pinecone  # ✅ Correct import

🚫 CRITICAL: CLI for Admin, SDK for Data

ALWAYS use CLI for administrative tasks:
  • ❌ NEVER call pc.create_index(), pc.delete_index(), pc.configure_index() in code
  • ✅ ALWAYS use pc index create, pc index delete, pc index configure commands
  • Reason: Admin operations are one-time setup tasks, not application logic
ONLY use SDK in your application code for:
  • Data operations: upsert, query, search, fetch, delete records
  • Runtime checks: pc.has_index(), index.describe_index_stats()

🔧 CLI vs SDK: When to Use Which

Use the Pinecone CLI for:
  • Creating indexes - pc index create
  • Deleting indexes - pc index delete
  • Configuring indexes - pc index configure (replicas, deletion protection)
  • Listing indexes - pc index list
  • Describing indexes - pc index describe
  • Creating API keys - pc api-key create
  • One-off inspection - Checking stats, configuration
  • Development setup - All initial infrastructure setup
Use the Python SDK for:
  • Data operations in application code - upsert, query, search, delete RECORDS
  • Runtime checks - pc.has_index(), index.describe_index_stats()
  • Automated workflows - Any data operations that run repeatedly
  • Production data access - Reading and writing vectors/records
❌ NEVER use SDK for:
  • Creating, deleting, or configuring indexes in application code
  • One-time administrative tasks

Installing the Pinecone CLI

macOS (Homebrew):
brew tap pinecone-io/tap
brew install pinecone-io/tap/pinecone

# Upgrade later
brew update && brew upgrade pinecone
Other platforms: Download from GitHub Releases (Linux, Windows, macOS)

CLI Authentication

Choose one method.

Option 1: User login (recommended for development)
pc login
pc target -o "my-org" -p "my-project"
Option 2: API key
export PINECONE_API_KEY="your-api-key"
# Or: pc auth configure --global-api-key <api-key>
Option 3: Service account
export PINECONE_CLIENT_ID="your-client-id"
export PINECONE_CLIENT_SECRET="your-client-secret"

Common CLI Commands

# Create an index with integrated embeddings (recommended, do this once, not in application code)
pc index create --name my-index --dimension 1536 --metric cosine \
  --cloud aws --region us-east-1 \
  --model llama-text-embed-v2 \
  --field_map text=content

# Create a serverless index without integrated embeddings (if you need custom embeddings)
pc index create-serverless --name my-index --dimension 1536 --metric cosine \
  --cloud aws --region us-east-1

# Create an API key
pc api-key create --name agentic-quickstart

# List all indexes
pc index list

# Describe an index
pc index describe --name my-index

# Configure an index (adjust replicas, deletion protection)
pc index configure --name my-index --replicas 3
pc index configure --name my-index --deletion_protection enabled

# Delete an index
pc index delete --name my-index

# Check authentication status
pc auth status

# Get help
pc --help
pc index --help
Full CLI reference: https://docs.pinecone.io/reference/cli/command-reference

Quick Start Pattern

⚠️ CREATING INDEXES: Use the CLI (pc index create), NOT the SDK in application code. See CLI vs SDK section.
import os
from pinecone import Pinecone

# Initialize client (assumes index already created via CLI)
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# ⚠️ NEVER create indexes in application code - use CLI instead!
# Run this in terminal BEFORE running your application:
#   pc index create --name my-index --dimension 1536 --metric cosine \
#     --cloud aws --region us-east-1 \
#     --model llama-text-embed-v2 \
#     --field_map text=content

# If you don't have CLI access, you can use SDK (but CLI is strongly preferred):
# if not pc.has_index("my-index"):
#     pc.create_index_for_model(
#         name="my-index",
#         cloud="aws",
#         region="us-east-1",
#         embed={
#             "model": "llama-text-embed-v2",  # Recommended
#             "field_map": {"text": "content"}
#         }
#     )

# Get reference to existing index
index = pc.Index("my-index")

# Upsert with namespace (always use namespaces!)
records = [
    {
        "_id": "doc1",
        "content": "Your text here",
        "metadata_field": "value"  # Flat metadata only
    }
]
index.upsert_records("my-namespace", records)

# Search with reranking (best practice)
results = index.search(
    namespace="my-namespace",
    query={
        "top_k": 10,
        "inputs": {"text": "search query"}
    },
    rerank={
        "model": "bge-reranker-v2-m3",
        "top_n": 5,
        "rank_fields": ["content"]
    }
)

# Access search results
# IMPORTANT: With reranking, use dict-style access for hit object
for hit in results.result.hits:
    doc_id = hit["_id"]              # Dict access for id
    score = hit["_score"]            # Dict access for score
    content = hit.fields["content"]  # hit.fields is also a dict
    metadata = hit.fields.get("metadata_field", "")  # Use .get() for optional fields

🚨 Common Mistakes (Must Avoid)

1. Nested Metadata (will cause API errors)

# ❌ WRONG - nested objects not allowed
bad_record = {
    "_id": "doc1",
    "user": {"name": "John", "id": 123},  # Nested object
    "tags": [{"type": "urgent"}]  # Nested in list
}

# ✅ CORRECT - flat structure only
good_record = {
    "_id": "doc1",
    "user_name": "John",
    "user_id": 123,
    "tags": ["urgent", "important"]  # Simple list of strings OK
}
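
When source data arrives nested, a small helper can flatten it before upsert. A minimal sketch (the `flatten_metadata` name and the `"_"` separator are illustrative choices, not part of the SDK):

```python
def flatten_metadata(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts into a single-level dict.

    String lists are kept as-is (Pinecone allows them); lists holding
    anything else are stringified element by element.
    """
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing child keys with the parent key
            flat.update(flatten_metadata(value, new_key, sep))
        elif isinstance(value, list):
            if all(isinstance(v, str) for v in value):
                flat[new_key] = value  # string lists are allowed
            else:
                flat[new_key] = [str(v) for v in value]
        else:
            flat[new_key] = value
    return flat
```

Applied to the bad_record above, this produces the same flat shape as good_record (user_name, user_id as top-level keys).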

2. Batch Size Limits (will cause API errors)

# Text records: MAX 96 per batch, 2MB total
# Vector records: MAX 1000 per batch, 2MB total

# ✅ CORRECT - respect limits
for i in range(0, len(records), 96):
    batch = records[i:i + 96]
    index.upsert_records(namespace, batch)
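
The loop above respects the record-count cap but not the 2MB payload cap. A sketch of a batcher that enforces both, using json.dumps as a rough proxy for serialized request size (`iter_batches` is a hypothetical helper, not an SDK function):

```python
import json

MAX_RECORDS = 96             # text records per upsert_records call
MAX_BYTES = 2 * 1024 * 1024  # 2MB payload cap per batch

def iter_batches(records, max_records=MAX_RECORDS, max_bytes=MAX_BYTES):
    """Yield batches that respect both the count and size limits."""
    batch, batch_bytes = [], 0
    for record in records:
        size = len(json.dumps(record))
        # Flush the current batch if adding this record would break a limit
        if batch and (len(batch) >= max_records or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch
```

Usage: `for batch in iter_batches(records): index.upsert_records(namespace, batch)`.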

3. Missing Namespaces (causes data isolation issues)

# ❌ WRONG - no namespace
index.upsert_records(records)  # Old API pattern

# ✅ CORRECT - always use namespaces
index.upsert_records("user_123", records)
index.search(namespace="user_123", query=params)
index.delete(namespace="user_123", ids=["doc1"])

4. Skipping Reranking (reduces search quality)

# ⚠️ OK but not optimal
results = index.search(namespace="ns", query={"top_k": 5, "inputs": {"text": "query"}})

# ✅ BETTER - always rerank in production
results = index.search(
    namespace="ns",
    query={"top_k": 10, "inputs": {"text": "query"}},
    rerank={"model": "bge-reranker-v2-m3", "top_n": 5, "rank_fields": ["content"]}
)

5. Hardcoded API Keys

# ❌ WRONG
pc = Pinecone(api_key="pc-abc123...")

# ✅ CORRECT
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

6. Using SDK for Administrative Tasks (wrong tool)

# ❌ WRONG - Don't use SDK for admin operations in application code
if not pc.has_index("my-index"):
    pc.create_index_for_model(
        name="my-index",
        cloud="aws",
        region="us-east-1",
        embed={
            "model": "llama-text-embed-v2",
            "field_map": {"text": "content"}
        }
    )  # DON'T DO THIS IN APPLICATION CODE

pc.delete_index("my-index")  # DON'T DO THIS
pc.configure_index("my-index", replicas=3)  # DON'T DO THIS

# ✅ CORRECT - Use CLI in terminal for all admin tasks
# Terminal commands (run these OUTSIDE your application, during setup):
#
#   pc index create --name my-index --dimension 1536 --metric cosine \
#     --cloud aws --region us-east-1 \
#     --model llama-text-embed-v2 \
#     --field_map text=content
#
#   pc index delete --name my-index
#   pc index configure --name my-index --replicas 3

# SDK is ONLY for runtime checks and data operations in application code:
if pc.has_index("my-index"):  # ✅ OK - runtime check
    index = pc.Index("my-index")  # ✅ OK - get reference
    stats = index.describe_index_stats()  # ✅ OK - monitoring

# If you don't have CLI access, SDK is acceptable as fallback (but not ideal):
# if not pc.has_index("my-index"):
#     pc.create_index_for_model(...)  # Acceptable only if CLI unavailable
Why this is critical:
  • Admin operations are one-time setup tasks, not application logic
  • Mixing setup and runtime code makes applications fragile
  • CLI provides better error messages and interactive feedback
  • Prevents accidental index deletion in production code

Key Constraints

| Constraint | Limit | Notes |
|---|---|---|
| Metadata per record | 40KB | Flat JSON only, no nested objects |
| Text batch size | 96 records | Also 2MB total per batch |
| Vector batch size | 1000 records | Also 2MB total per batch |
| Query response size | 4MB | Per query response |
| Metadata types | strings, ints, floats, bools, string lists | No nested structures |
| Consistency | Eventually consistent | Wait ~1-5s after upsert |
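
These metadata rules can be checked client-side before upsert so violations fail fast with a readable message. A minimal sketch (hypothetical helper; the 40KB check uses JSON length as an approximation of the wire size):

```python
import json

ALLOWED_SCALARS = (str, int, float, bool)

def validate_metadata(record):
    """Return a list of constraint violations for one record's metadata."""
    problems = []
    # Everything except the id and the embedded text field counts as metadata
    meta = {k: v for k, v in record.items() if k not in ("_id", "content")}
    for key, value in meta.items():
        if isinstance(value, dict):
            problems.append(f"{key}: nested objects are not allowed")
        elif isinstance(value, list):
            if not all(isinstance(v, str) for v in value):
                problems.append(f"{key}: lists may contain strings only")
        elif not isinstance(value, ALLOWED_SCALARS):
            problems.append(f"{key}: unsupported type {type(value).__name__}")
    if len(json.dumps(meta, default=str)) > 40 * 1024:
        problems.append("metadata exceeds 40KB limit")
    return problems
```

Call this per record before batching; an empty return value means the record passes the checks above.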

Error Handling (Production)

Error Types

  • 4xx (client errors): Fix your request - DON'T retry (except 429)
  • 429 (rate limit): Retry with exponential backoff
  • 5xx (server errors): Retry with exponential backoff

Simple Retry Pattern

import time
from pinecone.exceptions import PineconeException

def exponential_backoff_retry(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except PineconeException as e:
            status_code = getattr(e, 'status', None)

            # Only retry transient errors
            if status_code and (status_code >= 500 or status_code == 429):
                if attempt < max_retries - 1:
                    delay = min(2 ** attempt, 60)  # Exponential backoff, cap at 60s
                    time.sleep(delay)
                else:
                    raise
            else:
                raise  # Don't retry client errors (4xx except 429)

# Usage
exponential_backoff_retry(lambda: index.upsert_records(namespace, records))

Common Operations Cheat Sheet

Index Management

⚠️ Important: For administrative tasks (create, configure, delete indexes), prefer the Pinecone CLI over the SDK; reserve the SDK for programmatic checks (index existence, stats) in application code. Use the CLI for these operations:
# Create index with integrated embeddings (recommended, one-time setup)
pc index create --name my-index --dimension 1536 --metric cosine \
  --cloud aws --region us-east-1 \
  --model llama-text-embed-v2 \
  --field_map text=content

# Create serverless index without integrated embeddings (if you need custom embeddings)
pc index create-serverless --name my-index --dimension 1536 --metric cosine \
  --cloud aws --region us-east-1

# List indexes
pc index list

# Describe index
pc index describe --name my-index

# Configure index
pc index configure --name my-index --replicas 3

# Delete index
pc index delete --name my-index
Use SDK only for programmatic checks in application code:
# Check if index exists (in application startup)
if pc.has_index("my-index"):
    index = pc.Index("my-index")

# Get stats (for monitoring/metrics)
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Namespaces: {list(stats.namespaces.keys())}")
❌ Avoid in application code:
# Don't create indexes in application code - use CLI instead
pc.create_index(...)  # Use: pc index create ...
pc.create_index_for_model(...)  # Use: pc index create ... (with --model flag)

# Don't delete indexes in application code - use CLI instead
pc.delete_index("my-index")  # Use: pc index delete --name my-index

# Don't configure indexes in application code - use CLI instead
pc.configure_index("my-index", replicas=3)  # Use: pc index configure ...

Data Operations

# Fetch records
result = index.fetch(namespace="ns", ids=["doc1", "doc2"])
for record_id, record in result.vectors.items():
    print(f"{record_id}: {record.values}")

# List all IDs (paginated) - list_paginated caps limit at 100 per page
all_ids = []
pagination_token = None
while True:
    result = index.list_paginated(namespace="ns", limit=100, pagination_token=pagination_token)
    all_ids.extend(v.id for v in result.vectors)
    if not result.pagination or not result.pagination.next:
        break
    pagination_token = result.pagination.next

# Simpler alternative: index.list() is a generator that handles pagination
# for id_batch in index.list(namespace="ns"):
#     all_ids.extend(id_batch)

# Delete records
index.delete(namespace="ns", ids=["doc1", "doc2"])

# Delete entire namespace
index.delete(namespace="ns", delete_all=True)
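
Because upserts are eventually consistent (~1-5s), a read issued immediately after a write can miss fresh data. A polling sketch, written against an injected get_count callable so it composes with index.describe_index_stats() or any other source (the names here are illustrative):

```python
import time

def wait_for_count(get_count, expected, timeout=30.0, interval=1.0):
    """Poll get_count() until it reaches `expected` or `timeout` elapses.

    Returns True on success, False on timeout. In real code, pass e.g.
    lambda: index.describe_index_stats().namespaces["ns"].vector_count
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_count() >= expected:
            return True
        time.sleep(interval)
    return get_count() >= expected  # final check at the deadline
```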

Search with Filters

# Metadata filtering - IMPORTANT: Only include "filter" key if you have filters
# Don't set filter to None - omit the key entirely
results = index.search(
    namespace="ns",
    query={
        "top_k": 10,
        "inputs": {"text": "query"},
        "filter": {
            "$and": [
                {"category": {"$in": ["docs", "tutorial"]}},
                {"priority": {"$ne": "low"}},
                {"created_at": {"$gte": "2025-01-01"}}
            ]
        }
    },
    rerank={"model": "bge-reranker-v2-m3", "top_n": 5, "rank_fields": ["content"]}
)

# Search without filters - omit the "filter" key
results = index.search(
    namespace="ns",
    query={
        "top_k": 10,
        "inputs": {"text": "query"}
        # No filter key at all
    },
    rerank={"model": "bge-reranker-v2-m3", "top_n": 5, "rank_fields": ["content"]}
)

# Dynamic filter pattern - conditionally add filter to query dict
query_dict = {
    "top_k": 10,
    "inputs": {"text": "query"}
}
if has_filters:  # Only add filter if it exists
    query_dict["filter"] = {"category": {"$eq": "docs"}}

results = index.search(namespace="ns", query=query_dict, rerank={...})

# Filter operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists, $and, $or
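
Since the filter key must be omitted entirely when there are no conditions, a small builder that assembles the query dict from optional parameters keeps call sites clean (hypothetical helper, not part of the SDK; it emits only $eq clauses and combines multiples with $and):

```python
def build_query(text, top_k=10, **conditions):
    """Build a search query dict, adding `filter` only if conditions exist.

    Each keyword argument becomes an equality clause; None values are
    skipped so optional parameters can be passed through unconditionally.
    """
    query = {"top_k": top_k, "inputs": {"text": text}}
    clauses = [{field: {"$eq": value}}
               for field, value in conditions.items() if value is not None]
    if len(clauses) == 1:
        query["filter"] = clauses[0]
    elif clauses:
        query["filter"] = {"$and": clauses}
    return query
```

Usage: `index.search(namespace="ns", query=build_query("query", category="docs"), rerank={...})`.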

Namespace Strategy

# Multi-user apps
namespace = f"user_{user_id}"

# Session-based
namespace = f"session_{session_id}"

# Content-based
namespace = "knowledge_base"
namespace = "chat_history"
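
Deriving namespaces from raw user input (emails, UUIDs with punctuation) is safer with a sanitizing helper. A sketch; the character whitelist below is a conservative assumption, not a documented Pinecone requirement:

```python
import re

def user_namespace(user_id):
    """Derive a stable per-user namespace from an arbitrary identifier.

    Any character outside [a-zA-Z0-9_-] is replaced with a hyphen.
    """
    safe = re.sub(r"[^a-zA-Z0-9_-]", "-", str(user_id))
    return f"user_{safe}"
```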

Batch Processing

def batch_upsert(index, namespace, records, batch_size=96):
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        exponential_backoff_retry(
            lambda: index.upsert_records(namespace, batch)
        )
        time.sleep(0.1)  # Rate limiting

Environment Config

class PineconeClient:
    def __init__(self):
        self.api_key = os.getenv("PINECONE_API_KEY")
        if not self.api_key:
            raise ValueError("PINECONE_API_KEY required")
        self.pc = Pinecone(api_key=self.api_key)
        self.index_name = os.getenv("PINECONE_INDEX", "default-index")

    def get_index(self):
        return self.pc.Index(self.index_name)

Embedding Models (2025)

Integrated embeddings (recommended - Pinecone handles embedding):
  • llama-text-embed-v2: High-performance, recommended for most cases
  • multilingual-e5-large: Multilingual content (1024 dims)
  • pinecone-sparse-english-v0: Keyword/hybrid search
Use integrated embeddings - don't generate vectors manually unless you have a specific reason.

Official Documentation Resources

For advanced features not covered in this quick reference, consult the official documentation at https://docs.pinecone.io/.

Quick Troubleshooting

| Issue | Solution |
|---|---|
| ModuleNotFoundError: pinecone.grpc | Wrong SDK - reinstall with pip install pinecone |
| Metadata too large error | Check 40KB limit, flatten nested objects |
| Batch too large error | Reduce to 96 records (text) or 1000 (vectors) |
| Search returns no results | Check namespace, wait for indexing (~5s), verify data exists |
| Rate limit (429) errors | Implement exponential backoff, reduce request rate |
| Nested metadata error | Flatten all metadata - no nested objects allowed |

Remember: Always use namespaces, always rerank, always handle errors with retry logic.