Pinecone quick reference for agents

Official docs: https://docs.pinecone.io/ - complete API reference, advanced features, and detailed guides.
This guide covers critical gotchas, best practices, and common patterns specific to this project. For anything not covered here, consult the official Pinecone documentation.

⚠️ Critical: Installation & SDK

ALWAYS use the current SDK:
pip install pinecone          # ✅ Correct (current SDK)
pip install pinecone-client   # ❌ WRONG (deprecated, old API)
Current API (2025):
from pinecone import Pinecone  # ✅ Correct import

🚫 CRITICAL: CLI for Admin, SDK for Data

ALWAYS use CLI for administrative tasks:
  • ❌ NEVER call pc.create_index(), pc.delete_index(), pc.configure_index() in code
  • ✅ ALWAYS use pc index create, pc index delete, pc index configure commands
  • Reason: Admin operations are one-time setup tasks, not application logic
ONLY use SDK in your application code for:
  • Data operations: upsert, query, search, fetch, delete records
  • Runtime checks: pc.has_index(), index.describe_index_stats()

🔧 CLI vs SDK: When to Use Which

Use the Pinecone CLI for:
  • Creating indexes - pc index create
  • Deleting indexes - pc index delete
  • Configuring indexes - pc index configure (replicas, deletion protection)
  • Listing indexes - pc index list
  • Describing indexes - pc index describe
  • Creating API keys - pc api-key create
  • One-off inspection - Checking stats, configuration
  • Development setup - All initial infrastructure setup
Use the Python SDK for:
  • Data operations in application code - upsert, query, search, delete RECORDS
  • Runtime checks - pc.has_index(), index.describe_index_stats()
  • Automated workflows - Any data operations that run repeatedly
  • Production data access - Reading and writing vectors/records
❌ NEVER use SDK for:
  • Creating, deleting, or configuring indexes in application code
  • One-time administrative tasks

Installing the Pinecone CLI

macOS (Homebrew):
brew tap pinecone-io/tap
brew install pinecone-io/tap/pinecone

# Upgrade later
brew update && brew upgrade pinecone
Other platforms: Download from GitHub Releases (Linux, Windows, macOS)

CLI Authentication

Choose one method.

Option 1: User login (recommended for development)
pc login
pc target -o "my-org" -p "my-project"
Option 2: API key
export PINECONE_API_KEY="your-api-key"
# Or: pc auth configure --global-api-key <api-key>
Option 3: Service account
export PINECONE_CLIENT_ID="your-client-id"
export PINECONE_CLIENT_SECRET="your-client-secret"

Common CLI Commands

# Create an index with integrated embeddings (recommended, do this once, not in application code)
pc index create --name my-index --dimension 1536 --metric cosine \
  --cloud aws --region us-east-1 \
  --model llama-text-embed-v2 \
  --field_map text=content

# Create a serverless index without integrated embeddings (if you need custom embeddings)
pc index create-serverless --name my-index --dimension 1536 --metric cosine \
  --cloud aws --region us-east-1

# Create an API key
pc api-key create --name agentic-quickstart

# List all indexes
pc index list

# Describe an index
pc index describe --name my-index

# Configure an index (adjust replicas, deletion protection)
pc index configure --name my-index --replicas 3
pc index configure --name my-index --deletion_protection enabled

# Delete an index
pc index delete --name my-index

# Check authentication status
pc auth status

# Get help
pc --help
pc index --help
Full CLI reference: https://docs.pinecone.io/reference/cli/command-reference

Quick Start Pattern

⚠️ CREATING INDEXES: Use the CLI (pc index create), NOT the SDK in application code. See CLI vs SDK section.
import os
from pinecone import Pinecone

# Initialize client (assumes index already created via CLI)
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

# ⚠️ NEVER create indexes in application code - use CLI instead!
# Run this in terminal BEFORE running your application:
#   pc index create --name my-index --dimension 1536 --metric cosine \
#     --cloud aws --region us-east-1 \
#     --model llama-text-embed-v2 \
#     --field_map text=content

# If you don't have CLI access, you can use SDK (but CLI is strongly preferred):
# if not pc.has_index("my-index"):
#     pc.create_index_for_model(
#         name="my-index",
#         cloud="aws",
#         region="us-east-1",
#         embed={
#             "model": "llama-text-embed-v2",  # Recommended
#             "field_map": {"text": "content"}
#         }
#     )

# Get reference to existing index
index = pc.Index("my-index")

# Upsert with namespace (always use namespaces!)
records = [
    {
        "_id": "doc1",
        "content": "Your text here",
        "metadata_field": "value"  # Flat metadata only
    }
]
index.upsert_records("my-namespace", records)

# Search with reranking (best practice)
results = index.search(
    namespace="my-namespace",
    query={
        "top_k": 10,
        "inputs": {"text": "search query"}
    },
    rerank={
        "model": "bge-reranker-v2-m3",
        "top_n": 5,
        "rank_fields": ["content"]
    }
)

# Access search results
# IMPORTANT: With reranking, use dict-style access for hit object
for hit in results.result.hits:
    doc_id = hit["_id"]              # Dict access for id
    score = hit["_score"]            # Dict access for score
    content = hit.fields["content"]  # hit.fields is also a dict
    metadata = hit.fields.get("metadata_field", "")  # Use .get() for optional fields

🚨 Common Mistakes (Must Avoid)

1. Nested Metadata (will cause API errors)

# ❌ WRONG - nested objects not allowed
bad_record = {
    "_id": "doc1",
    "user": {"name": "John", "id": 123},  # Nested object
    "tags": [{"type": "urgent"}]  # Nested in list
}

# ✅ CORRECT - flat structure only
good_record = {
    "_id": "doc1",
    "user_name": "John",
    "user_id": 123,
    "tags": ["urgent", "important"]  # Simple list of strings OK
}
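
When source data arrives nested, a small helper can flatten it before upsert. A minimal sketch (the `flatten_metadata` name and the `"_"` separator are illustrative choices, not part of the SDK):

```python
def flatten_metadata(obj, parent_key="", sep="_"):
    """Recursively flatten nested dicts into a single-level dict.

    String lists are kept as-is (Pinecone allows them); lists holding
    anything else are stringified element by element.
    """
    flat = {}
    for key, value in obj.items():
        new_key = f"{parent_key}{sep}{key}" if parent_key else key
        if isinstance(value, dict):
            # Recurse into nested dicts, prefixing child keys with the parent key
            flat.update(flatten_metadata(value, new_key, sep))
        elif isinstance(value, list):
            if all(isinstance(v, str) for v in value):
                flat[new_key] = value  # string lists are allowed
            else:
                flat[new_key] = [str(v) for v in value]
        else:
            flat[new_key] = value
    return flat
```

Applied to the bad_record above, this produces the same flat shape as good_record (user_name, user_id as top-level keys).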

2. Batch Size Limits (will cause API errors)

# Text records: MAX 96 per batch, 2MB total
# Vector records: MAX 1000 per batch, 2MB total

# ✅ CORRECT - respect limits
for i in range(0, len(records), 96):
    batch = records[i:i + 96]
    index.upsert_records(namespace, batch)
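
The loop above respects the record-count cap but not the 2MB payload cap. A sketch of a batcher that enforces both, using json.dumps as a rough proxy for serialized request size (`iter_batches` is a hypothetical helper, not an SDK function):

```python
import json

MAX_RECORDS = 96             # text records per upsert_records call
MAX_BYTES = 2 * 1024 * 1024  # 2MB payload cap per batch

def iter_batches(records, max_records=MAX_RECORDS, max_bytes=MAX_BYTES):
    """Yield batches that respect both the count and size limits."""
    batch, batch_bytes = [], 0
    for record in records:
        size = len(json.dumps(record))
        # Flush the current batch if adding this record would break a limit
        if batch and (len(batch) >= max_records or batch_bytes + size > max_bytes):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch
```

Usage: `for batch in iter_batches(records): index.upsert_records(namespace, batch)`.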

3. Missing Namespaces (causes data isolation issues)

# ❌ WRONG - no namespace
index.upsert_records(records)  # Old API pattern

# ✅ CORRECT - always use namespaces
index.upsert_records("user_123", records)
index.search(namespace="user_123", query=params)
index.delete(namespace="user_123", ids=["doc1"])

4. Skipping Reranking (reduces search quality)

# ⚠️ OK but not optimal
results = index.search(namespace="ns", query={"top_k": 5, "inputs": {"text": "query"}})

# ✅ BETTER - always rerank in production
results = index.search(
    namespace="ns",
    query={"top_k": 10, "inputs": {"text": "query"}},
    rerank={"model": "bge-reranker-v2-m3", "top_n": 5, "rank_fields": ["content"]}
)

5. Hardcoded API Keys

# ❌ WRONG
pc = Pinecone(api_key="pc-abc123...")

# ✅ CORRECT
pc = Pinecone(api_key=os.getenv("PINECONE_API_KEY"))

6. Using SDK for Administrative Tasks (wrong tool)

# ❌ WRONG - Don't use SDK for admin operations in application code
if not pc.has_index("my-index"):
    pc.create_index_for_model(
        name="my-index",
        cloud="aws",
        region="us-east-1",
        embed={
            "model": "llama-text-embed-v2",
            "field_map": {"text": "content"}
        }
    )  # DON'T DO THIS IN APPLICATION CODE

pc.delete_index("my-index")  # DON'T DO THIS
pc.configure_index("my-index", replicas=3)  # DON'T DO THIS

# ✅ CORRECT - Use CLI in terminal for all admin tasks
# Terminal commands (run these OUTSIDE your application, during setup):
#
#   pc index create --name my-index --dimension 1536 --metric cosine \
#     --cloud aws --region us-east-1 \
#     --model llama-text-embed-v2 \
#     --field_map text=content
#
#   pc index delete --name my-index
#   pc index configure --name my-index --replicas 3

# SDK is ONLY for runtime checks and data operations in application code:
if pc.has_index("my-index"):  # ✅ OK - runtime check
    index = pc.Index("my-index")  # ✅ OK - get reference
    stats = index.describe_index_stats()  # ✅ OK - monitoring

# If you don't have CLI access, SDK is acceptable as fallback (but not ideal):
# if not pc.has_index("my-index"):
#     pc.create_index_for_model(...)  # Acceptable only if CLI unavailable
Why this is critical:
  • Admin operations are one-time setup tasks, not application logic
  • Mixing setup and runtime code makes applications fragile
  • CLI provides better error messages and interactive feedback
  • Prevents accidental index deletion in production code

Key Constraints

| Constraint | Limit | Notes |
|---|---|---|
| Metadata per record | 40KB | Flat JSON only, no nested objects |
| Text batch size | 96 records | Also 2MB total per batch |
| Vector batch size | 1000 records | Also 2MB total per batch |
| Query response size | 4MB | Per query response |
| Metadata types | strings, ints, floats, bools, string lists | No nested structures |
| Consistency | Eventually consistent | Wait ~1-5s after upsert |
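
These metadata rules can be checked client-side before upsert so violations fail fast with a readable message. A minimal sketch (hypothetical helper; the 40KB check uses JSON length as an approximation of the wire size):

```python
import json

ALLOWED_SCALARS = (str, int, float, bool)

def validate_metadata(record):
    """Return a list of constraint violations for one record's metadata."""
    problems = []
    # Everything except the id and the embedded text field counts as metadata
    meta = {k: v for k, v in record.items() if k not in ("_id", "content")}
    for key, value in meta.items():
        if isinstance(value, dict):
            problems.append(f"{key}: nested objects are not allowed")
        elif isinstance(value, list):
            if not all(isinstance(v, str) for v in value):
                problems.append(f"{key}: lists may contain strings only")
        elif not isinstance(value, ALLOWED_SCALARS):
            problems.append(f"{key}: unsupported type {type(value).__name__}")
    if len(json.dumps(meta, default=str)) > 40 * 1024:
        problems.append("metadata exceeds 40KB limit")
    return problems
```

Call this per record before batching; an empty return value means the record passes the checks above.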

Error Handling (Production)

Error Types

  • 4xx (client errors): Fix your request - DON'T retry (except 429)
  • 429 (rate limit): Retry with exponential backoff
  • 5xx (server errors): Retry with exponential backoff

Simple Retry Pattern

import time
from pinecone.exceptions import PineconeException

def exponential_backoff_retry(func, max_retries=5):
    for attempt in range(max_retries):
        try:
            return func()
        except PineconeException as e:
            status_code = getattr(e, 'status', None)

            # Only retry transient errors
            if status_code and (status_code >= 500 or status_code == 429):
                if attempt < max_retries - 1:
                    delay = min(2 ** attempt, 60)  # Exponential backoff, cap at 60s
                    time.sleep(delay)
                else:
                    raise
            else:
                raise  # Don't retry client errors (4xx except 429)

# Usage
exponential_backoff_retry(lambda: index.upsert_records(namespace, records))

Common Operations Cheat Sheet

Index Management

⚠️ Important: For administrative tasks (create, configure, delete indexes), prefer the Pinecone CLI over the SDK; reserve the SDK for programmatic checks (index existence, stats) in application code. Use the CLI for these operations:
# Create index with integrated embeddings (recommended, one-time setup)
pc index create --name my-index --dimension 1536 --metric cosine \
  --cloud aws --region us-east-1 \
  --model llama-text-embed-v2 \
  --field_map text=content

# Create serverless index without integrated embeddings (if you need custom embeddings)
pc index create-serverless --name my-index --dimension 1536 --metric cosine \
  --cloud aws --region us-east-1

# List indexes
pc index list

# Describe index
pc index describe --name my-index

# Configure index
pc index configure --name my-index --replicas 3

# Delete index
pc index delete --name my-index
Use SDK only for programmatic checks in application code:
# Check if index exists (in application startup)
if pc.has_index("my-index"):
    index = pc.Index("my-index")

# Get stats (for monitoring/metrics)
stats = index.describe_index_stats()
print(f"Total vectors: {stats.total_vector_count}")
print(f"Namespaces: {list(stats.namespaces.keys())}")
❌ Avoid in application code:
# Don't create indexes in application code - use CLI instead
pc.create_index(...)  # Use: pc index create ...
pc.create_index_for_model(...)  # Use: pc index create ... (with --model flag)

# Don't delete indexes in application code - use CLI instead
pc.delete_index("my-index")  # Use: pc index delete --name my-index

# Don't configure indexes in application code - use CLI instead
pc.configure_index("my-index", replicas=3)  # Use: pc index configure ...

Data Operations

# Fetch records
result = index.fetch(namespace="ns", ids=["doc1", "doc2"])
for record_id, record in result.vectors.items():
    print(f"{record_id}: {record.values}")

# List all IDs (paginated) - list_paginated caps limit at 100 per page
all_ids = []
pagination_token = None
while True:
    result = index.list_paginated(namespace="ns", limit=100, pagination_token=pagination_token)
    all_ids.extend(v.id for v in result.vectors)
    if not result.pagination or not result.pagination.next:
        break
    pagination_token = result.pagination.next

# Simpler alternative: index.list() is a generator that handles pagination
# for id_batch in index.list(namespace="ns"):
#     all_ids.extend(id_batch)

# Delete records
index.delete(namespace="ns", ids=["doc1", "doc2"])

# Delete entire namespace
index.delete(namespace="ns", delete_all=True)
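
Because upserts are eventually consistent (~1-5s), a read issued immediately after a write can miss fresh data. A polling sketch, written against an injected get_count callable so it composes with index.describe_index_stats() or any other source (the names here are illustrative):

```python
import time

def wait_for_count(get_count, expected, timeout=30.0, interval=1.0):
    """Poll get_count() until it reaches `expected` or `timeout` elapses.

    Returns True on success, False on timeout. In real code, pass e.g.
    lambda: index.describe_index_stats().namespaces["ns"].vector_count
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if get_count() >= expected:
            return True
        time.sleep(interval)
    return get_count() >= expected  # final check at the deadline
```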

Search with Filters

# Metadata filtering - IMPORTANT: Only include "filter" key if you have filters
# Don't set filter to None - omit the key entirely
results = index.search(
    namespace="ns",
    query={
        "top_k": 10,
        "inputs": {"text": "query"},
        "filter": {
            "$and": [
                {"category": {"$in": ["docs", "tutorial"]}},
                {"priority": {"$ne": "low"}},
                {"created_at": {"$gte": "2025-01-01"}}
            ]
        }
    },
    rerank={"model": "bge-reranker-v2-m3", "top_n": 5, "rank_fields": ["content"]}
)

# Search without filters - omit the "filter" key
results = index.search(
    namespace="ns",
    query={
        "top_k": 10,
        "inputs": {"text": "query"}
        # No filter key at all
    },
    rerank={"model": "bge-reranker-v2-m3", "top_n": 5, "rank_fields": ["content"]}
)

# Dynamic filter pattern - conditionally add filter to query dict
query_dict = {
    "top_k": 10,
    "inputs": {"text": "query"}
}
if has_filters:  # Only add filter if it exists
    query_dict["filter"] = {"category": {"$eq": "docs"}}

results = index.search(namespace="ns", query=query_dict, rerank={...})

# Filter operators: $eq, $ne, $gt, $gte, $lt, $lte, $in, $nin, $exists, $and, $or
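
Since the filter key must be omitted entirely when there are no conditions, a small builder that assembles the query dict from optional parameters keeps call sites clean (hypothetical helper, not part of the SDK; it emits only $eq clauses and combines multiples with $and):

```python
def build_query(text, top_k=10, **conditions):
    """Build a search query dict, adding `filter` only if conditions exist.

    Each keyword argument becomes an equality clause; None values are
    skipped so optional parameters can be passed through unconditionally.
    """
    query = {"top_k": top_k, "inputs": {"text": text}}
    clauses = [{field: {"$eq": value}}
               for field, value in conditions.items() if value is not None]
    if len(clauses) == 1:
        query["filter"] = clauses[0]
    elif clauses:
        query["filter"] = {"$and": clauses}
    return query
```

Usage: `index.search(namespace="ns", query=build_query("query", category="docs"), rerank={...})`.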

Namespace Strategy

# Multi-user apps
namespace = f"user_{user_id}"

# Session-based
namespace = f"session_{session_id}"

# Content-based
namespace = "knowledge_base"
namespace = "chat_history"
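
Deriving namespaces from raw user input (emails, UUIDs with punctuation) is safer with a sanitizing helper. A sketch; the character whitelist below is a conservative assumption, not a documented Pinecone requirement:

```python
import re

def user_namespace(user_id):
    """Derive a stable per-user namespace from an arbitrary identifier.

    Any character outside [a-zA-Z0-9_-] is replaced with a hyphen.
    """
    safe = re.sub(r"[^a-zA-Z0-9_-]", "-", str(user_id))
    return f"user_{safe}"
```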

Batch Processing

def batch_upsert(index, namespace, records, batch_size=96):
    for i in range(0, len(records), batch_size):
        batch = records[i:i + batch_size]
        exponential_backoff_retry(
            lambda: index.upsert_records(namespace, batch)
        )
        time.sleep(0.1)  # Rate limiting

Environment Config

class PineconeClient:
    def __init__(self):
        self.api_key = os.getenv("PINECONE_API_KEY")
        if not self.api_key:
            raise ValueError("PINECONE_API_KEY required")
        self.pc = Pinecone(api_key=self.api_key)
        self.index_name = os.getenv("PINECONE_INDEX", "default-index")

    def get_index(self):
        return self.pc.Index(self.index_name)

Embedding Models (2025)

Integrated embeddings (recommended - Pinecone handles embedding):
  • llama-text-embed-v2: High-performance, recommended for most cases
  • multilingual-e5-large: Multilingual content (1024 dims)
  • pinecone-sparse-english-v0: Keyword/hybrid search
Use integrated embeddings - don't generate vectors manually unless you have a specific reason.

Official Documentation Resources

For advanced features not covered in this quick reference, consult the official documentation at https://docs.pinecone.io/.

Quick Troubleshooting

| Issue | Solution |
|---|---|
| ModuleNotFoundError: pinecone.grpc | Wrong SDK - reinstall with pip install pinecone |
| Metadata too large error | Check 40KB limit, flatten nested objects |
| Batch too large error | Reduce to 96 records (text) or 1000 (vectors) |
| Search returns no results | Check namespace, wait for indexing (~5s), verify data exists |
| Rate limit (429) errors | Implement exponential backoff, reduce request rate |
| Nested metadata error | Flatten all metadata - no nested objects allowed |

Remember: Always use namespaces, always rerank, always handle errors with retry logic.