Pinecone quick reference for agents
Official docs: https://docs.pinecone.io/ - For complete API reference, advanced features, and detailed guides.
This guide covers critical gotchas, best practices, and common patterns specific to this project. For anything not covered here, consult the official Pinecone documentation.
⚠️ Critical: Installation & SDK
ALWAYS use the current SDK (npm package @pinecone-database/pinecone). Requirements:
- Node.js 18.x or later
- TypeScript 4.1 or later (recommended)
🛡️ TypeScript Types & Type Safety
When working with the Pinecone SDK, proper type handling prevents runtime errors.
Search Result Field Typing
Search results return hit.fields as a generic object. Always cast it to a typed structure.
Complete Search Hit Interface
Best Practices for Type Safety
- Always cast hit.fields: Use as Record<string, any> or define a proper interface
- Use optional chaining: fields?.fieldName ?? defaultValue
- Convert to strings: String(value) when building output
- Define record interfaces: Match your actual record structure for IDE autocomplete
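The practices above can be sketched as a small conversion helper. This is a minimal sketch, not the SDK's own typing: the ArticleFields record shape and the toArticleHit name are hypothetical, and the RawHit shape (an _id, a _score, and an untyped fields object) is an assumption about what a search hit looks like.

```typescript
// Hypothetical record shape for illustration; match it to your own records.
interface ArticleFields {
  title: string;
  category: string;
  year: number;
}

// Assumed hit shape: an ID, a score, and a generic `fields` object.
interface RawHit {
  _id: string;
  _score: number;
  fields?: object;
}

interface ArticleHit {
  id: string;
  score: number;
  title: string;
  category: string;
  year: number | undefined;
}

function toArticleHit(hit: RawHit): ArticleHit {
  // Cast the generic object to a typed structure. Every field stays optional
  // because a given record may not contain it.
  const fields = hit.fields as Partial<ArticleFields> | undefined;
  const year = fields?.year;
  return {
    id: hit._id,
    score: hit._score,
    // Optional chaining plus a default, then String() for safe output.
    title: String(fields?.title ?? "(untitled)"),
    category: String(fields?.category ?? "unknown"),
    year: typeof year === "number" ? year : undefined,
  };
}
```

Converting hits once at the boundary means the rest of the code works with ArticleHit and never touches the untyped fields object.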
🔧 CLI vs SDK: When to Use Which
Use the Pinecone CLI for:
- ✅ Creating indexes - pc index create
- ✅ Deleting indexes - pc index delete
- ✅ Configuring indexes - pc index configure (replicas, deletion protection)
- ✅ Listing indexes - pc index list
- ✅ Describing indexes - pc index describe
- ✅ Creating API keys - pc api-key create
- ✅ One-off inspection - Checking stats, configuration
- ✅ Development setup - All initial infrastructure setup
Use the Pinecone SDK for:
- ✅ Data operations in application code - upsert, query, search, delete RECORDS
- ✅ Runtime checks - checking index existence, index.describeIndexStats()
- ✅ Automated workflows - Any data operations that run repeatedly
- ✅ Production data access - Reading and writing vectors/records
❌ Do NOT use the SDK for:
- Creating, deleting, or configuring indexes in application code
- One-time administrative tasks
Installing the Pinecone CLI
macOS (Homebrew): brew tap pinecone-io/tap && brew install pinecone-io/tap/pinecone
CLI Authentication
Choose one method:
Option 1: User login (recommended for development)
🔒 API Key Security Best Practices
❌ NEVER hardcode an API key in source code or commit one to version control. Why this matters:
- 🔓 Git history is forever - If your key ends up in a commit, it’s compromised forever
- 🤖 Bots scan public repos - GitHub has automated tools that detect leaked API keys immediately
- 💰 Financial impact - Attackers can use your key to run expensive queries
- 🛡️ Revoke immediately - If a key is exposed, rotate it in the Pinecone console immediately
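One way to keep keys out of source code is to read them from the environment and fail fast when they are missing. The helper below is a minimal sketch: requireApiKey and the Env type are hypothetical names, and taking the environment as a parameter is a design choice for testability (an application would pass process.env).

```typescript
// An environment-like map, e.g. process.env in a Node.js application.
type Env = Record<string, string | undefined>;

// Returns the key from the environment, or throws a descriptive error.
// The key value itself is never included in the error message.
function requireApiKey(env: Env, name: string = "PINECONE_API_KEY"): string {
  const raw = env[name];
  const value = raw === undefined ? "" : raw.trim();
  if (value === "") {
    throw new Error(`${name} is not set; export it in your shell or load it from a secret manager`);
  }
  return value;
}
```

Calling requireApiKey(process.env) once at startup surfaces a missing key immediately instead of at the first request.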
Quickstarts
Important for all quickstarts: Execute all steps completely. Keep setup minimal (directories, npm dependencies only). Do not expect the user to satisfy any prerequisites except providing API keys. For summaries, use only README.md and SUMMARY.md.
When you are asked to help get started with Pinecone, ask the user to choose an option:
- Quick test: Create an index, upsert data, and perform semantic search.
- Choose a use case:
  - Search: Build a semantic search system that returns ranked results from your knowledge base. This pattern is ideal for search interfaces where users need a list of relevant documents with confidence scores.
  - RAG: Build a multi-tenant RAG (Retrieval-Augmented Generation) system that retrieves relevant context per tenant and feeds it to an LLM to generate answers. Each tenant (organization, workspace, or user) has isolated data stored in separate Pinecone namespaces. This pattern is ideal for knowledge bases, customer support platforms, and collaborative workspaces.
  - Recommendations: Build a recommendation engine that suggests similar items based on semantic similarity. This pattern is ideal for e-commerce, content platforms, and user personalization systems.
Setup Prerequisites (all quickstarts)
Before starting any quickstart, complete these steps:
1. Set up Node.js environment: Create project directory, initialize npm, and install Pinecone SDK
2. Install CLI: Run pc version to check. If not installed: brew tap pinecone-io/tap && brew install pinecone-io/tap/pinecone (macOS) or download from GitHub releases. If already installed, upgrade: brew update && brew upgrade pinecone
3. Configure API key: Ask user for Pinecone API key, set as PINECONE_API_KEY env variable, then run pc auth configure --api-key $PINECONE_API_KEY
4. For RAG quickstart only: Also obtain and set OPENAI_API_KEY or ANTHROPIC_API_KEY
Quick test
Complete Setup Prerequisites first.
Step 1. Implement semantic search
1. Create an index called “agentic-quickstart-test” with an integrated embedding model that can handle text documents. Use the Pinecone CLI for this. Use the API key env variable to authenticate.
2. Prepare a sample dataset of factual statements from different domains like history, physics, technology, and music and upsert the dataset into a new namespace in the index.
3. Search the dense index for ten records that are most semantically similar to the query, “Famous historical structures and monuments”.
4. Show the search results to the user. Most of the results will be about historical structures and monuments. However, a few unrelated statements will be included as well and ranked high in the list, for example, a statement about Shakespeare. Don’t show the literal results in your terminal. Print the important result details in the chat.
5. To get a more accurate ranking, search again but this time rerank the initial results based on their relevance to the query.
6. Show the search results to the user. All of the most relevant results about historical structures and monuments will now be ranked highest. Again, don’t show the literal results in your terminal. Print the important result details in the chat.
Build a semantic search system
Complete Setup Prerequisites first.
Step 1. Build a semantic search system
1. Create an index called “agentic-quickstart-search” with an integrated embedding model that can handle text documents. Use the Pinecone CLI for this. Use the API key env variable to authenticate.
2. Create 20 unique documents with metadata. Each document should cover a unique foundational AI/ML concept.
3. Store the documents in the Pinecone index. Be sure to use the upsertRecords() method, not the upsert() method.
4. Create a search function that:
   - Uses semantic search to find relevant documents
   - Includes reranking with the hosted bge-reranker-v2-m3 model
   - Allows filtering by metadata
   - Returns well-formatted results
   - Uses production-ready error handling patterns
   Be sure to use the searchRecords() method, not the query() method.
5. Then search the knowledge base with 3 sample queries.
6. Show the search results to the user. Don’t show the literal results in your terminal. Print the important result details in the chat.
7. Provide a summary of what you did including:
   - The production-ready patterns you used
   - A concise explanation of the generated code
Build a multi-tenant RAG system
Complete Setup Prerequisites first (including step 4 for LLM API keys).
This example builds an Email Management & Search Platform where each user has isolated access to their own mailbox, ensuring privacy and data segregation. Each person’s email is indexed in its own namespace and they have access only to that namespace.
Step 1. Build a RAG system
1. Create an index called “agentic-quickstart-rag” with an integrated embedding model that can handle text documents. Use the Pinecone CLI for this. Use the API key env variable to authenticate.
2. Create 20 unique email messages with metadata across four categories:
   - Work Correspondence (5 emails): Project updates, meeting notes, team announcements
   - Project Management (5 emails): Task assignments, progress reports, deadline reminders
   - Client Communications (5 emails): Client requests, proposals, feedback
   - Administrative (5 emails): HR notices, policy updates, expense reports
   Each email should include these metadata fields:
   - message_type: “work”, “project”, “client”, “admin”
   - priority: “high”, “medium”, “low”
   - from_domain: “internal”, “client”, “vendor”
   - date_received: ISO date string
   - has_attachments: true or false
3. Store the emails in the Pinecone index using separate namespaces for each user (e.g., user_alice, user_bob). Be sure to use the upsertRecords() method, not the upsert() method.
4. Create a RAG function that:
   - Takes a user query and user identifier as input
   - Searches ONLY the specified user’s namespace to ensure data isolation
   - Retrieves relevant emails using semantic search
   - Reranks results with the hosted bge-reranker-v2-m3 model (prioritizing by priority and message_type)
   - Constructs a prompt with the retrieved email content
   - Sends the prompt to an LLM (use OpenAI GPT-4 or Anthropic Claude)
   - Returns the generated answer with source citations including sender, date, and priority level
   The RAG function should also:
   - Enforce namespace isolation - never return emails from other users
   - Handle context window limits intelligently
   - Include metadata in citations (message type, date received, priority)
   - Flag high-priority emails in the response
   - Gracefully handle missing or insufficient email context
   Be sure to use the searchRecords() method, not the query() method.
5. Then answer 3 sample questions as a user querying their email mailbox:
   - “What updates did I receive about the quarterly project?”
   - “Show me all client feedback we’ve received this month”
   - “Find high-priority emails from my team about the presentation”
6. Give the user insight into the process. Show the search results from Pinecone as well as the answers from the LLM. Don’t show the literal results and answers in your terminal. Print the important result and answer details in the chat.
7. Provide a summary of what you did including:
   - The production-ready patterns you used
   - How namespace isolation ensures privacy and data segregation
   - A concise explanation of the generated code
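The prompt-construction requirements of the RAG function (context limits, citations, high-priority flags, and missing context) can be sketched independently of the search and LLM calls. This is a minimal sketch under stated assumptions: EmailHit is a hypothetical shape for an already-retrieved email, buildRagPrompt is a hypothetical name, and the context budget is counted in characters as a simplification of token counting.

```typescript
// Hypothetical shape of a retrieved email after its fields have been typed.
interface EmailHit {
  id: string;
  sender: string;
  dateReceived: string; // ISO date string
  priority: "high" | "medium" | "low";
  messageType: string;
  text: string;
}

interface RagPrompt {
  prompt: string;
  citations: string[];
}

// Builds an LLM prompt from ranked emails, stopping before the context budget
// (in characters) would be exceeded so the best-ranked emails are kept.
function buildRagPrompt(question: string, hits: EmailHit[], maxContextChars: number = 4000): RagPrompt {
  const blocks: string[] = [];
  const citations: string[] = [];
  let used = 0;
  for (const hit of hits) {
    const flag = hit.priority === "high" ? "[HIGH PRIORITY] " : "";
    const header = `${flag}From: ${hit.sender} | Date: ${hit.dateReceived} | Type: ${hit.messageType}`;
    const block = `${header}\n${hit.text}`;
    if (used + block.length > maxContextChars) break;
    blocks.push(block);
    used += block.length;
    citations.push(`${hit.sender}, ${hit.dateReceived}, priority ${hit.priority}`);
  }
  if (blocks.length === 0) {
    // Missing or insufficient context: say so instead of letting the LLM guess.
    return { prompt: `No relevant emails were found for: ${question}`, citations: [] };
  }
  const context = blocks.join("\n---\n");
  return { prompt: `Answer using only the emails below.\n\n${context}\n\nQuestion: ${question}`, citations };
}
```

Because the hits passed in come from a single user's namespace, the prompt cannot contain another user's email as long as the search step enforces isolation.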
Build a recommendation engine
Complete Setup Prerequisites first.
Step 1. Build a recommendation engine
1. Create an index called “agentic-quickstart-recommendations” with an integrated embedding model that can handle text documents. Use the Pinecone CLI for this. Use the API key env variable to authenticate.
2. Create 20 diverse product listings with rich metadata.
3. Store the product listings in the Pinecone index. Be sure to use the upsertRecords() method, not the upsert() method.
4. Create a recommendation engine that:
   - Takes a product ID as input and finds similar items.
   - Uses vector similarity to find semantically related products.
   - Allows filtering by category, price range, and other attributes.
   - Implements diversity strategies to limit results per category and score spreading.
   - Aggregates multi-item preferences to generate recommendations.
   - Returns well-formatted recommendations with similarity scores.
   Be sure to use the searchRecords() method, not the query() method.
5. Then test the recommendation engine with 3 sample products.
6. Show the search results to the user. For each test, explain why these recommendations make sense based on the similarity scores and filters. Don’t show the literal results in your terminal. Print the important result details in the chat.
7. Provide a summary of what you did including:
   - The production-ready patterns you used
   - A concise explanation of the generated code
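The "limit results per category" diversity strategy can be sketched as a post-processing step over search hits. This is a minimal sketch, assuming hits have already been reduced to an ID, a category, and a similarity score; the Recommendation type and the diversify function are hypothetical names, and score spreading is not covered here.

```typescript
interface Recommendation {
  id: string;
  category: string;
  score: number;
}

// Keeps at most `maxPerCategory` items per category in descending score order,
// and drops the source product so an item is never recommended for itself.
function diversify(
  hits: Recommendation[],
  sourceId: string,
  maxPerCategory: number = 2,
  limit: number = 5
): Recommendation[] {
  const counts: Record<string, number> = {};
  const result: Recommendation[] = [];
  const ranked = hits.slice().sort((a, b) => b.score - a.score);
  for (const hit of ranked) {
    if (hit.id === sourceId) continue;
    const seen = counts[hit.category] ?? 0;
    if (seen >= maxPerCategory) continue;
    counts[hit.category] = seen + 1;
    result.push(hit);
    if (result.length === limit) break;
  }
  return result;
}
```

Requesting more hits than the final limit gives this step enough candidates to fill the list after the per-category caps are applied.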
Index creation
⚠️ Use CLI (pc index create), NOT SDK in application code. See CLI vs SDK.
Index creation with integrated embeddings (preferred)
Available embedding models (current)
- llama-text-embed-v2: High-performance, configurable dimensions, recommended for most use cases
- multilingual-e5-large: For multilingual content, 1024 dimensions
- pinecone-sparse-english-v0: For keyword/hybrid search scenarios
Data operations
Upserting records (text with integrated embeddings)
Updating records
Fetching records
Listing record IDs
Search operations
Semantic search with reranking (best practice)
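A search-with-reranking request can be sketched as a plain object builder. This is a minimal sketch, not a confirmed API contract: the request shape (query.topK, query.inputs.text, query.filter, and a rerank section with model, topN, and rankFields) is an assumption about what searchRecords() accepts and should be checked against the SDK docs. The buildSearchRequest name, the textField parameter, and the choice to fetch twice as many candidates as needed are illustrative.

```typescript
// Assumed request shape for searchRecords(); verify against the SDK reference.
interface SearchRequest {
  query: {
    topK: number;
    inputs: { text: string };
    filter?: Record<string, unknown>;
  };
  rerank: {
    model: string;
    topN: number;
    rankFields: string[];
  };
}

function buildSearchRequest(
  text: string,
  topN: number,
  textField: string,
  filter?: Record<string, unknown>
): SearchRequest {
  // Fetch more candidates than needed so the reranker has room to reorder.
  const query: SearchRequest["query"] = { topK: topN * 2, inputs: { text } };
  if (filter) query.filter = filter;
  return {
    query,
    // Hosted reranker named in this guide; rankFields names the text field to score.
    rerank: { model: "bge-reranker-v2-m3", topN, rankFields: [textField] },
  };
}
```

Under these assumptions, the resulting object would be passed to searchRecords() on a namespace of the index, and the reranked hits read from the response.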
Lexical search (keyword-based)
Metadata filtering
Supported filter operators
- $eq: equals
- $ne: not equals
- $gt, $gte: greater than, greater than or equal
- $lt, $lte: less than, less than or equal
- $in: in list
- $nin: not in list
- $exists: field exists
- $and, $or: logical operators
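The operators above combine into a plain filter object. The sketch below builds one for a product search; the field names (category, price, brand, rating) and the productFilter function are hypothetical, and the one-operator-per-condition layout is a conservative choice rather than a documented requirement.

```typescript
type Filter = Record<string, unknown>;

// Builds a filter matching one category, a price range, a set of brands,
// and records that have a rating field at all.
function productFilter(category: string, minPrice: number, maxPrice: number, brands: string[]): Filter {
  return {
    $and: [
      { category: { $eq: category } },
      { price: { $gte: minPrice } },
      { price: { $lte: maxPrice } },
      { brand: { $in: brands } },
      { rating: { $exists: true } },
    ],
  };
}
```

Because filters only work on flat metadata fields, every field referenced here must be stored at the top level of the record.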
🚨 Common Mistakes (Must Avoid)
1. Nested Metadata (will cause API errors)
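Nested objects can be flattened before upserting. This is a minimal sketch, assuming underscore-joined key names are acceptable for your schema; flattenMetadata is a hypothetical helper, and it keeps only the value types this guide lists as supported (strings, numbers, booleans, string lists).

```typescript
type FlatValue = string | number | boolean | string[];

// Flattens nested objects into underscore-joined keys, for example
// { author: { name: "Ada" } } becomes { author_name: "Ada" }.
function flattenMetadata(input: Record<string, unknown>, prefix: string = ""): Record<string, FlatValue> {
  const out: Record<string, FlatValue> = {};
  for (const key of Object.keys(input)) {
    const value = input[key];
    const name = prefix ? `${prefix}_${key}` : key;
    if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") {
      out[name] = value;
    } else if (Array.isArray(value)) {
      // Lists are stored as lists of strings.
      out[name] = value.map((item) => String(item));
    } else if (value !== null && typeof value === "object") {
      const nested = flattenMetadata(value as Record<string, unknown>, name);
      for (const nestedKey of Object.keys(nested)) out[nestedKey] = nested[nestedKey];
    }
    // null and undefined values are dropped.
  }
  return out;
}
```

Two different nested paths can flatten to the same key (for example a_b.c and a.b_c), so pick key names that avoid underscores if that matters for your data.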
2. Batch Size Limits (will cause API errors)
3. Missing Namespaces (causes data isolation issues)
4. Skipping Reranking (reduces search quality)
5. Hardcoded API Keys
6. Missing Async/Await (TypeScript-specific)
⏳ Indexing Delays & Eventual Consistency (Important!)
Pinecone uses eventual consistency. This means records don’t immediately appear in searches or stats after upserting.
Realistic Timing Expectations
| Operation | Time | Notes |
|---|---|---|
| Record stored | 1-3 seconds | Data is persisted |
| Records searchable | 5-10 seconds | Can find via searchRecords() |
| Stats updated | 10-20 seconds | describeIndexStats() shows accurate count |
| Indexes ready | 30-60 seconds | New indexes enter “Ready” state |
Correct Wait Pattern
Production Pattern: Polling for Readiness
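Polling can be sketched with the count lookup injected as a function, which keeps the loop independent of any particular SDK call. This is a minimal sketch: waitForRecords is a hypothetical name, getCount stands in for something like a wrapper around describeIndexStats() (its exact return shape is not assumed here), and the default timeout and interval are illustrative values.

```typescript
// Polls until `getCount` reports at least `expected` records or the timeout
// elapses. Returns true when the records are visible, false on timeout.
async function waitForRecords(
  getCount: () => Promise<number>,
  expected: number,
  timeoutMs: number = 60000,
  intervalMs: number = 2000
): Promise<boolean> {
  const deadline = Date.now() + timeoutMs;
  while (true) {
    if ((await getCount()) >= expected) return true;
    // Give up rather than sleeping past the deadline.
    if (Date.now() + intervalMs > deadline) return false;
    await new Promise<void>((resolve) => setTimeout(resolve, intervalMs));
  }
}
```

Returning a boolean instead of throwing lets the caller decide whether a timeout is fatal or just worth a warning.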
🆘 Troubleshooting
Problem: describeIndexStats() returns 0 records after upsert
Cause: Eventual consistency - records haven’t indexed yet
Solution:
- Wait 10+ seconds minimum
- Check if records were actually upserted (no errors thrown)
- Use the polling pattern above for production code
Problem: Search returns no results
Cause: Usually one of these:
- Field name mismatch (using wrong field in --field_map)
- Records not indexed yet (use polling pattern)
- Empty namespace or wrong namespace name
- Filtering too aggressively
Problem: TypeScript errors accessing hit.fields
Cause: SDK returns generic object, TypeScript doesn’t know your field names
Solution: Use type casting
Key Constraints
| Constraint | Limit | Notes |
|---|---|---|
| Metadata per record | 40KB | Flat JSON only, no nested objects |
| Text batch size | 96 records | Also 2MB total per batch |
| Vector batch size | 1000 records | Also 2MB total per batch |
| Query response size | 4MB | Per query response |
| Metadata types | strings, ints, floats, bools, string lists | No nested structures |
| Consistency | Eventually consistent | 5-10s search, 10-20s for stats (not 1-5s!) |
Error Handling (Production)
Error Types
- 4xx (client errors): Fix your request - DON’T retry (except 429)
- 429 (rate limit): Retry with exponential backoff
- 5xx (server errors): Retry with exponential backoff
Simple Retry Pattern
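The retry rules above can be sketched as a generic wrapper. This is a minimal sketch, assuming failures expose a numeric HTTP status on a status property; how the SDK's error objects actually carry the status code is not confirmed here, so isRetryable is the piece to adapt. The function names and default values are illustrative.

```typescript
// Assumption: errors carry a numeric `status`. Retry only 429 and 5xx.
function isRetryable(error: unknown): boolean {
  const status = (error as { status?: unknown } | null)?.status;
  if (typeof status !== "number") return false;
  return status === 429 || status >= 500;
}

// Runs `operation`, retrying retryable failures with exponential backoff.
async function withRetry<T>(
  operation: () => Promise<T>,
  maxAttempts: number = 5,
  baseDelayMs: number = 1000
): Promise<T> {
  for (let attempt = 1; ; attempt++) {
    try {
      return await operation();
    } catch (error) {
      // Other 4xx errors and exhausted attempts propagate to the caller.
      if (attempt >= maxAttempts || !isRetryable(error)) throw error;
      const delay = baseDelayMs * 2 ** (attempt - 1); // base, 2x base, 4x base, ...
      await new Promise<void>((resolve) => setTimeout(resolve, delay));
    }
  }
}
```

Adding random jitter to the delay is a common refinement when many clients retry at once; it is left out to keep the sketch short.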
Common Operations Cheat Sheet
Index Management
⚠️ Important: For administrative tasks (create, configure, delete indexes), prefer the Pinecone CLI over the SDK. Use the SDK only when you need to check index existence or get stats programmatically in your application code. Use the CLI for these operations.
Data Operations
Search with Filters
Recommended Patterns
Namespace Strategy
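A per-user namespace scheme such as user_alice can be centralized in one helper so that every read and write derives the namespace the same way. This is a minimal sketch: namespaceForUser, the user_ prefix, and the allowed character set are illustrative choices, not Pinecone requirements.

```typescript
// Maps a user identifier to its namespace name, e.g. "alice" -> "user_alice".
// Identifiers with unexpected characters are rejected rather than rewritten,
// so two different users can never map to the same namespace.
function namespaceForUser(userId: string): string {
  const trimmed = userId.trim();
  if (!/^[A-Za-z0-9][A-Za-z0-9_-]*$/.test(trimmed)) {
    throw new Error("userId must start with a letter or digit and contain only letters, digits, '_' or '-'");
  }
  return `user_${trimmed}`;
}
```

Deriving the namespace from the authenticated user on the server, never from client input, is what makes the isolation meaningful.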
Batch Processing
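Staying under the per-batch record limit can be sketched as a simple chunking helper. This is a minimal sketch: toBatches is a hypothetical name, the default of 96 is the text batch limit listed under Key Constraints, and the 2MB size limit per batch is not checked here.

```typescript
const TEXT_BATCH_LIMIT = 96; // records per batch for text with integrated embeddings

// Splits records into consecutive batches of at most `batchSize` items.
function toBatches<T>(records: T[], batchSize: number = TEXT_BATCH_LIMIT): T[][] {
  if (batchSize < 1 || Math.floor(batchSize) !== batchSize) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let start = 0; start < records.length; start += batchSize) {
    batches.push(records.slice(start, start + batchSize));
  }
  return batches;
}
```

Each batch would then be upserted in turn, ideally awaited one at a time so a failure identifies exactly which batch needs a retry.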
Environment Config
Embedding Models (2025)
Integrated embeddings (recommended - Pinecone handles embedding):
- llama-text-embed-v2: High-performance, recommended for most cases
- multilingual-e5-large: Multilingual content (1024 dims)
- pinecone-sparse-english-v0: Keyword/hybrid search
Official Documentation Resources
For advanced features not covered in this quick reference:
- API reference: https://docs.pinecone.io/reference/api/introduction
- Bulk imports (S3/GCS): https://docs.pinecone.io/guides/index-data/import-data
- Hybrid search: https://docs.pinecone.io/guides/search/hybrid-search
- Backups (backup/restore): https://docs.pinecone.io/guides/manage-data/backups-overview
- Error handling: https://docs.pinecone.io/guides/production/error-handling
- Database limits: https://docs.pinecone.io/reference/api/database-limits
- Production monitoring: https://docs.pinecone.io/guides/production/monitoring
- TypeScript SDK docs: https://sdk.pinecone.io/typescript/
- TypeScript SDK GitHub: https://github.com/pinecone-io/pinecone-ts-client
Quick Troubleshooting
| Issue | Solution |
|---|---|
| Cannot find module '@pinecone-database/pinecone' | Wrong package - install with npm install @pinecone-database/pinecone |
| Metadata too large error | Check 40KB limit, flatten nested objects |
| Batch too large error | Reduce to 96 records (text) or 1000 (vectors) |
| Search returns no results | Check namespace, wait for indexing (5-10s), verify data exists |
| Rate limit (429) errors | Implement exponential backoff, reduce request rate |
| Nested metadata error | Flatten all metadata - no nested objects allowed |
| TypeScript compilation errors | Check TypeScript version (4.1+), verify types are installed |
| Promise rejection errors | Always use await or .catch() for async operations |
Remember: Always use namespaces, always rerank, always handle errors with retry logic, always use async/await.