Prepare your project structure
One of the first steps towards building a production-ready Pinecone index is configuring your project correctly.

- Consider creating separate projects for your development and production indexes, so you can test changes before deploying them to production.
- Ensure that you have properly configured user access to the Pinecone console, so that only users who need to access the production index can do so.
- Ensure that you have properly configured access through the API by managing API keys and using API key permissions (see the sketch below).
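As an illustrative sketch, one way to select a project-scoped API key per environment; the environment variable names and the `APP_ENV` convention here are assumptions, not part of Pinecone's API:

```python
import os

from pinecone import Pinecone

# Assumed convention: one API key per project, stored in separate
# environment variables (names here are hypothetical).
ENV_TO_KEY = {
    "development": "PINECONE_API_KEY_DEV",
    "production": "PINECONE_API_KEY_PROD",
}

env = os.environ.get("APP_ENV", "development")
pc = Pinecone(api_key=os.environ[ENV_TO_KEY[env]])
```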
Enforce security
Use Pinecone’s security features to protect your production data:

- Data security
  - Private endpoints
  - Customer-managed encryption keys (CMEK)
- Authorization
  - API keys
  - Role-based access control (RBAC)
- Organization single sign-on (SSO)
- Audit logs
- Bring your own cloud
Design your indexes for scale
Follow these best practices when designing and populating your indexes:

- Data ingestion: For large datasets (10M+ records), import from object storage for the most efficient and cost-effective ingestion. For ongoing ingestion, upsert in batches to optimize speed and efficiency. See the data ingestion overview for details.
- Dimensionality: Consider the dimensionality of your vectors. Higher dimensions can offer more accuracy but require more resources.
- Data modeling: Use structured IDs (e.g., `document_id#chunk_number`) for efficient operations. Design metadata to support filtering, linking related chunks, and traceability. See the data modeling guide for details.
- Namespaces: Use namespaces to keep data separate among tenants, rather than creating multiple indexes for this purpose. Namespaces are more efficient and more affordable in the long run (see the sketch after this list).
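A minimal sketch combining these practices: batched upserts with structured record IDs and a per-tenant namespace. The index name, namespace, and placeholder embeddings are assumptions for illustration:

```python
import random

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("docs-index")  # hypothetical index name

# Placeholder embeddings standing in for a real embedding model.
chunk_embeddings = [[random.random() for _ in range(1536)] for _ in range(250)]

# One record per chunk, with structured IDs: document_id#chunk_number.
records = [
    {
        "id": f"doc42#{i}",
        "values": embedding,
        "metadata": {"document_id": "doc42", "chunk_number": i},
    }
    for i, embedding in enumerate(chunk_embeddings)
]

# Upsert in batches rather than one record at a time.
BATCH_SIZE = 100
for start in range(0, len(records), BATCH_SIZE):
    index.upsert(
        vectors=records[start : start + BATCH_SIZE],
        namespace="tenant-a",  # one namespace per tenant (assumed scheme)
    )
```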
Understand database limits
Architect your application to work within Pinecone’s database limits:

- Rate limits: Serverless indexes have per-second operation limits for queries, upserts, updates, and deletes. Implement error handling with exponential backoff to handle rate limit errors gracefully (see the sketch under Implement error handling below).
- Size limits: Be aware of constraints on vector dimensionality, metadata size per record, record ID length, maximum `top_k` values, and query result sizes. Design your data model accordingly (a validation sketch follows this list).
- Index limits: Plan for index capacity based on your plan tier. Use namespaces to partition data within indexes rather than creating multiple indexes.
- Plan limits: Starter plans have monthly read/write unit limits. Upgrade to Standard or Enterprise for unlimited read/write units and higher throughput needs.
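One way to guard against size limits is to validate records before upserting. The threshold values in this sketch are assumptions for illustration; replace them with the documented limits for your plan and index type:

```python
import json

# Assumed limit values for illustration only; check Pinecone's documented
# limits for your plan and index type before relying on these numbers.
MAX_ID_LENGTH = 512          # characters per record ID (assumed)
MAX_METADATA_BYTES = 40_960  # metadata bytes per record (assumed)
EXPECTED_DIMENSION = 1536    # must match the index's dimension (assumed)

def validate_record(record: dict) -> None:
    """Raise ValueError if a record would exceed an assumed size limit."""
    if len(record["id"]) > MAX_ID_LENGTH:
        raise ValueError(f"ID too long: {record['id'][:32]}...")
    if len(record["values"]) != EXPECTED_DIMENSION:
        raise ValueError(f"Bad dimension: {len(record['values'])}")
    metadata_bytes = len(json.dumps(record.get("metadata", {})).encode())
    if metadata_bytes > MAX_METADATA_BYTES:
        raise ValueError(f"Metadata too large: {metadata_bytes} bytes")
```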
Test your query results
Before you move your index to production, make sure that your index is returning accurate results in the context of your application by identifying the appropriate metrics for evaluating your results.
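For example, a minimal sketch of one common metric, recall@k, which compares retrieved IDs against a hand-labeled ground-truth set (the labeled data here is hypothetical):

```python
def recall_at_k(retrieved_ids: list[str], relevant_ids: set[str], k: int) -> float:
    """Fraction of relevant records that appear in the top-k results."""
    if not relevant_ids:
        return 0.0
    hits = sum(1 for rid in retrieved_ids[:k] if rid in relevant_ids)
    return hits / len(relevant_ids)

# Hypothetical labeled example: which records should match this query.
retrieved = ["doc42#3", "doc17#0", "doc42#4", "doc99#1"]
relevant = {"doc42#3", "doc42#4", "doc8#2"}
print(recall_at_k(retrieved, relevant, k=4))  # 2/3 ≈ 0.67
```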
Optimize performance

Before serving production workloads, optimize your Pinecone implementation:

- Increase search relevance: Use techniques like reranking, metadata filtering, hybrid search, and chunking strategies to improve result quality. See increase search relevance for details.
- Increase throughput: Import from object storage, upsert in batches, use parallel operations, and leverage Python SDK optimizations like gRPC. See increase throughput for details.
- Decrease latency: Use namespaces, filter by metadata, target indexes by host, reuse connections, and deploy in the same cloud region as your index. See decrease latency for details (a query sketch follows this list).
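A sketch of a query that targets the index by host and combines a namespace with a metadata filter to narrow the search space; the host value, namespace, and metadata field are assumptions:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Target the index by host (hypothetical host value) to avoid an
# extra lookup of the index endpoint.
index = pc.Index(host="docs-index-abc123.svc.example.pinecone.io")

results = index.query(
    vector=[0.1] * 1536,  # placeholder query embedding
    top_k=10,
    namespace="tenant-a",  # search only this tenant's records
    filter={"document_id": {"$eq": "doc42"}},  # assumed metadata field
    include_metadata=True,
)
```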
Back up your indexes

To enable long-term retention, compliance archiving, and deployment of new indexes, consider backing up your production indexes by creating a backup or collection.
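For a pod-based index, a static copy can be created as a collection; a minimal sketch, where the collection and index names are assumptions:

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")

# Create a static snapshot of a pod-based index as a collection
# (names here are hypothetical).
pc.create_collection(name="docs-index-2024-backup", source="docs-index")
```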
Implement error handling

Prepare your application to handle errors gracefully:

- Implement error handling and retry logic with exponential backoff (see the sketch below)
- Handle different error types appropriately (4xx vs 5xx)
- Monitor error rates and set up alerts
- Check status.pinecone.io before escalating issues
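A minimal retry sketch with exponential backoff that retries rate-limit (429) and server (5xx) errors while failing fast on other client errors. The exception import path and `status` attribute reflect the Python SDK but should be verified against the SDK version you use:

```python
import random
import time

from pinecone.exceptions import PineconeApiException

def with_retries(operation, max_attempts=5, base_delay=1.0):
    """Run `operation`, retrying retryable errors with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except PineconeApiException as e:
            status = getattr(e, "status", None)
            retryable = status == 429 or (status is not None and status >= 500)
            if not retryable or attempt == max_attempts - 1:
                raise  # non-retryable 4xx, or retries exhausted
            # Exponential backoff with jitter: ~1s, 2s, 4s, ... plus noise.
            delay = base_delay * (2 ** attempt) + random.uniform(0, 0.5)
            time.sleep(delay)

# Usage (index defined elsewhere):
# with_retries(lambda: index.upsert(vectors=batch, namespace="tenant-a"))
```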