Pinecone Assistant is a service that allows you to build production-grade chat and agent-based applications quickly. It is useful for a variety of tasks, especially the following:

- Create an AI assistant that answers complex questions about your proprietary data.
- Set up a fully managed vector database for high-performance semantic search.
You can use Pinecone Assistant through the Pinecone console, or programmatically via the Assistant API directly, the Pinecone Python SDK, or the Pinecone Node.js SDK.
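For example, initializing the Python client (a minimal sketch; at the time of writing the Assistant features ship as a separate `pinecone-plugin-assistant` package, so check the SDK docs for current install instructions):

```python
# pip install pinecone pinecone-plugin-assistant
from pinecone import Pinecone

# All Assistant operations hang off the `assistant` namespace of the client.
pc = Pinecone(api_key="YOUR_API_KEY")
```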
The following steps outline the general Pinecone Assistant workflow:
1. Create an assistant

Create an assistant to answer questions about your documents.
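A minimal Python sketch (the assistant name and instructions are placeholders):

```python
assistant = pc.assistant.create_assistant(
    assistant_name="example-assistant",  # placeholder name
    instructions="Use American English. Keep answers concise.",  # optional custom instructions
    timeout=30,  # wait up to 30 seconds for the assistant to be ready
)
```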
2. Upload documents

Upload documents to your assistant. Your assistant manages chunking, embedding, and storage for you.
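For example, in Python (the file path and metadata values are placeholders; metadata is optional but can be used for filtering later):

```python
response = assistant.upload_file(
    file_path="/path/to/handbook.pdf",            # placeholder path
    metadata={"department": "hr", "year": 2024},  # optional, filterable later
    timeout=None,  # block until the file is fully processed
)
```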
3. Chat with an assistant

Chat with your assistant and receive responses as a JSON object or as a text stream. For each chat, your assistant queries a large language model (LLM) with context from your documents to ensure the LLM provides grounded responses.
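In Python, both response styles look roughly like this (the question text is a placeholder):

```python
from pinecone_plugins.assistant.models.chat import Message

msg = Message(role="user", content="What is the parental leave policy?")

# JSON-style response: the answer is in the message content.
resp = assistant.chat(messages=[msg])
print(resp["message"]["content"])

# Streamed response: chunks arrive as they are generated.
for chunk in assistant.chat(messages=[msg], stream=True):
    if chunk:
        print(chunk)
```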
4. Evaluate answers

Evaluate the assistant’s responses for correctness and completeness.
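A sketch of calling the standalone Evaluation API with `requests` (the endpoint URL below is the documented one at the time of writing and may change; the question and answers are placeholders):

```python
import requests

resp = requests.post(
    "https://prod-1-data.ke.pinecone.io/assistant/evaluation/metrics/alignment",
    headers={"Api-Key": "YOUR_API_KEY", "Content-Type": "application/json"},
    json={
        "question": "What is the parental leave policy?",       # the question you asked
        "answer": "Employees receive 12 weeks of paid leave.",  # the assistant's answer
        "ground_truth_answer": "12 weeks of paid parental leave.",  # your reference answer
    },
)
print(resp.json())  # includes correctness, completeness, and alignment scores
```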
5. Optimize performance

Use custom instructions to tailor your assistant’s behavior and responses to specific use cases or requirements. Filter by metadata associated with files to reduce latency and improve the accuracy of responses.
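For example, in Python (the instructions and filter values are placeholders; the chat `filter` parameter uses the same operators as Pinecone metadata filters):

```python
# Update the assistant's custom instructions.
pc.assistant.update_assistant(
    assistant_name="example-assistant",
    instructions="You are an HR assistant. Answer only from the uploaded policies.",
)

# Restrict retrieval to files whose metadata matches the filter.
resp = assistant.chat(
    messages=[{"role": "user", "content": "What changed in 2024?"}],
    filter={"year": {"$eq": 2024}},
)
```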
6. Retrieve context snippets

Retrieve context snippets to see which parts of your documents Pinecone Assistant uses to generate responses. You can use the retrieved snippets with your own LLM, RAG application, or agentic workflow.
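A Python sketch (the query is a placeholder; the snippet fields follow the API response shape):

```python
resp = assistant.context(query="How much parental leave do we offer?")
for snippet in resp["snippets"]:
    print(snippet["content"], snippet["score"])  # snippet text plus relevance score
```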
For information on how Pinecone Assistant works, see Assistant architecture.
The following code sample outlines the Pinecone Assistant workflow using the Pinecone Python SDK with the Pinecone Assistant plugin; the Pinecone Node.js SDK supports the same operations.
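Below is a condensed Python sketch tying the steps above together (names, paths, and the question are placeholders; verify exact signatures against the SDK reference):

```python
from pinecone import Pinecone
from pinecone_plugins.assistant.models.chat import Message

pc = Pinecone(api_key="YOUR_API_KEY")

# 1. Create an assistant.
assistant = pc.assistant.create_assistant(
    assistant_name="example-assistant", timeout=30
)

# 2. Upload a document; chunking, embedding, and storage are handled for you.
assistant.upload_file(file_path="/path/to/handbook.pdf", timeout=None)

# 3. Chat with the assistant.
resp = assistant.chat(
    messages=[Message(role="user", content="Summarize the handbook.")]
)
print(resp["message"]["content"])

# 4. Retrieve the context snippets behind the answer.
context = assistant.context(query="Summarize the handbook.")
for snippet in context["snippets"]:
    print(snippet["content"])
```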
Learn more:

- Reference: comprehensive details about the Pinecone APIs, SDKs, utilities, and architecture.
- Blog: Four features of the Assistant API you aren't using - but should.
- Release notes: news about features and changes in Pinecone and related tools.