Pinecone Assistant architecture
This page describes the architecture for Pinecone Assistant.
Overview
Pinecone Assistant runs as a managed service on the Pinecone platform. It uses a combination of machine learning models and information retrieval techniques to provide responses that are informed by your documents. The assistant is designed to be easy to use, requiring minimal setup and no machine learning expertise.
Pinecone Assistant handles complex tasks such as data chunking, embedding, vector search, and querying for you, while ensuring privacy and security.
Data ingestion
When a document is uploaded, the assistant processes the content by chunking it into smaller parts and generating vector embeddings for each chunk. These embeddings are stored in an index, making them ready for retrieval.
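The ingestion flow can be sketched as follows. This is a conceptual illustration only, not Pinecone's actual implementation: the chunk size, overlap, and the toy character-frequency embedding are assumptions standing in for the real chunking strategy and embedding model.

```python
def chunk_text(text, chunk_size=200, overlap=50):
    """Split text into overlapping character-based chunks (sizes are illustrative)."""
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        start += chunk_size - overlap
    return chunks

def embed(chunk):
    """Toy embedding: normalized letter-frequency vector (stand-in for a real model)."""
    vec = [0.0] * 26
    for ch in chunk.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

# Build the "index": one (chunk, embedding) record per chunk, ready for retrieval.
document = "Pinecone Assistant answers questions grounded in your documents. " * 10
index = [(chunk, embed(chunk)) for chunk in chunk_text(document)]
print(f"{len(index)} chunks indexed")
```

Overlapping chunks help preserve context that would otherwise be split across a chunk boundary.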
Data retrieval
During a chat, the assistant processes the message to formulate relevant search queries, which are used to query the index and identify the most relevant chunks from the uploaded content.
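The retrieval step amounts to a nearest-neighbor search over the stored embeddings. The sketch below shows the idea with an in-memory index and a toy letter-frequency embedding; these are assumptions for demonstration, and in practice Pinecone performs this search server-side at scale.

```python
def embed(text):
    """Toy embedding: normalized letter-frequency vector (stand-in for a real model)."""
    vec = [0.0] * 26
    for ch in text.lower():
        if "a" <= ch <= "z":
            vec[ord(ch) - ord("a")] += 1.0
    norm = sum(v * v for v in vec) ** 0.5 or 1.0
    return [v / norm for v in vec]

def top_k(index, query, k=3):
    """Return the k chunks whose embeddings are most similar to the query's."""
    q = embed(query)
    scored = [(sum(a * b for a, b in zip(q, e)), chunk) for chunk, e in index]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [chunk for _, chunk in scored[:k]]

index = [(t, embed(t)) for t in
         ["refund policy details", "shipping times", "contact support"]]
print(top_k(index, "refund", k=1))
```

Because both chunks and queries are embedded in the same vector space, similarity between vectors approximates semantic relevance between texts.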
Response generation
After retrieval, the assistant performs a ranking step over the returned chunks to determine which information is most relevant. This context, along with the chat history and assistant instructions, is then used by a large language model (LLM) to generate responses that are informed by your documents.
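The ranking and prompt-assembly steps can be sketched as below. The word-overlap reranker, the prompt layout, and all names here are illustrative assumptions, not Pinecone internals; in practice a learned reranking model and the provider's chat format would be used.

```python
def rerank(chunks, query):
    """Toy reranker: order chunks by how many query words they contain."""
    words = set(query.lower().split())
    return sorted(chunks, key=lambda c: -len(words & set(c.lower().split())))

def build_prompt(instructions, history, context_chunks, message):
    """Assemble instructions, retrieved context, and chat history into one prompt."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    turns = "\n".join(f"{role}: {text}" for role, text in history)
    return (f"{instructions}\n\nContext from your documents:\n{context}\n\n"
            f"Conversation so far:\n{turns}\nuser: {message}")

chunks = ["refunds are issued within 5 days", "shipping takes 2 weeks"]
ranked = rerank(chunks, "how long do refunds take")
prompt = build_prompt("Answer using only the provided context.",
                      [("user", "Hi"), ("assistant", "Hello!")],
                      ranked, "How long do refunds take?")
print(prompt)
```

The assembled prompt is what the LLM ultimately sees: grounding the model in ranked document chunks is what keeps its answers tied to your uploaded content rather than its general training data.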