Pinecone is the leading AI infrastructure for building accurate, secure, and scalable AI applications. Use Pinecone Database to store and search vector data at scale, or start with Pinecone Assistant to get a RAG application running in minutes.

Inference

Leading embedding and reranking models hosted by Pinecone. Explore all models.

Database workflows

Use integrated embedding to upsert and search with text and have Pinecone generate vectors automatically.

1. Create an index

Create an index that is integrated with one of Pinecone’s hosted embedding models. Dense indexes and vectors enable semantic search, while sparse indexes and vectors enable lexical search.
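A minimal sketch of this step with the Python SDK, written as a helper that takes a Pinecone client. The method name (`create_index_for_model`), model name, cloud, region, and index name are assumptions here; check the current Pinecone docs for exact parameters.

```python
def create_integrated_index(pc, name="quickstart-index"):
    """Create a dense index wired to a Pinecone-hosted embedding model.

    `pc` is a Pinecone client; all names below are placeholder assumptions.
    """
    return pc.create_index_for_model(
        name=name,
        cloud="aws",
        region="us-east-1",
        embed={
            # Hosted dense model for semantic search; choose a sparse model
            # instead to enable lexical search.
            "model": "llama-text-embed-v2",
            # Map the model's expected "text" input to a field in your records.
            "field_map": {"text": "chunk_text"},
        },
    )

# Usage (requires `pip install pinecone` and an API key):
#   from pinecone import Pinecone
#   pc = Pinecone(api_key="YOUR_API_KEY")
#   create_integrated_index(pc)
```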

2. Upsert text

Upsert your source text and have Pinecone convert the text to vectors automatically. Use namespaces to partition data for faster queries and multitenant isolation between customers.
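A sketch of upserting raw text, assuming the SDK's `upsert_records` method on an index handle; the record fields, namespace, and index name are illustrative placeholders.

```python
def upsert_text_records(index, namespace="customer-a"):
    """Upsert raw text; the index's integrated model embeds it server-side.

    `index` is an Index handle; records and namespace names are illustrative.
    """
    records = [
        {"_id": "rec1", "chunk_text": "Pinecone stores and searches vectors.",
         "category": "infra"},
        {"_id": "rec2", "chunk_text": "Namespaces isolate tenant data.",
         "category": "infra"},
    ]
    # The namespace partitions data per tenant, for isolation and faster queries.
    index.upsert_records(namespace=namespace, records=records)
    return records

# Usage:
#   index = pc.Index("quickstart-index")
#   upsert_text_records(index)
```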

3. Search with text

Search the index with query text. Again, Pinecone uses the index's integrated model to convert the text to a vector automatically.
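A sketch of a text query, assuming the SDK's `search` method and its `query` payload shape; the namespace and `top_k` are placeholder choices.

```python
def search_with_text(index, text, namespace="customer-a", top_k=3):
    """Query with raw text; the integrated model converts it to a vector."""
    return index.search(
        namespace=namespace,
        # The query carries the raw text; embedding happens server-side.
        query={"inputs": {"text": text}, "top_k": top_k},
    )

# Usage:
#   results = search_with_text(index, "How do namespaces work?")
```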

4. Improve relevance

Filter by metadata to limit the scope of your search, rerank results to increase search accuracy, or add lexical search to capture both semantic understanding and precise keyword matches.
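A sketch combining two of these techniques, metadata filtering and reranking, in one query. The filter syntax, `rerank` parameter, and reranker model name follow the Python SDK as I understand it and are assumptions to verify against current docs.

```python
def filtered_reranked_search(index, text, namespace="customer-a"):
    """Narrow the search with a metadata filter, then rerank the candidates."""
    return index.search(
        namespace=namespace,
        query={
            "inputs": {"text": text},
            # Over-fetch candidates so the reranker has material to rescore.
            "top_k": 10,
            # Metadata filter limits the scope before vector scoring.
            "filter": {"category": "infra"},
        },
        # A hosted reranker rescores the candidates for higher accuracy.
        rerank={"model": "bge-reranker-v2-m3", "top_n": 3,
                "rank_fields": ["chunk_text"]},
    )
```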

Assistant workflow

The steps below can be done through the Pinecone console or the Pinecone API.

1. Create an assistant

Create an assistant to answer questions about your documents.
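A sketch of assistant creation via the Python SDK's assistant plugin; the assistant name and instructions string are illustrative, and the exact client surface (`pc.assistant.create_assistant`) is an assumption to check against current docs.

```python
def create_doc_assistant(pc, name="docs-assistant"):
    """Create an assistant that answers questions about uploaded documents.

    `pc` is a Pinecone client; the instructions text is a placeholder.
    """
    return pc.assistant.create_assistant(
        assistant_name=name,
        # Instructions steer behavior; they can be refined later (step 5).
        instructions="Answer questions using only the uploaded documents.",
    )

# Usage:
#   assistant = create_doc_assistant(pc)
```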

2. Upload documents

Upload documents to your assistant. Your assistant manages chunking, embedding, and storage for you.
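A sketch of document upload, assuming the assistant object exposes an `upload_file` method that accepts per-file metadata (useful later for filtering); file paths and the metadata key are placeholders.

```python
def upload_docs(assistant, paths):
    """Upload files to an assistant, tagging each with simple metadata.

    The assistant handles chunking, embedding, and storage; the metadata
    attached here can be used for filtering at chat time.
    """
    return [
        assistant.upload_file(file_path=path, metadata={"source": path})
        for path in paths
    ]

# Usage:
#   upload_docs(assistant, ["handbook.pdf", "faq.md"])
```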

3. Chat with an assistant

Chat with your assistant and receive responses as a JSON object or as a text stream. For each chat, your assistant queries a large language model (LLM) with context from your documents to ensure the LLM provides grounded responses.
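A sketch of a single chat turn; the role/content message shape follows the SDK's convention as I understand it, and is an assumption here.

```python
def ask_assistant(assistant, question):
    """Send one user message; the assistant queries an LLM with document
    context so the response stays grounded in your uploaded files."""
    return assistant.chat(messages=[{"role": "user", "content": question}])

# Usage:
#   response = ask_assistant(assistant, "What does the handbook say about PTO?")
```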

4. Evaluate answers

Evaluate the assistant’s responses for correctness and completeness.
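A sketch of the evaluation step. Pinecone's evaluation endpoint compares an answer against a ground-truth answer; the client method name (`evaluate`) and parameter names below are stand-in assumptions, not the confirmed SDK surface.

```python
def evaluate_answer(eval_client, question, answer, ground_truth):
    """Score an assistant answer for correctness and completeness.

    `eval_client.evaluate` is a stand-in for Pinecone's evaluation endpoint,
    which scores the answer against a known-good ground-truth answer.
    """
    return eval_client.evaluate(
        question=question,
        answer=answer,
        ground_truth_answer=ground_truth,
    )
```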

5. Optimize performance

Use custom instructions to tailor your assistant’s behavior and responses to specific use cases or requirements. Filter by metadata associated with files to reduce latency and improve the accuracy of responses.
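A sketch of the file-metadata filter at chat time, assuming `chat` accepts a `filter` argument using Pinecone's metadata filter operators (`$eq` here); the metadata key and value are placeholders.

```python
def chat_with_filter(assistant, question, source):
    """Restrict retrieval to files whose metadata matches, which reduces
    latency and keeps answers grounded in the right documents."""
    return assistant.chat(
        messages=[{"role": "user", "content": question}],
        # Only files uploaded with matching metadata are considered.
        filter={"source": {"$eq": source}},
    )

# Usage:
#   chat_with_filter(assistant, "What is the refund policy?", "handbook.pdf")
```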

6. Retrieve context snippets

Retrieve context snippets to see which parts of your documents Pinecone Assistant uses to generate its responses. You can also pass the retrieved snippets to your own LLM, RAG application, or agentic workflow.
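A sketch of the context-retrieval step, assuming the assistant object exposes a `context` method taking a query and a result count; both the method name and `top_k` parameter are assumptions to verify against current docs.

```python
def get_context_snippets(assistant, query, top_k=5):
    """Fetch the snippets the assistant would retrieve for a query, so they
    can feed your own LLM, RAG application, or agentic workflow."""
    return assistant.context(query=query, top_k=top_k)

# Usage:
#   snippets = get_context_snippets(assistant, "pricing tiers")
```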

Start building
