Setup guide
LlamaIndex is a framework for connecting data sources to LLMs, with its chief use case being the end-to-end development of RAG applications. Compared to other similar frameworks, LlamaIndex offers a wide variety of tools for pre- and post-processing your data.

This guide shows you how to use LlamaIndex and Pinecone both to perform traditional semantic search and to build a RAG pipeline. Specifically, you will:

- Load, transform, and vectorize sample data with LlamaIndex
- Index and store the vectorized data in Pinecone
- Search the data in Pinecone and use the results to augment an LLM call
- Evaluate the answer you get back from the LLM
This guide demonstrates only one of many ways you can use LlamaIndex as part of a RAG pipeline. See LlamaIndex's section on Advanced RAG to learn more about what's possible.
Set up your environment
Before you begin, install the necessary libraries and set environment variables for your Pinecone and OpenAI API keys:
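A minimal setup sketch is below; exact package names can vary with your LlamaIndex and Pinecone SDK versions, so check the respective docs if an install or import fails:

```shell
# Install LlamaIndex, the Pinecone vector store integration, and the Pinecone client
pip install -qU llama-index llama-index-vector-stores-pinecone pinecone

# Set your API keys as environment variables
export PINECONE_API_KEY="YOUR_PINECONE_API_KEY"
export OPENAI_API_KEY="YOUR_OPENAI_API_KEY"
```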
Load the data
In this guide, you will use the canonical HNSW paper by Yuri Malkov (PDF) as your sample dataset. Your first step is to download the PDF from arXiv.org and load it into a LlamaIndex loader called PDF Loader. This Loader is available (along with many others) on LlamaHub, a directory of data loaders.
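A sketch of the download-and-load step, assuming the PDFReader bundled with the llama-index-readers-file package (the local file path is illustrative):

```python
import urllib.request
from pathlib import Path

from llama_index.readers.file import PDFReader

# Download the HNSW paper from arXiv
Path("./data").mkdir(exist_ok=True)
urllib.request.urlretrieve("https://arxiv.org/pdf/1603.09320", "./data/hnsw.pdf")

# Load the PDF into a list of Document objects (one per page)
documents = PDFReader().load_data(file=Path("./data/hnsw.pdf"))
print(f"Loaded {len(documents)} Documents")
```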
Each `Document` has a ton of useful information, but depending on which Loader you choose, you may have to clean your data. In this case, you need to remove things like remaining `\n` characters and broken, hyphenated words (e.g., `alg o-\nrithms` → `algorithms`).
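Here's a minimal cleaning sketch; the regexes are illustrative and may need tuning for your particular PDF:

```python
import re

def clean_up_text(content: str) -> str:
    # Rejoin words that were hyphenated across line breaks (e.g., "alg o-\nrithms")
    content = re.sub(r"(\w+)-\n(\w+)", r"\1\2", content)
    # Replace remaining newlines with spaces and collapse repeated whitespace
    content = content.replace("\n", " ")
    content = re.sub(r"\s+", " ", content)
    return content.strip()

cleaned_docs = []
for doc in documents:
    doc.text = clean_up_text(doc.text)
    cleaned_docs.append(doc)
```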
Transform the data
Metadata
Now, if you look at one of your cleaned `Document` objects, you'll see that the default values in your metadata dictionary are not particularly useful.
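You can replace or extend the defaults with metadata that is useful at query time. A sketch; the field names and values below are illustrative:

```python
# Default metadata is mostly file bookkeeping
print(cleaned_docs[0].metadata)
# e.g., {'page_label': '1', 'file_name': 'hnsw.pdf'}

# Attach metadata that is actually useful at query time (illustrative values)
metadata_additions = {
    "paper_title": "Efficient and robust approximate nearest neighbor search using HNSW",
    "year": 2016,
}
for doc in cleaned_docs:
    doc.metadata.update(metadata_additions)
```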
LlamaIndex also provides advanced customizations, such as controlling which metadata fields the LLM sees versus which ones the embedding model sees.
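For example, you can exclude specific metadata keys from what each component sees. A sketch; which keys to exclude depends on your data:

```python
# Hide bookkeeping fields from the LLM and keep embeddings focused on the text
for doc in cleaned_docs:
    doc.excluded_llm_metadata_keys = ["page_label", "file_name"]
    doc.excluded_embed_metadata_keys = ["page_label", "file_name"]
```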
Ingestion pipeline
The easiest way to turn your data into indexable vectors and put those into Pinecone is to build what's called an Ingestion Pipeline. An Ingestion Pipeline takes your list of Documents, parses them into Nodes ("chunks" in non-LlamaIndex contexts), vectorizes each Node's content, and upserts the results into Pinecone. In the following pipeline, you'll use one of LlamaIndex's newer parsers: the `SemanticSplitterNodeParser`, which uses OpenAI's ada-002 embedding model to split Documents into semantically coherent Nodes. This step uses the OpenAI API key you set as an environment variable earlier.
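A sketch of the pipeline; the `buffer_size` and `breakpoint_percentile_threshold` values are illustrative defaults:

```python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

# Reads OPENAI_API_KEY from the environment
embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

pipeline = IngestionPipeline(
    transformations=[
        # Split Documents into semantically coherent Nodes
        SemanticSplitterNodeParser(
            buffer_size=1,
            breakpoint_percentile_threshold=95,
            embed_model=embed_model,
        ),
        # Then embed each Node's content
        embed_model,
    ],
)
```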
Upsert the data
Above, you defined an Ingestion Pipeline. There's one thing missing, though: a vector database into which you can upsert your transformed data. LlamaIndex lets you declare a VectorStore and add it right into the pipeline for easy ingestion. Let's do that with Pinecone below. This step uses the Pinecone API key you set as an environment variable earlier.
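A sketch that creates a serverless index sized for ada-002 vectors; the index name, cloud, and region are illustrative, and on older Pinecone SDKs you may need `index_name in pc.list_indexes().names()` instead of `pc.has_index`:

```python
import os

from pinecone import Pinecone, ServerlessSpec
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "llamaindex-rag-example"  # illustrative name
if not pc.has_index(index_name):
    pc.create_index(
        name=index_name,
        dimension=1536,  # ada-002 produces 1536-dimensional vectors
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

pinecone_index = pc.Index(index_name)
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
```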
Now, add the vector store to your `pipeline` and run it.
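For example (you could equally redefine the pipeline with `vector_store=` in its constructor):

```python
# Attach the vector store so the pipeline upserts into Pinecone, then run it
pipeline.vector_store = vector_store
pipeline.run(documents=cleaned_docs)
```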
You can confirm the upsert worked by checking your Pinecone index with `.describe_index_stats()`:
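For example (your exact output may differ):

```python
print(pinecone_index.describe_index_stats())
# Expected output along these lines:
# {'dimension': 1536,
#  'namespaces': {'': {'vector_count': 46}},
#  'total_vector_count': 46}
```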
The vector count confirms that the `SemanticSplitterNodeParser` split your list of Documents into 46 Nodes.
Query the data
To fetch search results from Pinecone itself, you need to make a `VectorStoreIndex` object and a `VectorIndexRetriever` object. You can then pass natural-language queries to your Pinecone index and receive results.
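A sketch, with an illustrative query about the paper:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever

# Wrap the existing Pinecone vector store in an index object
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)

# Configure the retriever to return the top 5 most similar Nodes
retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=5)

# Issue a natural-language query (illustrative)
results = retriever.retrieve("How does HNSW achieve logarithmic search complexity?")
for node_with_score in results:
    print(node_with_score.score, node_with_score.node.get_content()[:100])
```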
Build a RAG app with the data
Building a RAG app with LlamaIndex is very simple. In theory, you could create a simple Query Engine out of your `vector_index` object by calling `vector_index.as_query_engine().query('some query')`, but then you wouldn't be able to specify the number of Pinecone search results you'd like to use as context.
To control how many search results your RAG app uses from your Pinecone index, you will instead create your Query Engine using the `RetrieverQueryEngine` class. This class allows you to pass in the `retriever` created above, which you configured to retrieve the top 5 search results.
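A sketch using the same illustrative query; the answer is synthesized by your OpenAI LLM:

```python
from llama_index.core.query_engine import RetrieverQueryEngine

# Build a Query Engine around the top-5 retriever
query_engine = RetrieverQueryEngine(retriever=retriever)

llm_response = query_engine.query("How does HNSW achieve logarithmic search complexity?")
print(llm_response.response)
```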
You can see which Nodes the Query Engine used to generate its answer via the response's `.source_nodes` attribute. Let's inspect the first Node:
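For example:

```python
# Each source Node carries its similarity score and the text the LLM saw
first_node = llm_response.source_nodes[0]
print(first_node.score)
print(first_node.node.get_content())
print(first_node.node.metadata)
```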
Evaluate the data
Now that you've made a RAG app and queried your LLM, you need to evaluate its response. With LlamaIndex, there are many ways to evaluate the results your RAG app generates. A great way to get started with evaluation is to confirm (or deny) that your LLM's responses are relevant, given the context retrieved from your vector database. To do this, you can use LlamaIndex's `RelevancyEvaluator` class. The great thing about this type of evaluation is that there is no need for ground-truth data (i.e., labeled datasets to compare answers with).
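A sketch; depending on your LlamaIndex version, `RelevancyEvaluator` may take an `llm` argument (as here) or a service context, and the judge model is illustrative:

```python
from llama_index.core.evaluation import RelevancyEvaluator
from llama_index.llms.openai import OpenAI

# Use an OpenAI model as the judge (model choice is illustrative)
evaluator = RelevancyEvaluator(llm=OpenAI(model="gpt-4o"))

eval_result = evaluator.evaluate_response(
    query="How does HNSW achieve logarithmic search complexity?",
    response=llm_response,
)
print(eval_result.passing)  # True if the response matches the retrieved context
```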
The evaluation result's `.passing` attribute tells you whether the response was relevant to the retrieved context.
Let's see what happens when we send a totally out-of-scope query through your RAG app. Issue a random query you know your RAG app won't be able to answer, given what's in your index:
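For example (the query below is deliberately unrelated):

```python
# Send a query that has nothing to do with the HNSW paper
irrelevant_query = "What color is the sky on Mars?"
irrelevant_response = query_engine.query(irrelevant_query)

eval_result = evaluator.evaluate_response(
    query=irrelevant_query,
    response=irrelevant_response,
)
print(eval_result.passing)  # Likely False: the retrieved context can't support an answer
```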