LlamaIndex
Using LlamaIndex and Pinecone to build semantic search and RAG applications
LlamaIndex is a framework for connecting data sources to LLMs, with its chief use case being the end-to-end development of retrieval augmented generation (RAG) applications. LlamaIndex provides the essential abstractions to more easily ingest, structure, and access private or domain-specific data, and to inject that data safely and reliably into LLMs for more accurate text generation. It’s available in Python and TypeScript.
Setup guide
As noted above, LlamaIndex’s chief use case is the end-to-end development of RAG applications. Compared to other similar frameworks, LlamaIndex offers a wide variety of tools for pre- and post-processing your data.
This guide shows you how to use LlamaIndex and Pinecone to both perform traditional semantic search and build a RAG pipeline. Specifically, you will:
- Load, transform, and vectorize sample data with LlamaIndex
- Index and store the vectorized data in Pinecone
- Search the data in Pinecone and use the results to augment an LLM call
- Evaluate the answer you get back from the LLM
This guide demonstrates only one of many ways you can use LlamaIndex as part of a RAG pipeline. See LlamaIndex’s section on Advanced RAG to learn more about what’s possible.
Set up your environment
Before you begin, install some necessary libraries and set environment variables for your Pinecone and OpenAI API keys:
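A minimal sketch of that setup is below. The install line is an assumption on our part (package names can differ by LlamaIndex and Pinecone SDK version), and the placeholder keys are just that:

```python
# Install the libraries this guide assumes (package names may vary by version):
# pip install llama-index llama-index-vector-stores-pinecone pinecone-client

import os

# Set the API keys the rest of the guide reads from the environment.
os.environ["PINECONE_API_KEY"] = "<your Pinecone API key>"
os.environ["OPENAI_API_KEY"] = "<your OpenAI API key>"
```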
Also note that all code on this page is run on Python 3.11.
Load the data
In this guide, you will use the canonical HNSW paper by Yuri Malkov (PDF) as your sample dataset. Your first step is to download the PDF from arXiv.org and load it into a LlamaIndex loader called PDF Loader. This Loader is available (along with many more) on the LlamaHub, which is a directory of data loaders.
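Here is one way that download-and-load step might look. The use of urllib and the PDFReader import path are assumptions for this sketch, not necessarily the exact loader setup in the original guide:

```python
import urllib.request
from pathlib import Path

from llama_index.readers.file import PDFReader  # LlamaHub PDF loader

# Download the HNSW paper (arXiv:1603.09320) and parse it into LlamaIndex Documents.
pdf_path = Path("hnsw.pdf")
urllib.request.urlretrieve("https://arxiv.org/pdf/1603.09320", pdf_path)

documents = PDFReader().load_data(file=pdf_path)
print(documents[0])
```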
You can see above that each Document has a ton of useful information, but depending on which Loader you choose, you may have to clean your data. In this case, you need to remove things like remaining \n characters and broken, hyphenated words (e.g., alg o-\nrithms → algorithms).
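A cleaning pass along those lines might look like this sketch; the helper name and regex are illustrative, not the guide’s exact code:

```python
import re

from llama_index.core import Document

def clean_up_text(content: str) -> str:
    # Re-join words split by a hyphen at a line break, e.g. "alg o-\nrithms".
    content = re.sub(r"(\w)-\n(\w)", r"\1\2", content)
    # Drop remaining newlines and collapse repeated whitespace.
    content = content.replace("\n", " ")
    return re.sub(r"\s+", " ", content).strip()

cleaned_docs = [
    Document(text=clean_up_text(doc.get_content()), metadata=doc.metadata)
    for doc in documents
]
```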
The value-add of using a file loader from LlamaHub is that your PDF is already broken down into LlamaIndex Documents. Along with each Document object comes a customizable metadata dictionary and a hash ID, among other useful artifacts.
Transform the data
Metadata
Now, if you look at one of your cleaned Document objects, you’ll see that the default values in your metadata dictionary are not particularly useful.
To add some metadata that would be more helpful, let’s add author name and the paper’s title. Note that whatever metadata you add to the metadata dictionary will apply to all Nodes, so you want to keep your additions high-level.
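For example (the metadata field names here are this sketch’s choice, not required keys):

```python
metadata_additions = {
    "authors": ["Yu. A. Malkov", "D. A. Yashunin"],
    "paper_title": (
        "Efficient and robust approximate nearest neighbor search "
        "using Hierarchical Navigable Small World graphs"
    ),
}

# Apply the same high-level metadata to every Document (and therefore every Node).
for doc in cleaned_docs:
    doc.metadata.update(metadata_additions)
```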
LlamaIndex also provides advanced customization over which metadata fields are visible to the LLM versus the embedding model.
Ingestion pipeline
The easiest way to turn your data into indexable vectors and put those into Pinecone is to make what’s called an Ingestion Pipeline. An Ingestion Pipeline takes your list of Documents, parses them into Nodes (or “chunks” in non-LlamaIndex contexts), vectorizes each Node’s content, and upserts the results into Pinecone.
In the following pipeline, you’ll use one of LlamaIndex’s newer parsers: the SemanticSplitterNodeParser, which uses OpenAI’s ada-002 embedding model to split Documents into semantically coherent Nodes.
This step uses the OpenAI API key you set as an environment variable earlier.
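A sketch of that pipeline is below. The splitter’s buffer_size and breakpoint_percentile_threshold values are illustrative, not values prescribed by the guide:

```python
from llama_index.core.ingestion import IngestionPipeline
from llama_index.core.node_parser import SemanticSplitterNodeParser
from llama_index.embeddings.openai import OpenAIEmbedding

# One embedding model drives both the semantic splitting and the final vectors.
embed_model = OpenAIEmbedding(model="text-embedding-ada-002")

pipeline = IngestionPipeline(
    transformations=[
        SemanticSplitterNodeParser(
            buffer_size=1,  # sentences grouped together when comparing semantic similarity
            breakpoint_percentile_threshold=95,
            embed_model=embed_model,
        ),
        embed_model,  # embed each resulting Node
    ],
)
```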
Hold off on running this pipeline; you will modify it below.
Upsert the data
Above, you defined an Ingestion Pipeline. There’s one thing missing, though: a vector database into which you can upsert your transformed data.
LlamaIndex lets you declare a VectorStore and add that right into the pipeline for super easy ingestion. Let’s do that with Pinecone below.
This step uses the Pinecone API key you set as an environment variable earlier.
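Here is one way to do that. The index name, cloud, and region are placeholders, and the serverless spec assumes a recent Pinecone Python SDK:

```python
import os

from pinecone import Pinecone, ServerlessSpec
from llama_index.vector_stores.pinecone import PineconeVectorStore

pc = Pinecone(api_key=os.environ["PINECONE_API_KEY"])

index_name = "llamaindex-rag-example"  # placeholder name
if index_name not in pc.list_indexes().names():
    pc.create_index(
        name=index_name,
        dimension=1536,  # matches text-embedding-ada-002
        metric="cosine",
        spec=ServerlessSpec(cloud="aws", region="us-east-1"),
    )

pinecone_index = pc.Index(index_name)
vector_store = PineconeVectorStore(pinecone_index=pinecone_index)
```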
With your PineconeVectorStore now initialized, you can pop that into your pipeline and run it.
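Concretely, assuming the pipeline and cleaned_docs objects from earlier:

```python
# Attach the Pinecone vector store to the pipeline defined earlier, then run it.
# (Equivalently, pass vector_store= when constructing the IngestionPipeline.)
pipeline.vector_store = vector_store
pipeline.run(documents=cleaned_docs)
```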
Now ensure your index is up and running with some Pinecone-native methods like .describe_index_stats():
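For example:

```python
# The stats response includes the total vector count and per-namespace counts.
print(pinecone_index.describe_index_stats())
```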
Awesome, your index now has vectors in it. Since you have 46 vectors, you can infer that your SemanticSplitterNodeParser split your list of Documents into 46 Nodes.
Query the data
To fetch search results from Pinecone itself, you need to make a VectorStoreIndex object and a VectorIndexRetriever object. You can then pass natural language queries to your Pinecone index and receive results.
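A sketch of that flow, with a hypothetical query string:

```python
from llama_index.core import VectorStoreIndex
from llama_index.core.retrievers import VectorIndexRetriever

# Wrap the existing Pinecone vector store in an index, then build a retriever
# that returns the top 5 most similar Nodes per query.
vector_index = VectorStoreIndex.from_vector_store(vector_store=vector_store)
retriever = VectorIndexRetriever(index=vector_index, similarity_top_k=5)

query = "How does HNSW achieve logarithmic scaling of search complexity?"  # hypothetical query
search_results = retriever.retrieve(query)
for result in search_results:
    print(result.score, result.get_content()[:100])
```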
These search results can now be plugged into any downstream task you want.
One of the most common ways to use vector database search results is as additional context to augment a query sent to an LLM. This workflow is what’s commonly referred to as a RAG application.
Build a RAG app with the data
Building a RAG app with LlamaIndex is very simple.
In theory, you could create a simple Query Engine out of your vector_index object by calling vector_index.as_query_engine().query('some query'), but then you wouldn’t be able to specify the number of Pinecone search results you’d like to use as context.
To control how many search results your RAG app uses from your Pinecone index, you will instead create your Query Engine using the RetrieverQueryEngine class. This class allows you to pass in the retriever created above, which you configured to retrieve the top 5 search results.
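For example, with the retriever from the previous step and a hypothetical question:

```python
from llama_index.core.query_engine import RetrieverQueryEngine

query_engine = RetrieverQueryEngine(retriever=retriever)

llm_query = "How does HNSW's hierarchy speed up nearest neighbor search?"  # hypothetical question
response = query_engine.query(llm_query)
print(response.response)
```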
You can even inspect the context (Nodes) that informed your LLM’s answer using the .source_nodes attribute. Let’s inspect the first Node:
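```python
# The retrieved context travels with the response object.
first_node = response.source_nodes[0]
print(first_node.score)          # similarity score returned by Pinecone
print(first_node.get_content())  # the Node text passed to the LLM as context
```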
Evaluate the data
Now that you’ve made a RAG app and queried your LLM, you need to evaluate its response.
With LlamaIndex, there are many ways to evaluate the results your RAG app generates. A great way to get started with evaluation is to confirm (or deny) that your LLM’s responses are relevant, given the context retrieved from your vector database. To do this, you can use LlamaIndex’s RelevancyEvaluator class.
The great thing about this type of evaluation is that there is no need for ground truth data (i.e., labeled datasets to compare answers with).
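A sketch of that evaluation follows; the judge model choice (GPT-4 here) is an assumption, not a requirement of the class:

```python
from llama_index.core.evaluation import RelevancyEvaluator
from llama_index.llms.openai import OpenAI

evaluator = RelevancyEvaluator(llm=OpenAI(model="gpt-4"))

# Judge whether the LLM's answer is relevant to the retrieved context for this query.
eval_result = evaluator.evaluate_response(query=llm_query, response=response)
print(eval_result)
```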
You can see that there are various attributes you can inspect on your evaluator’s result in order to ascertain what’s going on behind the scenes. To get a quick binary True/False signal as to whether your LLM is producing relevant results given your context, inspect the .passing attribute.
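For example:

```python
print(eval_result.passing)  # True when the answer is judged relevant to the retrieved context
```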
Let’s see what happens when you send a totally out-of-scope query through your RAG app. Issue a random query you know your RAG app won’t be able to answer, given what’s in your index:
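For instance, with an obviously unrelated (hypothetical) question:

```python
# An out-of-scope query the HNSW paper can't answer.
oos_query = "What is a good recipe for sourdough bread?"
oos_response = query_engine.query(oos_query)

oos_eval = evaluator.evaluate_response(query=oos_query, response=oos_response)
print(oos_eval.passing)  # expected: False
```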
As expected, when you send an out-of-scope question through your RAG pipeline, your evaluator says the LLM’s answer is not relevant to the retrieved context.
Summary
As you have seen, LlamaIndex is a powerful framework to use when building semantic search and RAG applications, and this guide has only scratched the surface! Explore more on your own and let us know how it goes.