This page shows you how to build a simple RAG chatbot in Python using Pinecone for the vector database and embedding model, OpenAI for the LLM, and LangChain for the RAG workflow.

To run through this guide in your browser, use the “Build a RAG chatbot” Colab notebook.

How it works

GenAI chatbots built on Large Language Models (LLMs) can answer many questions. However, when the questions concern private data that the LLMs have not been trained on, you can get answers that sound convincing but are factually wrong. This behavior is referred to as “hallucination”.

Retrieval augmented generation (RAG) is a framework that reduces hallucination by giving LLMs the knowledge they are missing, retrieved from private data stored in a vector database like Pinecone.
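
The retrieve-then-augment idea can be sketched without any external service. The toy below is only an illustration: it ranks documents by word overlap as a stand-in for vector search, and the helper names (`tokenize`, `retrieve`, `build_prompt`) and the prompt wording are assumptions made for this sketch, not part of Pinecone or LangChain.

```python
import re

def tokenize(text):
    """Lowercase a string and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def retrieve(question, documents, top_k=1):
    """Rank documents by word overlap with the question (a stand-in for vector search)."""
    question_words = tokenize(question)
    ranked = sorted(
        documents,
        key=lambda doc: len(question_words & tokenize(doc)),
        reverse=True,
    )
    return ranked[:top_k]

def build_prompt(question, context_chunks):
    """Place the retrieved chunks ahead of the question so the LLM can ground its answer."""
    context = "\n\n".join(context_chunks)
    return (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}"
    )

# Two hypothetical private snippets that an LLM would not know by default.
documents = [
    "Pull the lever marked QFE Start to initiate the Quantum Flibberflabber Engine.",
    "Insert the EtherKey and turn it clockwise to engage the Aetherial Flux Capacitor.",
]
question = "How do I engage the Aetherial Flux Capacitor?"
prompt = build_prompt(question, retrieve(question, documents))
print(prompt)
```

In a real RAG workflow, the overlap ranking is replaced by a similarity search over embeddings, and the resulting prompt is sent to the LLM.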

RAG overview

Before you begin

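Ensure you have the following:

  1. A Pinecone account and API key (available at app.pinecone.io)
  2. An OpenAI account and API key (available at platform.openai.com/api-keys)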

1. Set up your environment

  1. Install the Pinecone and LangChain libraries required for this guide:

    pip install \
        "pinecone-client[grpc]" \
        "langchain-pinecone" \
        "langchain-openai" \
        "langchain-text-splitters" \
        "langchain"
    
  2. Set environment variables for your Pinecone and OpenAI API keys:

    export PINECONE_API_KEY="<your Pinecone API key>" # available at app.pinecone.io
    export OPENAI_API_KEY="<your OpenAI API key>" # available at platform.openai.com/api-keys
    

2. Store knowledge in Pinecone

For this guide, you’ll use a document about a fictional product called the WonderVector5000 that LLMs do not have any information about. First, you’ll create a Pinecone index. Then you’ll use LangChain to chunk the document into smaller segments, create vector embeddings for each segment via Pinecone Inference, and upsert the vector embeddings into your Pinecone index.

Pinecone Inference is an API service that gives you access to embedding models hosted on Pinecone’s infrastructure. You can use the Inference API directly or through LangChain’s PineconeEmbeddings class, as shown in this guide.

  1. Create a serverless index in Pinecone for storing the embeddings of your document, setting the index dimensions and distance metric to match those of the multilingual-e5-large model you’ll use to create the embeddings:

    from pinecone.grpc import PineconeGRPC as Pinecone
    from pinecone import ServerlessSpec
    import os
    
    pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))
    
    index_name = "docs-rag-chatbot"
    
    # Create the index only if it doesn't already exist.
    if index_name not in pc.list_indexes().names():
        pc.create_index(
            name=index_name,
            dimension=1024, 
            metric="cosine", 
            spec=ServerlessSpec(
                cloud="aws", 
                region="us-east-1"
            ) 
        ) 
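
The index dimension has to equal the length of the vectors the embedding model produces (1024 in the code above), and the metric decides how two vectors are compared. As an illustration of what the cosine metric measures, here is a small pure-Python sketch on toy two-dimensional vectors; it is not how Pinecone computes scores internally.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        # Mirrors the requirement that query and index dimensions match.
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

print(cosine_similarity([1.0, 0.0], [2.0, 0.0]))  # vectors pointing the same way
print(cosine_similarity([1.0, 0.0], [0.0, 3.0]))  # orthogonal vectors
```

Cosine similarity ignores vector length and compares direction only, which is why the first pair scores as identical even though the magnitudes differ.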
    
  2. Since your document is in Markdown, chunk the content based on structure to get semantically coherent segments. Then use Pinecone Inference to embed each chunk and upsert the embeddings into your Pinecone index.

    from langchain_pinecone import PineconeEmbeddings
    from langchain_pinecone import PineconeVectorStore
    from langchain_text_splitters import MarkdownHeaderTextSplitter
    import os
    import time
    
    # Chunk the document based on h2 headers.
    markdown_document = "## Introduction\n\nWelcome to the whimsical world of the WonderVector5000, an astonishing leap into the realms of imaginative technology. This extraordinary device, borne of creative fancy, promises to revolutionize absolutely nothing while dazzling you with its fantastical features. Whether you're a seasoned technophile or just someone looking for a bit of fun, the WonderVector5000 is sure to leave you amused and bemused in equal measure. Let's explore the incredible, albeit entirely fictitious, specifications, setup process, and troubleshooting tips for this marvel of modern nonsense.\n\n## Product overview\n\nThe WonderVector5000 is packed with features that defy logic and physics, each designed to sound impressive while maintaining a delightful air of absurdity:\n\n- Quantum Flibberflabber Engine: The heart of the WonderVector5000, this engine operates on principles of quantum flibberflabber, a phenomenon as mysterious as it is meaningless. It's said to harness the power of improbability to function seamlessly across multiple dimensions.\n\n- Hyperbolic Singularity Matrix: This component compresses infinite possibilities into a singular hyperbolic state, allowing the device to predict outcomes with 0% accuracy, ensuring every use is a new adventure.\n\n- Aetherial Flux Capacitor: Drawing energy from the fictional aether, this flux capacitor provides unlimited power by tapping into the boundless reserves of imaginary energy fields.\n\n- Multi-Dimensional Holo-Interface: Interact with the WonderVector5000 through its holographic interface that projects controls and information in three-and-a-half dimensions, creating a user experience that's simultaneously futuristic and perplexing.\n\n- Neural Fandango Synchronizer: This advanced feature connects directly to the user's brain waves, converting your deepest thoughts into tangible actions—albeit with results that are whimsically unpredictable.\n\n- Chrono-Distortion Field: Manipulate time 
itself with the WonderVector5000's chrono-distortion field, allowing you to experience moments before they occur or revisit them in a state of temporal flux.\n\n## Use cases\n\nWhile the WonderVector5000 is fundamentally a device of fiction and fun, let's imagine some scenarios where it could hypothetically be applied:\n\n- Time Travel Adventures: Use the Chrono-Distortion Field to visit key moments in history or glimpse into the future. While actual temporal manipulation is impossible, the mere idea sparks endless storytelling possibilities.\n\n- Interdimensional Gaming: Engage with the Multi-Dimensional Holo-Interface for immersive, out-of-this-world gaming experiences. Imagine games that adapt to your thoughts via the Neural Fandango Synchronizer, creating a unique and ever-changing environment.\n\n- Infinite Creativity: Harness the Hyperbolic Singularity Matrix for brainstorming sessions. By compressing infinite possibilities into hyperbolic states, it could theoretically help unlock unprecedented creative ideas.\n\n- Energy Experiments: Explore the concept of limitless power with the Aetherial Flux Capacitor. Though purely fictional, the notion of drawing energy from the aether could inspire innovative thinking in energy research.\n\n## Getting started\n\nSetting up your WonderVector5000 is both simple and absurdly intricate. Follow these steps to unleash the full potential of your new device:\n\n1. Unpack the Device: Remove the WonderVector5000 from its anti-gravitational packaging, ensuring to handle with care to avoid disturbing the delicate balance of its components.\n\n2. Initiate the Quantum Flibberflabber Engine: Locate the translucent lever marked “QFE Start” and pull it gently. You should notice a slight shimmer in the air as the engine engages, indicating that quantum flibberflabber is in effect.\n\n3. Calibrate the Hyperbolic Singularity Matrix: Turn the dials labeled 'Infinity A' and 'Infinity B' until the matrix stabilizes. 
You'll know it's calibrated correctly when the display shows a single, stable “∞”.\n\n4. Engage the Aetherial Flux Capacitor: Insert the EtherKey into the designated slot and turn it clockwise. A faint humming sound should confirm that the aetherial flux capacitor is active.\n\n5. Activate the Multi-Dimensional Holo-Interface: Press the button resembling a floating question mark to activate the holo-interface. The controls should materialize before your eyes, slightly out of phase with reality.\n\n6. Synchronize the Neural Fandango Synchronizer: Place the neural headband on your forehead and think of the word “Wonder”. The device will sync with your thoughts, a process that should take just a few moments.\n\n7. Set the Chrono-Distortion Field: Use the temporal sliders to adjust the time settings. Recommended presets include “Past”, “Present”, and “Future”, though feel free to explore other, more abstract temporal states.\n\n## Troubleshooting\n\nEven a device as fantastically designed as the WonderVector5000 can encounter problems. Here are some common issues and their solutions:\n\n- Issue: The Quantum Flibberflabber Engine won't start.\n\n    - Solution: Ensure the anti-gravitational packaging has been completely removed. Check for any residual shards of improbability that might be obstructing the engine.\n\n- Issue: The Hyperbolic Singularity Matrix displays “∞∞”.\n\n    - Solution: This indicates a hyper-infinite loop. Reset the dials to zero and then adjust them slowly until the display shows a single, stable infinity symbol.\n\n- Issue: The Aetherial Flux Capacitor isn't engaging.\n\n    - Solution: Verify that the EtherKey is properly inserted and genuine. Counterfeit EtherKeys can often cause malfunctions. Replace with an authenticated EtherKey if necessary.\n\n- Issue: The Multi-Dimensional Holo-Interface shows garbled projections.\n\n    - Solution: Realign the temporal resonators by tapping the holographic screen three times in quick succession. 
This should stabilize the projections.\n\n- Issue: The Neural Fandango Synchronizer causes headaches.\n\n    - Solution: Ensure the headband is properly positioned and not too tight. Relax and focus on simple, calming thoughts to ease the synchronization process.\n\n- Issue: The Chrono-Distortion Field is stuck in the past.\n\n    - Solution: Increase the temporal flux by 5%. If this fails, perform a hard reset by holding down the “Future” slider for ten seconds."
    
    headers_to_split_on = [
        ("##", "Header 2")
    ]
    
    markdown_splitter = MarkdownHeaderTextSplitter(
        headers_to_split_on=headers_to_split_on, strip_headers=False
    )
    md_header_splits = markdown_splitter.split_text(markdown_document)
    
    # Initialize a LangChain embedding object.
    model_name = "multilingual-e5-large"  
    embeddings = PineconeEmbeddings(  
        model=model_name,  
        pinecone_api_key=os.environ.get("PINECONE_API_KEY")  
    )  
    
    # Embed each chunk and upsert the embeddings into your Pinecone index.
    docsearch = PineconeVectorStore.from_documents(
        documents=md_header_splits,
        index_name="docs-rag-chatbot",
        embedding=embeddings, 
        namespace="wondervector5000" 
    )
    
    # Pause briefly so the upserted records are available to read.
    time.sleep(1)
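
To make the structure-based chunking more concrete, the sketch below splits Markdown at each h2 header and keeps the header line in the chunk text. It is a simplified stand-in written for illustration, not LangChain's MarkdownHeaderTextSplitter implementation; the function name `split_on_h2` and the dictionary keys are assumptions.

```python
def split_on_h2(markdown_text):
    """Split Markdown into chunks, starting a new chunk at each '## ' header."""
    chunks = []
    header, lines = None, []

    def flush():
        # Store the chunk collected so far, skipping empty ones.
        text = "\n".join(lines).strip()
        if text:
            chunks.append({"header": header, "text": text})

    for line in markdown_text.splitlines():
        if line.startswith("## "):
            flush()
            header, lines = line[3:].strip(), [line]
        else:
            lines.append(line)
    flush()
    return chunks

sample = "## Introduction\n\nWelcome.\n\n## Troubleshooting\n\nReset the dials."
for chunk in split_on_h2(sample):
    print(chunk["header"], "->", repr(chunk["text"]))
```

Splitting at headers keeps each section's content together, so each embedding represents one coherent topic rather than an arbitrary window of characters.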
    
  3. Use Pinecone’s list and query operations to look at one of the records:

    from pinecone.grpc import PineconeGRPC as Pinecone
    import os
    
    pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))
    
    index = pc.Index("docs-rag-chatbot")
    namespace = "wondervector5000"
    
    for ids in index.list(namespace=namespace):
        query = index.query(
            id=ids[0], 
            namespace=namespace, 
            top_k=1,
            include_values=True,
            include_metadata=True
        )
        print(query)
    
    # Response:
    # {'matches': [{'id': '8a7e5227-a738-4422-9c25-9a6136825803',
    #             'metadata': {'Header 2': 'Introduction',
    #                         'text': '## Introduction  \n'
    #                                 'Welcome to the whimsical world of the '
    #                                 'WonderVector5000, an astonishing leap into '
    #                                 'the realms of imaginative technology. This '
    #                                 'extraordinary device, borne of creative '
    #                                 'fancy, promises to revolutionize '
    #                                 'absolutely nothing while dazzling you with '
    #                                 "its fantastical features. Whether you're a "
    #                                 'seasoned technophile or just someone '
    #                                 'looking for a bit of fun, the '
    #                                 'WonderVector5000 is sure to leave you '
    #                                 "amused and bemused in equal measure. Let's "
    #                                 'explore the incredible, albeit entirely '
    #                                 'fictitious, specifications, setup process, '
    #                                 'and troubleshooting tips for this marvel '
    #                                 'of modern nonsense.'},
    #             'score': 1.0080868,
    #             'values': [-0.00798303168,
    #                        0.00551192369,
    #                        -0.00463955849,
    #                        -0.00585730933,
    #                        ...
    #                       ]}],
    # 'namespace': 'wondervector5000',
    # 'usage': {'read_units': 6}}    
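
The full response is verbose because it includes the vector values. If you only want a quick look at what was stored, a small helper can pull out the fields of interest. This sketch operates on a plain dictionary shaped like the response shown above; the helper name `summarize_matches` is hypothetical, and the object returned by the Pinecone client may need converting to a dictionary first.

```python
def summarize_matches(response, preview_chars=40):
    """Return (id, header, score, text preview) for each match in a query response."""
    rows = []
    for match in response.get("matches", []):
        metadata = match.get("metadata") or {}
        text = metadata.get("text", "")
        rows.append(
            (
                match.get("id"),
                metadata.get("Header 2"),
                match.get("score"),
                text[:preview_chars],
            )
        )
    return rows

# Hand-written stand-in shaped like the response above, with shortened values.
sample_response = {
    "matches": [
        {
            "id": "8a7e5227-a738-4422-9c25-9a6136825803",
            "metadata": {
                "Header 2": "Introduction",
                "text": "## Introduction  \nWelcome to the whimsical world",
            },
            "score": 1.0080868,
            "values": [-0.00798303168, 0.00551192369],
        }
    ],
    "namespace": "wondervector5000",
}
for row in summarize_matches(sample_response):
    print(row)
```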
    

3. Use the chatbot

Now that your document is stored as embeddings in Pinecone, when you send questions to the LLM, you can add relevant knowledge from your Pinecone index to help the LLM return an accurate response.

Initialize a LangChain object for chatting with the gpt-3.5-turbo LLM, define a few questions about the WonderVector5000, and then send the questions to the LLM, first with relevant knowledge from Pinecone and then without any additional knowledge.

The questions require specific, private knowledge of the product, which the LLM does not have by default.

from langchain.chains import RetrievalQA
from langchain_openai import ChatOpenAI
from langchain_pinecone import PineconeEmbeddings
from langchain_pinecone import PineconeVectorStore
import os

# Initialize a LangChain object for chatting with the LLM
# without knowledge from Pinecone.
llm = ChatOpenAI(
    openai_api_key=os.environ.get("OPENAI_API_KEY"),
    model_name="gpt-3.5-turbo",
    temperature=0.0
)

# Initialize a LangChain object for retrieving information from Pinecone.
# Queries must be embedded with the same model used to embed the document
# (multilingual-e5-large, 1024 dimensions), so use PineconeEmbeddings here too.
knowledge = PineconeVectorStore.from_existing_index(
    index_name="docs-rag-chatbot",
    namespace="wondervector5000",
    embedding=PineconeEmbeddings(
        model="multilingual-e5-large",
        pinecone_api_key=os.environ.get("PINECONE_API_KEY")
    )
)

# Initialize a LangChain object for chatting with the LLM
# with knowledge from Pinecone. 
qa = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=knowledge.as_retriever()
)

# Define a few questions about the WonderVector5000.
query1 = """What are the first 3 steps for getting started 
with the WonderVector5000?"""

query2 = """The Neural Fandango Synchronizer is giving me a 
headache. What do I do?"""

# Send each query to the LLM twice, first with relevant knowledge from Pinecone
# and then without any additional knowledge.
print("Query 1\n")
print("Chat with knowledge:")
print(qa.invoke(query1).get("result"))
print("\nChat without knowledge:")
print(llm.invoke(query1).content)
print("\nQuery 2\n")
print("Chat with knowledge:")
print(qa.invoke(query2).get("result"))
print("\nChat without knowledge:")
print(llm.invoke(query2).content)


# Response:
#
# Query 1

# Chat with knowledge:
# The first three steps for getting started with the WonderVector5000 are:

# 1. Unpack the Device: Remove the WonderVector5000 from its anti-gravitational packaging.
# 2. Initiate the Quantum Flibberflabber Engine: Locate the translucent lever marked “QFE Start” and pull it gently.
# 3. Calibrate the Hyperbolic Singularity Matrix: Turn the dials labeled 'Infinity A' and 'Infinity B' until the matrix stabilizes.

# Chat without knowledge:
# 1. Unbox the WonderVector5000 and carefully read the user manual provided. Familiarize yourself with the different components of the device and their functions.

# 2. Charge the WonderVector5000 using the provided charging cable. Make sure the device is fully charged before using it for the first time.

# 3. Turn on the WonderVector5000 by pressing the power button. Follow the on-screen instructions to set up the device and connect it to your Wi-Fi network.

# Query 2

# Chat with knowledge:
# Ensure the headband is properly positioned and not too tight. Relax and focus on simple, calming thoughts to ease the synchronization process.

# Chat without knowledge:
# If the Neural Fandango Synchronizer is giving you a headache, it is important to stop using it immediately and give yourself a break. Take some time to rest and relax, drink plenty of water, and consider taking over-the-counter pain medication if needed. If the headache persists or worsens, it may be a good idea to consult a healthcare professional for further advice and guidance. Additionally, it may be helpful to adjust the settings or usage of the Neural Fandango Synchronizer to see if that helps alleviate the headache.

For each query, notice that the first response is accurate and closely matches the information in the WonderVector5000 document, while the second response sounds convincing but is generic and inaccurate.
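
If you want to run the same comparison over more questions, the repeated print statements can be folded into a loop. The sketch below is one possible shape; `compare_answers` and the two stub functions are hypothetical and exist only so the example runs without API keys.

```python
def compare_answers(questions, answer_with_knowledge, answer_without_knowledge):
    """Ask each question both ways and collect the two answers side by side."""
    results = []
    for question in questions:
        results.append(
            {
                "question": question,
                "with_knowledge": answer_with_knowledge(question),
                "without_knowledge": answer_without_knowledge(question),
            }
        )
    return results

# Stub callables standing in for the RAG chain and the plain LLM.
def fake_rag(question):
    return f"[grounded] {question}"

def fake_llm(question):
    return f"[generic] {question}"

for result in compare_answers(["What is the QFE Start lever?"], fake_rag, fake_llm):
    print(result["question"])
    print("  with knowledge:   ", result["with_knowledge"])
    print("  without knowledge:", result["without_knowledge"])
```

With the objects defined in this guide, the two callables could be `lambda q: qa.invoke(q).get("result")` and `lambda q: llm.invoke(q).content`.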

Next steps