This page shows you how to build a simple RAG chatbot in Python using Pinecone for the vector database, OpenAI for the embedding model and LLM, and LangChain for the RAG workflow.

To run through this guide in your browser, use the “Build a RAG chatbot” Colab notebook.

How it works

GenAI chatbots built on Large Language Models (LLMs) can answer many questions. However, when the questions concern private data that the LLMs have not been trained on, you can get answers that sound convincing but are factually wrong. This behavior is referred to as “hallucination”.

Retrieval-augmented generation (RAG) is a framework that reduces hallucination by giving the LLM the knowledge it is missing, retrieved at query time from private data stored in a vector database like Pinecone.
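As a rough illustration of the retrieval step, the toy sketch below ranks a few passages against a question and returns the best match. It rests on stated assumptions: word-overlap counting stands in for the embedding model and the vector database, and the passages in DOCS are hypothetical abbreviations of the WonderVector5000 document used later in this guide.

```python
import re
from collections import Counter

# Hypothetical passages, abbreviated from the WonderVector5000 document used later.
DOCS = [
    "Unpack the Device: Remove the WonderVector5000 from its anti-gravitational packaging.",
    "The Neural Fandango Synchronizer causes headaches: ensure the headband is not too tight.",
    "The Chrono-Distortion Field is stuck in the past: increase the temporal flux by 5%.",
]


def tokens(text: str) -> Counter:
    """Lowercased word counts; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k passages that share the most words with the question."""
    q = tokens(question)
    # Counter & Counter keeps the minimum count of each shared word.
    ranked = sorted(docs, key=lambda d: sum((q & tokens(d)).values()), reverse=True)
    return ranked[:k]


question = "The Neural Fandango Synchronizer is giving me a headache. What do I do?"
context = retrieve(question, DOCS)
# A RAG system would now send the question together with this context to the LLM.
print(context)
```

In the real chatbot, an embedding model and Pinecone replace the word counting, and the retrieved passage is added to the prompt so the LLM answers from it instead of guessing.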

RAG overview

Before you begin

Ensure you have the following:

  - A Pinecone account and API key
  - An OpenAI account and API key

1. Set up your environment

  1. Install the LangChain libraries required for this guide:

    pip install \
        langchain-pinecone \
        langchain-openai \
        langchain-text-splitters \
        langchain
    
  2. Set environment variables for your Pinecone and OpenAI API keys:

    export PINECONE_API_KEY="<your Pinecone API key>" # available at app.pinecone.io
    export OPENAI_API_KEY="<your OpenAI API key>" # available at platform.openai.com/api-keys
    
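If you run the rest of the guide from Python, a quick check like the one below can catch a missing key before any API call fails. missing_keys is a hypothetical helper, not part of the Pinecone or OpenAI SDKs; it only inspects the environment variables set above.

```python
import os

REQUIRED_KEYS = ("PINECONE_API_KEY", "OPENAI_API_KEY")


def missing_keys(env=os.environ, required=REQUIRED_KEYS) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    return [name for name in required if not env.get(name)]


missing = missing_keys()
if missing:
    print("Set these environment variables before continuing:", ", ".join(missing))
else:
    print("Both API keys are set.")
```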

2. Store knowledge in Pinecone

For this guide, you’ll use a document about a fictional product called the WonderVector5000 that LLMs do not have any information about. You’ll use LangChain to chunk the document into smaller segments, convert each segment into vectors using OpenAI, and then upsert your vectors into your Pinecone index.
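Conceptually, the chunking step splits the Markdown at each h2 header. The split_on_h2 function below is a hypothetical, simplified illustration of that idea, not the implementation of LangChain's MarkdownHeaderTextSplitter (whose output formatting differs): it starts a new chunk at each "## " line, keeps the header in the chunk text, and records the header under a "Header 2" metadata key, as the guide's configuration does.

```python
import re


def split_on_h2(markdown: str) -> list[dict]:
    """Split Markdown into chunks at each '## ' header (simplified illustration).

    The header stays in the chunk text, and its title is stored as metadata.
    """
    chunks = []
    # Zero-width split just before every line starting with "## " (Python 3.7+).
    for part in re.split(r"(?m)^(?=## )", markdown):
        part = part.strip()
        if not part:
            continue
        first_line = part.splitlines()[0]
        header = first_line[3:].strip() if first_line.startswith("## ") else None
        chunks.append({"text": part, "metadata": {"Header 2": header} if header else {}})
    return chunks


doc = (
    "## Introduction\n\nWelcome to the WonderVector5000.\n\n"
    "## Troubleshooting\n\n- Issue: The engine won't start."
)
for chunk in split_on_h2(doc):
    print(chunk["metadata"], "->", chunk["text"].splitlines()[-1])
```

Splitting on document structure rather than a fixed character count keeps each section's content together, which tends to make each chunk's embedding represent a single topic.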

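  1. Because your document is in Markdown, chunk the content by its structure to get semantically coherent segments. Here, headers_to_split_on tells the splitter to split at h2 (##) headers; each tuple maps a header marker to the metadata key that stores the header text ("Header 2"). Setting strip_headers=False keeps each header in the text of its chunk.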

    from langchain_text_splitters import MarkdownHeaderTextSplitter
    
    markdown_document = (
        "## Introduction\n\nWelcome to the whimsical world of the WonderVector5000, an astonishing leap into the realms of imaginative technology. This extraordinary device, borne of creative fancy, promises to revolutionize absolutely nothing while dazzling you with its fantastical features. Whether you're a seasoned technophile or just someone looking for a bit of fun, the WonderVector5000 is sure to leave you amused and bemused in equal measure. Let's explore the incredible, albeit entirely fictitious, specifications, setup process, and troubleshooting tips for this marvel of modern nonsense.\n\n"
        "## Product overview\n\nThe WonderVector5000 is packed with features that defy logic and physics, each designed to sound impressive while maintaining a delightful air of absurdity:\n\n- Quantum Flibberflabber Engine: The heart of the WonderVector5000, this engine operates on principles of quantum flibberflabber, a phenomenon as mysterious as it is meaningless. It's said to harness the power of improbability to function seamlessly across multiple dimensions.\n\n- Hyperbolic Singularity Matrix: This component compresses infinite possibilities into a singular hyperbolic state, allowing the device to predict outcomes with 0% accuracy, ensuring every use is a new adventure.\n\n- Aetherial Flux Capacitor: Drawing energy from the fictional aether, this flux capacitor provides unlimited power by tapping into the boundless reserves of imaginary energy fields.\n\n- Multi-Dimensional Holo-Interface: Interact with the WonderVector5000 through its holographic interface that projects controls and information in three-and-a-half dimensions, creating a user experience that's simultaneously futuristic and perplexing.\n\n- Neural Fandango Synchronizer: This advanced feature connects directly to the user's brain waves, converting your deepest thoughts into tangible actions—albeit with results that are whimsically unpredictable.\n\n- Chrono-Distortion Field: Manipulate time itself with the WonderVector5000's chrono-distortion field, allowing you to experience moments before they occur or revisit them in a state of temporal flux.\n\n"
        "## Use cases\n\nWhile the WonderVector5000 is fundamentally a device of fiction and fun, let's imagine some scenarios where it could hypothetically be applied:\n\n- Time Travel Adventures: Use the Chrono-Distortion Field to visit key moments in history or glimpse into the future. While actual temporal manipulation is impossible, the mere idea sparks endless storytelling possibilities.\n\n- Interdimensional Gaming: Engage with the Multi-Dimensional Holo-Interface for immersive, out-of-this-world gaming experiences. Imagine games that adapt to your thoughts via the Neural Fandango Synchronizer, creating a unique and ever-changing environment.\n\n- Infinite Creativity: Harness the Hyperbolic Singularity Matrix for brainstorming sessions. By compressing infinite possibilities into hyperbolic states, it could theoretically help unlock unprecedented creative ideas.\n\n- Energy Experiments: Explore the concept of limitless power with the Aetherial Flux Capacitor. Though purely fictional, the notion of drawing energy from the aether could inspire innovative thinking in energy research.\n\n"
        "## Getting started\n\nSetting up your WonderVector5000 is both simple and absurdly intricate. Follow these steps to unleash the full potential of your new device:\n\n1. Unpack the Device: Remove the WonderVector5000 from its anti-gravitational packaging, ensuring to handle with care to avoid disturbing the delicate balance of its components.\n\n2. Initiate the Quantum Flibberflabber Engine: Locate the translucent lever marked “QFE Start” and pull it gently. You should notice a slight shimmer in the air as the engine engages, indicating that quantum flibberflabber is in effect.\n\n3. Calibrate the Hyperbolic Singularity Matrix: Turn the dials labeled 'Infinity A' and 'Infinity B' until the matrix stabilizes. You'll know it's calibrated correctly when the display shows a single, stable “∞”.\n\n4. Engage the Aetherial Flux Capacitor: Insert the EtherKey into the designated slot and turn it clockwise. A faint humming sound should confirm that the aetherial flux capacitor is active.\n\n5. Activate the Multi-Dimensional Holo-Interface: Press the button resembling a floating question mark to activate the holo-interface. The controls should materialize before your eyes, slightly out of phase with reality.\n\n6. Synchronize the Neural Fandango Synchronizer: Place the neural headband on your forehead and think of the word “Wonder”. The device will sync with your thoughts, a process that should take just a few moments.\n\n7. Set the Chrono-Distortion Field: Use the temporal sliders to adjust the time settings. Recommended presets include “Past”, “Present”, and “Future”, though feel free to explore other, more abstract temporal states.\n\n"
        "## Troubleshooting\n\nEven a device as fantastically designed as the WonderVector5000 can encounter problems. Here are some common issues and their solutions:\n\n- Issue: The Quantum Flibberflabber Engine won't start.\n\n    - Solution: Ensure the anti-gravitational packaging has been completely removed. Check for any residual shards of improbability that might be obstructing the engine.\n\n- Issue: The Hyperbolic Singularity Matrix displays “∞∞”.\n\n    - Solution: This indicates a hyper-infinite loop. Reset the dials to zero and then adjust them slowly until the display shows a single, stable infinity symbol.\n\n- Issue: The Aetherial Flux Capacitor isn't engaging.\n\n    - Solution: Verify that the EtherKey is properly inserted and genuine. Counterfeit EtherKeys can often cause malfunctions. Replace with an authenticated EtherKey if necessary.\n\n- Issue: The Multi-Dimensional Holo-Interface shows garbled projections.\n\n    - Solution: Realign the temporal resonators by tapping the holographic screen three times in quick succession. This should stabilize the projections.\n\n- Issue: The Neural Fandango Synchronizer causes headaches.\n\n    - Solution: Ensure the headband is properly positioned and not too tight. Relax and focus on simple, calming thoughts to ease the synchronization process.\n\n- Issue: The Chrono-Distortion Field is stuck in the past.\n\n    - Solution: Increase the temporal flux by 5%. If this fails, perform a hard reset by holding down the “Future” slider for ten seconds."
    )
    
    headers_to_split_on = [
        ("##", "Header 2")
    ]
    
    markdown_splitter = MarkdownHeaderTextSplitter(
        headers_to_split_on=headers_to_split_on, strip_headers=False
    )
    md_header_splits = markdown_splitter.split_text(markdown_document)
    
  2. Initialize a LangChain embedding object:

    This step uses the OpenAI API key you set as an environment variable earlier. OpenAI is a paid service, so running the rest of this guide may incur a small cost.

    import os

    from langchain_openai import OpenAIEmbeddings

    model_name = "text-embedding-3-small"
    embeddings = OpenAIEmbeddings(
        model=model_name,
        openai_api_key=os.environ.get("OPENAI_API_KEY")
    )
    
  3. Create a serverless index in Pinecone to store the embeddings of your document. Set the index dimension (1536) and distance metric (cosine) to match the OpenAI text-embedding-3-small model you’ll use to create the embeddings:

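    import os

    # The standard client works with the packages installed above;
    # pinecone.grpc additionally requires the Pinecone gRPC extras.
    from pinecone import Pinecone, ServerlessSpec

    pc = Pinecone(api_key=os.environ.get("PINECONE_API_KEY"))

    index_name = "docs-rag-chatbot"

    # Create the index only if it does not already exist
    if index_name not in pc.list_indexes().names():
        pc.create_index(
            name=index_name,
            dimension=1536,  # output size of text-embedding-3-small
            metric="cosine",
            spec=ServerlessSpec(
                cloud="aws",
                region="us-east-1"
            )
        )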
    
  4. Embed each chunk and upsert the embeddings into a distinct namespace called wondervector5000. Namespaces let you partition records within an index and are essential for implementing multitenancy when you need to isolate the data of each customer/user.

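    import time

    from langchain_pinecone import PineconeVectorStore

    namespace = "wondervector5000"

    # Embed each chunk with OpenAI and upsert the vectors into the namespace
    docsearch = PineconeVectorStore.from_documents(
        documents=md_header_splits,
        index_name=index_name,
        embedding=embeddings,
        namespace=namespace
    )

    # Give the index a moment so the new records are available to list and query
    time.sleep(1)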
    
  5. Use Pinecone’s list and query operations to look at one of the records:

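    index = pc.Index(index_name)

    # index.list() yields pages (lists) of record IDs; this small namespace fits in one page
    for ids in index.list(namespace=namespace):
        # Query with the first record's own vector to fetch it with its values and metadata
        query = index.query(
            id=ids[0],
            namespace=namespace,
            top_k=1,
            include_values=True,
            include_metadata=True
        )
        print(query)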
    
    # Response:
    # {'matches': [{'id': '8a7e5227-a738-4422-9c25-9a6136825803',
    #             'metadata': {'Header 2': 'Introduction',
    #                         'text': '## Introduction  \n'
    #                                 'Welcome to the whimsical world of the '
    #                                 'WonderVector5000, an astonishing leap into '
    #                                 'the realms of imaginative technology. This '
    #                                 'extraordinary device, borne of creative '
    #                                 'fancy, promises to revolutionize '
    #                                 'absolutely nothing while dazzling you with '
    #                                 "its fantastical features. Whether you're a "
    #                                 'seasoned technophile or just someone '
    #                                 'looking for a bit of fun, the '
    #                                 'WonderVector5000 is sure to leave you '
    #                                 "amused and bemused in equal measure. Let's "
    #                                 'explore the incredible, albeit entirely '
    #                                 'fictitious, specifications, setup process, '
    #                                 'and troubleshooting tips for this marvel '
    #                                 'of modern nonsense.'},
    #             'score': 1.0080868,
    #             'values': [-0.00798303168,
    #                        0.00551192369,
    #                        -0.00463955849,
    #                        -0.00585730933,
    #                        ...
    #                       ]}],
    # 'namespace': 'wondervector5000',
    # 'usage': {'read_units': 6}}    
    
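To see what the index does with these records, here is an in-memory sketch of the same ideas: a fixed vector dimension, cosine similarity, namespaces that isolate records, and a query that uses a stored record's own vector. ToyIndex and its methods are hypothetical and do not mirror the Pinecone SDK; the 4-dimensional vectors are made up, whereas the real index uses 1536 dimensions. Because a vector is most similar to itself, querying by a record's ID returns that record as the top match.

```python
import math
import uuid

DIMENSION = 4  # toy size; the real index uses 1536 to match text-embedding-3-small


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot(a, b) / (|a| * |b|)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


class ToyIndex:
    """Hypothetical in-memory vector index with namespaces (not the Pinecone SDK)."""

    def __init__(self, dimension: int):
        self.dimension = dimension
        # namespace -> record ID -> (vector, metadata)
        self.namespaces: dict[str, dict[str, tuple[list[float], dict]]] = {}

    def upsert(self, namespace: str, values: list[float], metadata: dict) -> str:
        """Store a vector under a random UUID; reject vectors of the wrong size."""
        if len(values) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions, got {len(values)}")
        record_id = str(uuid.uuid4())
        self.namespaces.setdefault(namespace, {})[record_id] = (values, metadata)
        return record_id

    def list_ids(self, namespace: str) -> list[str]:
        return list(self.namespaces.get(namespace, {}))

    def query_by_id(self, namespace: str, record_id: str, top_k: int = 1):
        """Use a stored record's vector as the query, searching only its namespace."""
        records = self.namespaces[namespace]
        query_vec = records[record_id][0]
        scored = [(rid, cosine(query_vec, vec), meta) for rid, (vec, meta) in records.items()]
        return sorted(scored, key=lambda r: r[1], reverse=True)[:top_k]


index = ToyIndex(DIMENSION)
first = index.upsert("wondervector5000", [0.1, 0.9, 0.0, 0.2], {"Header 2": "Introduction"})
index.upsert("wondervector5000", [0.8, 0.1, 0.3, 0.0], {"Header 2": "Troubleshooting"})
# Same vector, different namespace: invisible to queries in "wondervector5000"
index.upsert("other-tenant", [0.1, 0.9, 0.0, 0.2], {"Header 2": "Not visible here"})

print(index.query_by_id("wondervector5000", first, top_k=1))
```

The namespace check is what makes the multitenancy use case work: a query scoped to one namespace never scores records stored under another.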

3. Use the chatbot

Now that your document is stored as embeddings in Pinecone, you can retrieve relevant knowledge from your index and add it to each question you send to the LLM, so that the LLM can ground its response in your data instead of guessing.
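The RetrievalQA chain in the next step uses chain_type="stuff", which inserts ("stuffs") all retrieved chunks into a single prompt alongside the question. The sketch below shows that idea in plain Python. Doc is a minimal stand-in for a LangChain Document, and TEMPLATE is a hypothetical prompt; LangChain's built-in prompt for this chain may be worded differently.

```python
from dataclasses import dataclass, field


@dataclass
class Doc:
    """Minimal stand-in for a LangChain Document (page_content plus metadata)."""
    page_content: str
    metadata: dict = field(default_factory=dict)


# Hypothetical prompt template, not LangChain's exact wording.
TEMPLATE = (
    "Use the following pieces of context to answer the question.\n"
    "If the context does not contain the answer, say you don't know.\n\n"
    "{context}\n\nQuestion: {question}\nAnswer:"
)


def stuff_prompt(question: str, docs: list[Doc]) -> str:
    """Concatenate every retrieved chunk, in order, into one prompt."""
    context = "\n\n".join(d.page_content for d in docs)
    return TEMPLATE.format(context=context, question=question)


retrieved = [
    Doc("## Troubleshooting\n\n- Issue: The Neural Fandango Synchronizer causes headaches.",
        {"Header 2": "Troubleshooting"}),
    Doc("## Getting started\n\n6. Synchronize the Neural Fandango Synchronizer.",
        {"Header 2": "Getting started"}),
]
print(stuff_prompt("The Neural Fandango Synchronizer is giving me a headache. What do I do?", retrieved))
```

Stuffing is the simplest strategy, but every retrieved chunk must fit in the model's context window, so it suits a small number of short chunks like the ones in this guide.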

  1. Initialize a LangChain chat model for the gpt-3.5-turbo LLM and a RetrievalQA chain that adds relevant context from Pinecone to each question:

    import os

    from langchain_openai import ChatOpenAI
    from langchain.chains import RetrievalQA

    llm = ChatOpenAI(
        openai_api_key=os.environ.get("OPENAI_API_KEY"),
        model_name="gpt-3.5-turbo",
        temperature=0.0  # reduce randomness so the two answers are easier to compare
    )

    # "stuff" inserts all retrieved chunks into a single prompt for the LLM
    qa = RetrievalQA.from_chain_type(
        llm=llm,
        chain_type="stuff",
        retriever=docsearch.as_retriever()
    )
    
  2. Define a few questions about the WonderVector5000. These questions require specific, private knowledge of the product, which the LLM does not have by default.

    query1 = "What are the first 3 steps for getting started with the WonderVector5000?"
    
    query2 = "The Neural Fandango Synchronizer is giving me a headache. What do I do?"
    
  3. Send query1 to the LLM twice, first with relevant knowledge from Pinecone and then without any additional knowledge:

    query1_with_knowledge = qa.invoke(query1)
    query1_without_knowledge = llm.invoke(query1)
    
    print(query1_with_knowledge)
    print()
    print(query1_without_knowledge)
    
    # Response:
    # {'query': 'What are the first 3 steps for getting started with the WonderVector5000?', 'result': "The first 3 steps for getting started with the WonderVector5000 are:\n\n1. Unpack the Device: Remove the WonderVector5000 from its anti-gravitational packaging with care.\n2. Initiate the Quantum Flibberflabber Engine: Pull the translucent lever marked “QFE Start” gently to engage the engine.\n3. Calibrate the Hyperbolic Singularity Matrix: Turn the dials labeled 'Infinity A' and 'Infinity B' until the matrix stabilizes with a single, stable “∞” display."}
    #
    # content='1. Unbox the WonderVector5000 and carefully read the user manual provided. Familiarize yourself with the different components of the device and understand their functions.\n\n2. Charge the WonderVector5000 using the provided charging cable. Make sure the device is fully charged before using it for the first time to ensure optimal performance.\n\n3. Turn on the WonderVector5000 by pressing the power button. Follow the on-screen instructions to set up the device and customize the settings according to your preferences.' response_metadata={'token_usage': {'completion_tokens': 100, 'prompt_tokens': 24, 'total_tokens': 124}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-e782f1a1-3c1a-436f-bfb7-ca39552d8761-0'    
    

    Notice that the first response lists getting-started steps that closely match the WonderVector5000 document, while the second response sounds convincing but is generic and wrong for this product.

  4. Now repeat the process with query2:

    query2_with_knowledge = qa.invoke(query2)
    query2_without_knowledge = llm.invoke(query2)
    
    print(query2_with_knowledge)
    print()
    print(query2_without_knowledge)
    
    # Response:
    # {'query': 'The Neural Fandango Synchronizer is giving me a headache. What do I do?', 'result': 'If the Neural Fandango Synchronizer is causing headaches, you should ensure that the headband is properly positioned and not too tight. Additionally, try to relax and focus on simple, calming thoughts to ease the synchronization process. If the issue persists, you may need to take a break from using the device and consult the user manual for further guidance.'}
    #
    # content='If the Neural Fandango Synchronizer is giving you a headache, it is important to stop using it immediately and give yourself a break. Here are some steps you can take to alleviate the headache:\n\n1. Take a break and rest: Give yourself some time to relax and rest. Close your eyes, take deep breaths, and try to relax your mind and body.\n\n2. Hydrate: Drink plenty of water to stay hydrated, as dehydration can sometimes contribute to headaches.\n\n3. Use a cold compress: Applying a cold compress to your forehead or the back of your neck can help alleviate the headache.\n\n4. Take over-the-counter pain medication: If the headache persists, you can take over-the-counter pain medication such as ibuprofen or acetaminophen to help relieve the pain.\n\n5. Avoid using the Neural Fandango Synchronizer: If the headache is directly related to using the device, it is best to avoid using it until you feel better.\n\nIf the headache persists or worsens, it is important to seek medical attention from a healthcare professional. They can provide further guidance and treatment options to help alleviate your headache.' response_metadata={'token_usage': {'completion_tokens': 232, 'prompt_tokens': 26, 'total_tokens': 258}, 'model_name': 'gpt-3.5-turbo', 'system_fingerprint': None, 'finish_reason': 'stop', 'logprobs': None} id='run-b35d47a3-469a-4adb-8497-a68b8539839b-0'
    

    Again, the first response gives troubleshooting guidance that closely matches the WonderVector5000 document, while the second response sounds convincing but offers only generic advice rather than the fix the document describes.

Next steps
