# Haystack

> Connect Pinecone and Haystack to ship vector search and RAG: embed, index, and query at scale with managed infrastructure.

Haystack is the open source Python framework by deepset for building custom apps with large language models (LLMs). It lets you quickly try out the latest models in natural language processing (NLP) while being flexible and easy to use. Its community of users and builders has helped shape Haystack into what it is today: a complete framework for building production-ready NLP apps.

The Haystack and Pinecone integration lets you keep your NLP-driven apps up to date using Haystack's indexing pipelines, which help you prepare and maintain your data.

## Setup guide

This guide shows how to integrate Pinecone with the popular [Haystack library](https://github.com/deepset-ai/haystack) for *question answering*.

### Install Haystack

We start by installing the latest version of Haystack with all dependencies required for the `PineconeDocumentStore`. The version specifier and the extras are quoted so that the shell does not interpret `>` or the square brackets.

```shell
pip install -U "farm-haystack>=1.3.0" "pinecone[grpc]" datasets
```

### Initialize the PineconeDocumentStore

We initialize a `PineconeDocumentStore` by providing an API key along with the index name, the similarity metric, and the embedding dimension. The example does not pass an environment name, so the document store's default applies. The embedding dimension of 384 matches the output size of the `multi-qa-MiniLM-L6-cos-v1` model used later in this guide. [Create an account](https://app.pinecone.io) to get your free API key.

```python
from haystack.document_stores import PineconeDocumentStore

document_store = PineconeDocumentStore(
    api_key="YOUR_API_KEY",  # placeholder: replace with your Pinecone API key
    index="haystack-extractive-qa",
    similarity="cosine",
    embedding_dim=384
)
```

```
INFO - haystack.document_stores.pinecone - Index statistics: name: haystack-extractive-qa, embedding dimensions: 384, record count: 0
```

### Prepare data

Before adding data to the document store, we must download the data and convert it into the `Document` format that Haystack uses.

We will use the SQuAD dataset available from Hugging Face Datasets.

```python
from datasets import load_dataset

# load the training split of the SQuAD dataset
data = load_dataset("squad", split="train")
```

Next, we remove duplicates and unnecessary columns.

```python
# convert to a pandas dataframe
df = data.to_pandas()
# select only the title and context columns
df = df[["title", "context"]]
# drop rows containing duplicate context passages
df = df.drop_duplicates(subset="context")
df.head()
```

|    | title                       | context                                           |
| -- | --------------------------- | ------------------------------------------------- |
| 0  | University\_of\_Notre\_Dame | Architecturally, the school has a Catholic cha... |
| 5  | University\_of\_Notre\_Dame | As at most other universities, Notre Dame's st... |
| 10 | University\_of\_Notre\_Dame | The university is the major seat of the Congre... |
| 15 | University\_of\_Notre\_Dame | The College of Engineering was established in ... |
| 20 | University\_of\_Notre\_Dame | All of Notre Dame's undergraduate students are... |

Then we convert these records into the `Document` format.

```python
from haystack import Document

docs = []
for _, row in df.iterrows():
    # create a Haystack Document with the text content and its metadata
    doc = Document(
        content=row["context"],
        meta={
            "title": row["title"],
            "context": row["context"],
        },
    )
    docs.append(doc)
```

The `Document` format contains two fields: `content` for the text content or paragraphs, and `meta` for any additional information that can later be used for metadata filtering in our search.

Now we upsert the documents to Pinecone.

```python
# upsert the documents to the Pinecone index
document_store.write_documents(docs)
```

### Initialize retriever

The next step is to create embeddings from these documents. We will use Haystack's `EmbeddingRetriever` with a SentenceTransformer model (`multi-qa-MiniLM-L6-cos-v1`) that has been designed for question answering.

```python
# in Haystack 1.x, retrievers are imported from haystack.nodes
from haystack.nodes import EmbeddingRetriever

retriever = EmbeddingRetriever(
    document_store=document_store,
    embedding_model="multi-qa-MiniLM-L6-cos-v1",
    model_format="sentence_transformers"
)
```

Then we run the `PineconeDocumentStore.update_embeddings` method with the `retriever` provided as an argument. GPU acceleration can greatly reduce the time required for this step.

```python
document_store.update_embeddings(
    retriever,
    batch_size=16
)
```

### Inspect documents and embeddings

We can get documents by their ID with the `PineconeDocumentStore.get_documents_by_id` method.

```python
d = document_store.get_documents_by_id(
    ids=["49091c797d2236e73fab510b1e9c7f6b"],
    return_embedding=True
)[0]
```

From here we can view the document content with `d.content` and the document embedding with `d.embedding`.

### Initialize an extractive QA pipeline

An `ExtractiveQAPipeline` contains three key components by default:

* a document store (`PineconeDocumentStore`)
* a retriever model
* a reader model

We use the `deepset/electra-base-squad2` model from the Hugging Face model hub as our reader model.

```python
from haystack.nodes import FARMReader

reader = FARMReader(
    model_name_or_path="deepset/electra-base-squad2",
    use_gpu=True
)
```

We are now ready to initialize the `ExtractiveQAPipeline`.

```python
from haystack.pipelines import ExtractiveQAPipeline

pipe = ExtractiveQAPipeline(reader, retriever)
```

### Ask questions

Using our QA pipeline we can begin querying with `pipe.run`.

```python
from haystack.utils import print_answers

query = "What was Albert Einstein famous for?"
# get the answer
answer = pipe.run(
    query=query,
    params={
        "Retriever": {"top_k": 1},
    }
)
# print the answer(s)
print_answers(answer)
```

```
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 3.53 Batches/s]
Query: What was Albert Einstein famous for?
Answers: [ ]
```

```python
query = "How much oil is Egypt producing in a day?"
# get the answer
answer = pipe.run(
    query=query,
    params={
        "Retriever": {"top_k": 1},
    }
)
# print the answer(s)
print_answers(answer)
```

```
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 3.81 Batches/s]
Query: How much oil is Egypt producing in a day?
Answers: [ ]
```

```python
query = "What are the first names of the youtube founders?"
# get the answer
answer = pipe.run(
    query=query,
    params={
        "Retriever": {"top_k": 1},
    }
)
# print the answer(s)
print_answers(answer)
```

```
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 3.83 Batches/s]
Query: What are the first names of the youtube founders?
Answers: [ ]
```

We can return multiple answers by raising the retriever's `top_k` parameter, which passes more candidate documents to the reader.

```python
query = "Who was the first person to step foot on the moon?"
# get the answer
answer = pipe.run(
    query=query,
    params={
        "Retriever": {"top_k": 3},
    }
)
# print the answer(s)
print_answers(answer)
```

```
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 3.71 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 3.78 Batches/s]
Inferencing Samples: 100%|██████████| 1/1 [00:00<00:00, 3.88 Batches/s]
Query: Who was the first person to step foot on the moon?
Answers: [ , , ]
```
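
To make the retriever's `top_k` setting more concrete, the sketch below shows in plain Python what a cosine-similarity top-k lookup does conceptually. It is an illustration only and not how Haystack or Pinecone implement retrieval. The three-dimensional toy vectors, the document IDs, and the `cosine_similarity` and `top_k_cosine` helpers are all hypothetical names introduced for this example.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k_cosine(query, vectors, top_k=1):
    """Return the top_k (doc_id, score) pairs, best match first.

    Hypothetical helper: scores every stored vector against the query
    and keeps the highest-scoring entries.
    """
    scored = [
        (doc_id, cosine_similarity(query, vec))
        for doc_id, vec in vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Toy "index": made-up 3-dimensional embeddings keyed by made-up IDs.
toy_index = {
    "doc-a": [1.0, 0.0, 0.0],
    "doc-b": [0.0, 1.0, 0.0],
    "doc-c": [0.7, 0.7, 0.0],
}
query_vector = [1.0, 0.1, 0.0]

# top_k=1 keeps only the single closest document
best = top_k_cosine(query_vector, toy_index, top_k=1)
# top_k=2 also keeps the runner-up
top_two = top_k_cosine(query_vector, toy_index, top_k=2)

print([doc_id for doc_id, _ in best])
print([doc_id for doc_id, _ in top_two])
```

In the pipeline above, a larger `top_k` plays the same role: more candidate passages reach the reader, at the cost of more reader inference passes.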