Setup guide
In this guide we will see how to integrate Pinecone with the popular Haystack library for question answering.
Install Haystack
We start by installing the latest version of Haystack with all dependencies required for the PineconeDocumentStore.
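A minimal sketch of the install step with a quick sanity check. The install command in the comment and the package names (farm-haystack for Haystack 1.x, pinecone-client, datasets) are assumptions, since the original command is not shown here; adjust them to your environment.

```python
# Assumed install command (run in a shell or notebook cell); not taken from this guide:
#   pip install -U farm-haystack pinecone-client datasets
from importlib import metadata

# Assumed PyPI distribution names for Haystack 1.x, the Pinecone client, and HF Datasets.
REQUIRED = ("farm-haystack", "pinecone-client", "datasets")


def missing_packages(names=REQUIRED):
    """Return the distributions from `names` that are not installed."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    print("Missing packages:", missing_packages() or "none")
```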
Initialize the PineconeDocumentStore
We initialize a PineconeDocumentStore by providing an API key and environment name. Create an account to get your free API key.
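The initialization could look roughly like the sketch below, assuming the Haystack 1.x PineconeDocumentStore constructor. The environment variable names and the index name are hypothetical. The embedding dimension of 384 matches the MiniLM-L6 sentence-transformer used later in this guide.

```python
import os


def pinecone_store_config(env=os.environ):
    """Collect constructor arguments for the document store from environment variables."""
    return {
        "api_key": env["PINECONE_API_KEY"],          # hypothetical variable name
        "environment": env["PINECONE_ENVIRONMENT"],  # hypothetical variable name
        "index": "haystack-extractive-qa",           # hypothetical index name
        "similarity": "cosine",
        "embedding_dim": 384,                        # output size of multi-qa-MiniLM-L6-cos-v1
    }


def build_document_store(config):
    """Create the store; Haystack is imported lazily so this module loads without it."""
    from haystack.document_stores import PineconeDocumentStore

    return PineconeDocumentStore(**config)
```

With both variables set, `build_document_store(pinecone_store_config())` would connect to the index, or create it if it does not exist yet.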
Prepare data
Before adding data to the document store, we must download the data and convert it into the Document format that Haystack uses. We will use the SQuAD dataset available from Hugging Face Datasets.
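As a sketch of the download step: SQuAD repeats each paragraph once per question, so the helper below keeps one row per distinct context. The function names are illustrative, and the deduplication is an assumption about what the data preparation needs rather than the guide's exact code.

```python
def load_squad(split="train"):
    """Return a SQuAD split as a Hugging Face Dataset; iterating it yields dict rows."""
    from datasets import load_dataset  # imported lazily; downloads the data on first use

    return load_dataset("squad", split=split)


def unique_contexts(rows):
    """Keep the title and context of the first row seen for each distinct context."""
    seen = set()
    unique = []
    for row in rows:
        if row["context"] not in seen:
            seen.add(row["context"])
            unique.append({"title": row["title"], "context": row["context"]})
    return unique
```

Calling `unique_contexts(load_squad())` would give a list of title and context pairs ready for conversion.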
index | title | context |
---|---|---|
0 | University_of_Notre_Dame | Architecturally, the school has a Catholic cha… |
5 | University_of_Notre_Dame | As at most other universities, Notre Dame’s st… |
10 | University_of_Notre_Dame | The university is the major seat of the Congre… |
15 | University_of_Notre_Dame | The College of Engineering was established in … |
20 | University_of_Notre_Dame | All of Notre Dame’s undergraduate students are… |
The Document format contains two fields: 'content' for the text content or paragraphs, and 'meta', where we can place any additional information that can later be used to apply metadata filtering in our search.
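The mapping from dataset rows to these two fields can be sketched as follows, assuming rows with 'title' and 'context' keys and the Haystack 1.x Document class. The helper names are illustrative.

```python
def to_document_dicts(rows):
    """Map each row to the two Document fields: text in 'content', extras in 'meta'."""
    return [
        {"content": row["context"], "meta": {"title": row["title"]}}
        for row in rows
    ]


def to_documents(rows):
    """Build Haystack Document objects; Haystack is imported lazily."""
    from haystack import Document

    return [Document(content=d["content"], meta=d["meta"]) for d in to_document_dicts(rows)]
```

Storing the title in 'meta' is what would later allow filtering a search by article title.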
Now we upsert the documents to Pinecone.
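A minimal sketch of the upsert, assuming the Haystack 1.x document store methods `write_documents` and `get_document_count`. The batch size is an arbitrary illustrative value.

```python
def upsert_documents(document_store, docs, batch_size=256):
    """Write documents to the store in batches and return the resulting document count."""
    document_store.write_documents(docs, batch_size=batch_size)
    return document_store.get_document_count()
```

Checking the returned count against `len(docs)` is a cheap way to confirm that the write went through.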
Initialize retriever
The next step is to create embeddings from these documents. We will use Haystack's EmbeddingRetriever with a SentenceTransformer model (multi-qa-MiniLM-L6-cos-v1), which has been designed for question answering.
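A sketch of the retriever setup, assuming the Haystack 1.x EmbeddingRetriever constructor and the model's full Hugging Face hub ID under the sentence-transformers organization. The helper name is illustrative.

```python
# Hub ID of the model named above; it produces 384-dimensional embeddings.
RETRIEVER_MODEL = "sentence-transformers/multi-qa-MiniLM-L6-cos-v1"
EMBEDDING_DIM = 384


def build_retriever(document_store, use_gpu=True):
    """Create an EmbeddingRetriever bound to the store; Haystack is imported lazily."""
    from haystack.nodes import EmbeddingRetriever

    return EmbeddingRetriever(
        document_store=document_store,
        embedding_model=RETRIEVER_MODEL,
        model_format="sentence_transformers",
        use_gpu=use_gpu,
    )
```

The document store's embedding dimension has to match `EMBEDDING_DIM`, otherwise vectors cannot be written to the index.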
We then call the PineconeDocumentStore.update_embeddings method with the retriever provided as an argument. GPU acceleration can greatly reduce the time required for this step.
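The embedding step might be wrapped as below. This is a sketch: the batch size is an illustrative value, and the timing is added here only so that CPU and GPU runs can be compared.

```python
import time


def embed_documents(document_store, retriever, batch_size=256):
    """Embed every stored document and return the elapsed time in seconds."""
    start = time.perf_counter()
    document_store.update_embeddings(retriever, batch_size=batch_size)
    return time.perf_counter() - start
```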
Inspect documents and embeddings
We can get documents by their ID with the PineconeDocumentStore.get_documents_by_id method.
For a returned document d, we can view the document content with d.content and the document embedding with d.embedding.
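A sketch of the lookup and inspection. It assumes that `get_documents_by_id` accepts a `return_embedding` keyword, as in Haystack 1.x; the helper name and the summary format are illustrative.

```python
def inspect_document(document_store, doc_id):
    """Fetch one document by ID and summarize its content and embedding size."""
    docs = document_store.get_documents_by_id([doc_id], return_embedding=True)
    if not docs:
        return None  # no document with that ID
    d = docs[0]
    dim = None if d.embedding is None else len(d.embedding)
    return {"content": d.content, "embedding_dim": dim}
```

For the model used in this guide, `embedding_dim` should come back as 384 once embeddings have been computed.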
Initialize an extractive QA pipeline
An ExtractiveQAPipeline contains three key components by default:
- a document store (PineconeDocumentStore)
- a retriever model
- a reader model
We use the deepset/electra-base-squad2 model from the Hugging Face model hub as our reader model.
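Loading the reader could be sketched as follows. Using Haystack's FARMReader class is an assumption, since the text names only the model; the helper name is illustrative.

```python
READER_MODEL = "deepset/electra-base-squad2"


def build_reader(use_gpu=True):
    """Load the reader model from the Hugging Face hub; Haystack is imported lazily."""
    from haystack.nodes import FARMReader  # assumed reader class (Haystack 1.x)

    return FARMReader(model_name_or_path=READER_MODEL, use_gpu=use_gpu)
```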
We can now initialize the ExtractiveQAPipeline.
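A minimal sketch of assembling the pipeline, assuming the Haystack 1.x ExtractiveQAPipeline constructor. The document store is not passed directly because the retriever already holds a reference to it.

```python
def build_pipeline(reader, retriever):
    """Combine a reader and a retriever into an extractive QA pipeline."""
    from haystack.pipelines import ExtractiveQAPipeline  # imported lazily

    return ExtractiveQAPipeline(reader=reader, retriever=retriever)
```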
Ask questions
Using our QA pipeline, we can begin querying with pipe.run.
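A query might look like the sketch below. It assumes the Haystack 1.x result format, where `result["answers"]` is a list of objects with `answer` and `score` attributes; the helper name and the question in the usage note are hypothetical.

```python
def ask(pipe, question):
    """Run one query and return (answer text, score) pairs, in the order returned."""
    result = pipe.run(query=question)
    return [(a.answer, a.score) for a in result["answers"]]
```

For example, `ask(pipe, "When was the College of Engineering established?")` would return the extracted answer spans with their confidence scores.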
We can return multiple answers by setting the top_k parameter.
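One way to pass top_k is through the `params` argument of `pipe.run`, keyed by node name. This sketch assumes the default node names "Retriever" and "Reader" of the Haystack 1.x ExtractiveQAPipeline; the helper names and default values are illustrative.

```python
def top_k_params(retriever_top_k=10, reader_top_k=3):
    """Build per-node parameters: candidates to retrieve and answers to return."""
    return {
        "Retriever": {"top_k": retriever_top_k},
        "Reader": {"top_k": reader_top_k},
    }


def ask_top_k(pipe, question, **top_k):
    """Run a query with explicit top_k settings and return the list of answers."""
    return pipe.run(query=question, params=top_k_params(**top_k))["answers"]
```

A larger retriever top_k gives the reader more passages to search at the cost of slower queries, while the reader top_k controls how many answers come back.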