Namespace Notes
Chat with your PDF documents using Pinecone, Vercel and OpenAI
$ npx create-pinecone-app@latest --template namespace-notes
Namespace Notes is a simple multi-tenant RAG example. The application allows users to create workspaces, upload documents to Pinecone, and feed each workspace’s chatbot with custom context. The same pattern works whether you store just a few documents or many billions of contextual embeddings.
Built with
- Pinecone Serverless
- Vercel AI SDK + OpenAI
- Next.js + Tailwind CSS
- Node version 20 or higher
Run the sample app
The fastest way to get started is to use the create-pinecone-app CLI tool:
$ npx create-pinecone-app@latest --template namespace-notes
Get your API key
You need an API key to make API calls to your Pinecone project. To get your key, follow these steps:
- Open the Pinecone console.
- Select your project.
- Go to API Keys.
- Copy your API key.
Create a Pinecone serverless index
Create a Pinecone index for this project. The index should have the following properties:
- dimension: 1536 (you can change this as long as you also change the default embedding model)
- metric: cosine
- region: us-east-1
You can create the index in the console, or by following the instructions here.
Start the project
Requires Node version 20+
To start the project, clone the sample-apps repo and navigate to the namespace-notes directory.
You will need two separate terminal instances, one for running the client and one for the server.
Client setup
From the project root directory, run the following command:
Make sure you have populated the client .env with relevant keys:
Start the client:
Server setup
From the project root directory, run the following command:
Make sure you have populated the server .env with relevant keys:
Start the server:
Project structure
In this example we opted for a simple client/server structure. We separate the frontend from the backend so that you can swap either one out for a stack of your choice.
Frontend Client
The frontend uses Next.js, Tailwind CSS and components from Vercel’s AI SDK to power the chatbot experience. It also leverages API routes to make calls to the server to fetch document references and context for both the UI and chatbot LLM. The client uses local storage to store workspace information.
Backend Server
This project uses Node.js and Express to handle file uploads, validation checks, chunking, upsertion, context provision etc. Learn more about the implementation details below.
Simple multi-tenant RAG methodology
This project uses a basic RAG architecture that achieves multitenancy through the use of namespaces. Files are uploaded to the server where they are chunked, embedded and upserted into Pinecone.
Tenant isolation
We use namespaces as the mechanism to separate context between workspaces. When we add documents, we check for a namespaceId or generate a new id if the workspace is being created.
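As a rough illustration of the check described above, the sketch below reuses a namespace id when the client supplies one and generates a new one otherwise. The helper name `resolveNamespaceId` and the choice of a UUID as the id format are assumptions made for this example, not details taken from the sample app.

```typescript
import { randomUUID } from "node:crypto";

// Hypothetical helper: reuse the workspace's namespace id if the client sent
// one, otherwise generate a new id for a workspace that is being created.
// Using a UUID as the id format is an assumption for this sketch.
function resolveNamespaceId(requested?: string): string {
  const trimmed = requested?.trim();
  return trimmed ? trimmed : randomUUID();
}

// Existing workspace: the id is passed through unchanged.
console.log(resolveNamespaceId("workspace-123"));
// New workspace: a fresh id is generated.
console.log(resolveNamespaceId());
```

Every later read or write for that workspace would then be scoped to this id, which is what keeps one tenant's context out of another's.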
Chunking
This project uses a basic paragraph chunking approach. We use pdf-parse to stream and parse PDF content, and apply a best-effort paragraph chunking strategy with a defined minChunkSize and maxChunkSize to account for documents with longer or shorter paragraphs. This gives us sizable content chunks for our Pinecone record metadata, which the LLM will later use during retrieval.
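One way a best-effort paragraph chunker with these two bounds could look is sketched below. It is not the sample app's implementation: the function name `chunkByParagraph`, the blank-line paragraph delimiter, and the character-based size limits are all assumptions. Short paragraphs are merged until they reach minChunkSize, and anything longer than maxChunkSize is cut at the limit.

```typescript
// Hypothetical best-effort paragraph chunker. Sizes are measured in characters.
function chunkByParagraph(
  text: string,
  minChunkSize: number,
  maxChunkSize: number
): string[] {
  // Treat one or more blank lines as a paragraph boundary (an assumption).
  const paragraphs = text
    .split(/\n\s*\n/)
    .map((p) => p.trim())
    .filter((p) => p.length > 0);

  const chunks: string[] = [];
  let buffer = "";

  for (const paragraph of paragraphs) {
    // Merge the paragraph into whatever short text is still pending.
    buffer = buffer ? `${buffer}\n\n${paragraph}` : paragraph;

    // Cut off full-size pieces while the pending text exceeds the maximum.
    // This is a crude cut that may land mid-word.
    while (buffer.length > maxChunkSize) {
      chunks.push(buffer.slice(0, maxChunkSize));
      buffer = buffer.slice(maxChunkSize);
    }

    // Emit the pending text once it is large enough to stand on its own.
    if (buffer.length >= minChunkSize) {
      chunks.push(buffer);
      buffer = "";
    }
  }

  // Keep any short remainder rather than dropping content.
  if (buffer.length > 0) chunks.push(buffer);
  return chunks;
}

const sample = "aaaa\n\nbbbb\n\n" + "c".repeat(25);
console.log(chunkByParagraph(sample, 10, 20).map((c) => c.length));
```

The final chunk can be shorter than minChunkSize, which is the "best effort" part: no text is discarded just because the document ends on a short paragraph.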
Embedding
Once we have our chunks, we embed them in batches using OpenAI’s text-embedding-3-small model.
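The batching step can be sketched independently of any particular embedding client. In the example below, `toBatches`, `embedInBatches` and the `EmbedBatchFn` type are hypothetical names. The embedding call itself is injected as a function, so the sketch makes no assumption about how the sample app talks to OpenAI.

```typescript
// Hypothetical signature for "embed this batch of texts". A real version
// would call an embeddings API with the text-embedding-3-small model.
type EmbedBatchFn = (texts: string[]) => Promise<number[][]>;

// Split a list into consecutive batches of at most batchSize items.
function toBatches<T>(items: T[], batchSize: number): T[][] {
  if (!Number.isInteger(batchSize) || batchSize < 1) {
    throw new Error("batchSize must be a positive integer");
  }
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}

// Embed chunks batch by batch, preserving the original chunk order.
async function embedInBatches(
  chunks: string[],
  batchSize: number,
  embedBatch: EmbedBatchFn
): Promise<number[][]> {
  const vectors: number[][] = [];
  for (const batch of toBatches(chunks, batchSize)) {
    const embedded = await embedBatch(batch);
    if (embedded.length !== batch.length) {
      throw new Error("Embedding count does not match batch size");
    }
    vectors.push(...embedded);
  }
  return vectors;
}

console.log(toBatches(["a", "b", "c", "d", "e"], 2));
```

Batching keeps each request within the provider's input limits and reduces the number of round trips compared with embedding one chunk at a time.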
RAG document management
In order to store multiple documents within a particular namespace we need a convention that allows us to target the chunks belonging to a particular document.
We do this through id prefixing. We generate a document id for each uploaded document, and then before upsertion we assign it as a prefix to each chunk id.
Each record id is the document id followed by the chunk id, separated by a ‘:’ symbol.
This comes in handy for targeted document updates and deletions.
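A minimal sketch of this id convention follows. The helper names and the use of a numeric chunk index as the chunk id are assumptions for illustration. Only the `documentId:chunkId` shape comes from the text above.

```typescript
const ID_SEPARATOR = ":";

// Build a record id of the form "<documentId>:<chunkIndex>".
function buildChunkId(documentId: string, chunkIndex: number): string {
  return `${documentId}${ID_SEPARATOR}${chunkIndex}`;
}

// The prefix shared by every chunk of one document, separator included.
function documentPrefix(documentId: string): string {
  return `${documentId}${ID_SEPARATOR}`;
}

// True when a record id belongs to the given document.
function belongsToDocument(chunkId: string, documentId: string): boolean {
  return chunkId.startsWith(documentPrefix(documentId));
}

console.log(buildChunkId("doc-1", 0));
console.log(belongsToDocument("doc-10:3", "doc-1"));
```

Including the separator in the prefix matters: without it, a lookup for `doc-1` would also match chunks of `doc-10`.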
Upsertion
Lastly, we upsert our embeddings to the Pinecone namespace associated with the tenant in the form of a PineconeRecord.
This allows us to provide the reference text and URL as metadata for use by our retrieval system.
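The sketch below shows how chunks and their embeddings might be assembled into records before upsertion. `ChunkRecord` is a local stand-in with the id, values and metadata fields of a record. The metadata field names `text` and `referenceURL` and the helper name `toRecords` are assumptions rather than the sample app's actual schema.

```typescript
// Local stand-in for a record with an id, a vector and metadata.
// The metadata field names are assumptions for this sketch.
interface ChunkRecord {
  id: string;
  values: number[];
  metadata: { text: string; referenceURL: string };
}

// Pair each chunk with its embedding and attach the reference metadata.
function toRecords(
  documentId: string,
  chunks: string[],
  embeddings: number[][],
  referenceURL: string
): ChunkRecord[] {
  if (chunks.length !== embeddings.length) {
    throw new Error(
      `Expected one embedding per chunk, got ${embeddings.length} for ${chunks.length} chunks`
    );
  }
  return chunks.map((text, i) => ({
    id: `${documentId}:${i}`, // document id prefix + chunk index
    values: embeddings[i],
    metadata: { text, referenceURL },
  }));
}

const records = toRecords(
  "doc-1",
  ["first chunk", "second chunk"],
  [[0.1, 0.2], [0.3, 0.4]],
  "https://example.com/file.pdf"
);
console.log(records.map((r) => r.id));
```

The resulting array would then be upserted into the tenant's namespace with the Pinecone client. Check the client version you use for the exact call.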
Context
When a user asks a question via the frontend chat component, the Vercel AI SDK leverages the /chat endpoint for retrieval.
We then send the top_k most similar results back from Pinecone via our context route.
We populate a CONTEXT BLOCK that is wrapped with system prompt instructions for our chosen LLM to take advantage of in the response output.
It’s important to note that different LLMs have different context windows, so your choice of LLM will influence both the top_k value you request from Pinecone and the size of your chunks.
If the context block or prompt is longer than the LLM’s context window, the model will not be able to use all of it when generating a response.
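To make the interaction between top_k, chunk size and the context window concrete, here is a sketch that assembles a context block under a size budget. The `Match` shape, the function names, the character-based budget (a real system would more likely count tokens) and the prompt wording are all assumptions made for this example.

```typescript
// Hypothetical shape of one retrieved result: a similarity score plus the
// chunk text that was stored as metadata.
interface Match {
  id: string;
  score: number;
  text: string;
}

// Concatenate the best-scoring chunks, stopping before the budget is exceeded.
// The budget is in characters here for simplicity.
function buildContextBlock(matches: Match[], maxChars: number): string {
  const sorted = [...matches].sort((a, b) => b.score - a.score);
  const parts: string[] = [];
  let used = 0;
  for (const match of sorted) {
    // Each chunk after the first also costs one newline separator.
    const cost = match.text.length + (parts.length > 0 ? 1 : 0);
    if (used + cost > maxChars) break;
    parts.push(match.text);
    used += cost;
  }
  return parts.join("\n");
}

// Wrap the context block in system prompt instructions (wording is illustrative).
function buildSystemPrompt(contextBlock: string): string {
  return [
    "Answer the question using only the information in the context block.",
    "START CONTEXT BLOCK",
    contextBlock,
    "END CONTEXT BLOCK",
  ].join("\n");
}

const block = buildContextBlock(
  [
    { id: "doc-1:0", score: 0.5, text: "bbbb" },
    { id: "doc-1:1", score: 0.9, text: "aaaa" },
    { id: "doc-2:0", score: 0.1, text: "cccc" },
  ],
  9
);
console.log(buildSystemPrompt(block));
```

With a small budget, lower-scoring chunks are left out entirely, which is preferable to sending a prompt that exceeds the model's context window.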
Document deletion
To delete a document from a particular workspace, we need to perform a targeted deletion of the RAG document. Luckily, we can take advantage of the id prefixing strategy we employed earlier to perform a deletion of a specific document.
We use our documentId: prefix to identify all the chunks associated with a particular document, and then perform deletions until all of the document’s chunks have been deleted.
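The loop described above can be sketched against a small interface. `NamespaceStore`, `ListPage` and `InMemoryNamespace` are hypothetical stand-ins written for this example and are not the Pinecone client's API. A real implementation would list ids by prefix and delete them through the client's own methods.

```typescript
// Hypothetical page of ids plus an optional cursor for the next page.
interface ListPage {
  ids: string[];
  next?: string;
}

// Hypothetical minimal view of one tenant namespace.
interface NamespaceStore {
  listByPrefix(prefix: string, paginationToken?: string): Promise<ListPage>;
  deleteMany(ids: string[]): Promise<void>;
}

// Delete every chunk whose id starts with "<documentId>:", page by page.
async function deleteDocument(
  store: NamespaceStore,
  documentId: string
): Promise<number> {
  const prefix = `${documentId}:`;
  let deleted = 0;
  let token: string | undefined = undefined;
  do {
    const page: ListPage = await store.listByPrefix(prefix, token);
    if (page.ids.length > 0) {
      await store.deleteMany(page.ids);
      deleted += page.ids.length;
    }
    token = page.next;
  } while (token !== undefined);
  return deleted;
}

// In-memory stand-in used only to exercise the loop. The cursor is the last
// id of the previous page, so deleting while paginating skips nothing.
class InMemoryNamespace implements NamespaceStore {
  private ids: Set<string>;
  private pageSize: number;

  constructor(ids: string[], pageSize: number = 2) {
    this.ids = new Set(ids);
    this.pageSize = pageSize;
  }

  async listByPrefix(prefix: string, paginationToken?: string): Promise<ListPage> {
    const matching = Array.from(this.ids)
      .filter((id) => id.startsWith(prefix))
      .filter((id) => paginationToken === undefined || id > paginationToken)
      .sort();
    const ids = matching.slice(0, this.pageSize);
    const next = matching.length > this.pageSize ? ids[ids.length - 1] : undefined;
    return { ids, next };
  }

  async deleteMany(ids: string[]): Promise<void> {
    for (const id of ids) this.ids.delete(id);
  }

  remaining(): string[] {
    return Array.from(this.ids).sort();
  }
}

const demoStore = new InMemoryNamespace(["doc-1:0", "doc-1:1", "doc-1:2", "doc-2:0"]);
deleteDocument(demoStore, "doc-1").then((count) => {
  console.log(count, demoStore.remaining());
});
```

Because the prefix includes the separator, chunks of other documents in the same namespace are left untouched.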
Workspace deletion (offboarding)
This is even simpler to achieve. If we have the workspace’s namespaceId at our disposal, we can simply call deleteAll() on the relevant namespace.
Further optimizations for the RAG pipeline
This is a relatively simple RAG pipeline - in practice there are improvements that could be made depending on a particular set of requirements.
Using rerankers
For example, a reranker could be used in order to provide the most relevant set of retrieved results from Pinecone to the LLM.
A reranker could allow us to increase the top_k requested from Pinecone significantly and then constrain the output to a highly relevant set of records, ordered by relevance, all while abiding by the context length restrictions of the LLM.
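The over-fetch-then-rerank idea can be sketched as follows. The scoring function here is a toy term-overlap measure standing in for a real reranking model, and the names `Candidate`, `RelevanceFn`, `termOverlap` and `rerank` are hypothetical. The point is only the shape of the step: score every candidate against the query, sort, and keep the top few.

```typescript
// Hypothetical candidate returned by a large top_k query.
interface Candidate {
  id: string;
  text: string;
}

// A relevance scorer takes the query and a candidate text and returns a
// number, where higher means more relevant.
type RelevanceFn = (query: string, text: string) => number;

// Toy scorer: the fraction of query terms that appear in the text.
// A real reranker would use a dedicated model instead.
const termOverlap: RelevanceFn = (query, text) => {
  const terms = query.toLowerCase().split(/\s+/).filter((t) => t.length > 0);
  if (terms.length === 0) return 0;
  const haystack = text.toLowerCase();
  const hits = terms.filter((t) => haystack.includes(t)).length;
  return hits / terms.length;
};

// Score all candidates, order them by relevance, and keep only topN.
function rerank(
  query: string,
  candidates: Candidate[],
  topN: number,
  score: RelevanceFn = termOverlap
): Candidate[] {
  return candidates
    .map((candidate) => ({ candidate, relevance: score(query, candidate.text) }))
    .sort((a, b) => b.relevance - a.relevance)
    .slice(0, topN)
    .map((entry) => entry.candidate);
}

const reranked = rerank(
  "namespace deletion",
  [
    { id: "a", text: "Chunking splits paragraphs" },
    { id: "b", text: "Namespace deletion removes a tenant" },
    { id: "c", text: "A namespace isolates tenants" },
  ],
  2
);
console.log(reranked.map((c) => c.id));
```

Keeping topN small after reranking is what lets the initial top_k grow without the final context block outgrowing the LLM's context window.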
Follow our RAG series for more optimizations
Optimizing chunking strategy
This project uses a paragraph chunker, which can provide good results for some use cases. Often, the quality of a chunk will play a significant role in the quality of the retrieval system as a whole.
Learn more about various chunking strategies
Enhancing metadata structure
The metadata in this project consists simply of a reference URL to the original content and the particular text snippet. You could extract richer metadata from the PDFs (page count, title, author(s), etc.) to give the LLM better context. This assumes, of course, that a given PDF upload contains such metadata and that it would be useful.
Read more about vectorizing structured text.
Troubleshooting
Experiencing any issues with the sample app? Submit an issue, create a PR, or post in our community forum!