SAMPLE APP
Semantic search
A semantic search app to perform semantic search over PDF documents
$ npx create-pinecone-app@latest --template legal-semantic-search
The Legal Semantic Search app demonstrates how to programmatically bootstrap a custom knowledge base based on a Pinecone vector database with arbitrary PDF files included in the codebase.
This app is focused on semantic search over legal documents, but this exact same technique and code can be applied to any content stored locally.
Built with
- Pinecone Serverless
- Voyage Embeddings
- LangChain
- Next.js + Tailwind
- Node version 20 or higher
Run the sample app
The fastest way to get started is to use the create-pinecone-app CLI tool to get up and running:
npx create-pinecone-app@latest --template legal-semantic-search
Get your API key
You need an API key to make API calls to your Pinecone project:
- Open the Pinecone console.
- Select your project.
- Go to API Keys.
- Create an API key.
- Copy your API key.
Get your Voyage AI API key
- Create a new Voyage AI account.
- Create a new API key.
- Add your billing information to your Voyage AI account. This is required even to use the free tier.
- Copy your API key.
Create a Pinecone serverless index
Create a Pinecone index for this project. The index should have the following properties:
- dimension: 1024 (the Voyage voyage-law-2 embeddings model has 1024 dimensions)
- metric: cosine
- region: us-east-1
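As a rough illustration, the properties above can be collected into a single configuration object. This is a minimal sketch, not the app's own code: the helper name `buildIndexConfig` and the index name are hypothetical, and `cloud: "aws"` is an assumption, since the text names only the region.

```typescript
// Sketch only: mirrors the index properties listed above.
// `cloud: "aws"` is an assumption; the text names only the region.
interface ServerlessIndexConfig {
  name: string;
  dimension: number;
  metric: "cosine" | "euclidean" | "dotproduct";
  spec: { serverless: { cloud: "aws" | "gcp" | "azure"; region: string } };
}

// voyage-law-2 produces 1024-dimensional vectors, so the index must match.
const VOYAGE_LAW_2_DIMENSION = 1024;

// Hypothetical helper: builds the configuration for a given index name.
function buildIndexConfig(indexName: string): ServerlessIndexConfig {
  return {
    name: indexName,
    dimension: VOYAGE_LAW_2_DIMENSION,
    metric: "cosine",
    spec: { serverless: { cloud: "aws", region: "us-east-1" } },
  };
}

// "legal-semantic-search" is a placeholder index name.
const config = buildIndexConfig("legal-semantic-search");
console.log(JSON.stringify(config, null, 2));
```

An object of this shape is what the Pinecone TypeScript client's `createIndex` call expects, so it could be passed to the client once an API key is configured. You can also create the same index from the Pinecone console.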
Start the project
Requires Node version 20+
Dependency installation
From the project root directory, install the dependencies, and make sure .env is populated with the relevant keys.
Project structure
In this example we opted to use a standard Next.js application structure.
Frontend Client
The frontend uses Next.js, Tailwind, and custom React components to power the search experience. It also leverages API routes to make calls to the server to initiate bootstrapping of the Pinecone vector database as a knowledge store, and to fetch relevant document chunks for the UI.
Backend Server
This project uses Next.js API routes to handle file chunking, upsertion, context provision, etc. Learn more about the implementation details below.
Simple semantic search
This project uses a basic semantic search architecture that achieves low-latency natural language search across all embedded documents. When the app is loaded, it performs background checks to determine if the Pinecone vector database needs to be created and populated.
Componentized suggested search interface
To make it easier for you to clone this app as a starting point and quickly adapt it to your own purposes, we’ve built the search interface as a component that accepts a list of suggested searches and renders them as a dropdown, helping the user find things.
You can define your suggested searches in your parent component. The search form component lives in src/components/SearchForm.tsx. It handles:
- Displaying suggested searches
- Allowing the user to search, or clear the input
- Providing visual feedback to the user that the search is in progress
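The three responsibilities listed above can be modeled as a small state machine, independent of React. The following is a framework-free sketch; every name in it (`SearchFormState`, `reduce`, the action types, and the example suggestions) is hypothetical and not taken from SearchForm.tsx.

```typescript
// Framework-free sketch of the state a search form like this might manage.
// All names are hypothetical, not taken from SearchForm.tsx.
interface SearchFormState {
  query: string;
  suggestedSearches: string[];
  isSearching: boolean; // drives the "search in progress" feedback
}

type SearchFormAction =
  | { type: "type"; text: string }
  | { type: "pickSuggestion"; index: number }
  | { type: "clear" }
  | { type: "searchStarted" }
  | { type: "searchFinished" };

// The parent supplies the suggested searches, as the text describes.
function initialState(suggestedSearches: string[]): SearchFormState {
  return { query: "", suggestedSearches, isSearching: false };
}

function reduce(state: SearchFormState, action: SearchFormAction): SearchFormState {
  switch (action.type) {
    case "type":
      return { ...state, query: action.text };
    case "pickSuggestion": {
      const picked = state.suggestedSearches[action.index];
      // Ignore out-of-range indexes rather than setting an undefined query.
      return picked === undefined ? state : { ...state, query: picked };
    }
    case "clear":
      return { ...state, query: "", isSearching: false };
    case "searchStarted":
      return { ...state, isSearching: true };
    case "searchFinished":
      return { ...state, isSearching: false };
  }
}

// Example suggestions are placeholders for whatever fits your use case.
let state = initialState(["cases about free speech", "cases involving search and seizure"]);
state = reduce(state, { type: "pickSuggestion", index: 1 });
state = reduce(state, { type: "searchStarted" });
console.log(state.query, state.isSearching);
```

In a React component, the same transitions would typically be driven by `useState` or `useReducer`, with `isSearching` toggling a spinner or disabled state.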
The bootstrapping process performs the following steps:
- Creates the Pinecone index specified by the PINECONE_INDEX environment variable
- Loads metadata from the docs/db.json file
- Loads all PDFs in the docs directory
- Merges extracted metadata with documents based on filename
- Splits text into chunks
- Assigns unique IDs to each split and flattens metadata
- Upserts each chunk to the Pinecone vector database, in batches
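The in-memory part of these steps (merging metadata by filename, chunking, assigning IDs, flattening metadata, and batching) can be sketched as plain functions. This is an illustrative sketch under stated assumptions, not the app's implementation: the type names, the fixed-size character splitter, the `filename#index` ID format, and the default sizes are all hypothetical.

```typescript
// Sketch of the in-memory bootstrapping steps. Types, chunk sizes, and the
// ID format are assumptions, not the app's actual code.
interface LoadedDoc { filename: string; text: string }
type Metadata = Record<string, unknown>;
type FlatMetadata = Record<string, string | number | boolean>;
interface ChunkRecord { id: string; text: string; metadata: FlatMetadata }

// Flatten nested objects into dotted keys; arrays are serialized as JSON.
function flattenMetadata(meta: Metadata, prefix = ""): FlatMetadata {
  const flat: FlatMetadata = {};
  for (const key of Object.keys(meta)) {
    const value = meta[key];
    const path = prefix ? `${prefix}.${key}` : key;
    if (value === null || value === undefined) continue;
    if (typeof value === "string" || typeof value === "number" || typeof value === "boolean") {
      flat[path] = value;
    } else if (typeof value === "object" && !Array.isArray(value)) {
      Object.assign(flat, flattenMetadata(value as Metadata, path));
    } else {
      flat[path] = JSON.stringify(value);
    }
  }
  return flat;
}

// Fixed-size character chunks with overlap (a stand-in for a real text splitter).
function splitText(text: string, chunkSize = 1000, overlap = 200): string[] {
  if (overlap >= chunkSize) throw new Error("overlap must be smaller than chunkSize");
  const chunks: string[] = [];
  for (let start = 0; start < text.length; start += chunkSize - overlap) {
    chunks.push(text.slice(start, start + chunkSize));
    if (start + chunkSize >= text.length) break;
  }
  return chunks;
}

// Merge metadata by filename, split each document, and give every chunk an ID.
function toChunkRecords(
  docs: LoadedDoc[],
  metadataByFilename: Record<string, Metadata>,
  chunkSize = 1000,
  overlap = 200,
): ChunkRecord[] {
  const records: ChunkRecord[] = [];
  for (const doc of docs) {
    const extra = metadataByFilename[doc.filename] ?? {};
    const metadata = flattenMetadata({ filename: doc.filename, ...extra });
    splitText(doc.text, chunkSize, overlap).forEach((text, i) => {
      records.push({ id: `${doc.filename}#${i}`, text, metadata });
    });
  }
  return records;
}

// Group records so that each upsert request carries at most batchSize items.
function inBatches<T>(items: T[], batchSize: number): T[][] {
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += batchSize) {
    batches.push(items.slice(i, i + batchSize));
  }
  return batches;
}
```

Each batch would then be embedded and upserted to the index. Batching keeps individual requests small, which makes it easier to stay within request size limits and to retry a single failed batch.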
The app uses Voyage’s voyage-law-2 embeddings model, which is purpose-built for use with legal text. This app includes a small handful of landmark U.S. cases from Justia.
During the bootstrapping phase, the case documents are chunked and passed to Voyage’s embeddings model for embedding.
Searches are handled by the /api/search route, which also uses Voyage’s embeddings model to convert the user’s query into query vectors.
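To make the two embedding paths concrete, here is a minimal sketch of the request shapes involved. It assumes Voyage AI's public REST embeddings endpoint rather than whichever client library the app actually uses. The helper names `buildEmbeddingRequest` and `buildVectorQuery` and the default `topK` are hypothetical.

```typescript
// Sketch under assumptions: the endpoint and body follow Voyage AI's public
// REST embeddings API; helper names and the default topK are hypothetical.
type InputType = "document" | "query";

interface EmbeddingRequest {
  url: string;
  headers: Record<string, string>;
  body: { model: string; input: string[]; input_type: InputType };
}

// Use "document" when embedding chunks during bootstrapping and "query"
// when embedding the user's search text.
function buildEmbeddingRequest(
  texts: string[],
  inputType: InputType,
  apiKey: string,
): EmbeddingRequest {
  return {
    url: "https://api.voyageai.com/v1/embeddings",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: { model: "voyage-law-2", input: texts, input_type: inputType },
  };
}

// Parameters for a Pinecone vector query; the vector must match the
// index dimension (1024 for voyage-law-2).
function buildVectorQuery(queryVector: number[], topK = 5) {
  if (queryVector.length !== 1024) {
    throw new Error(`expected a 1024-dimensional vector, got ${queryVector.length}`);
  }
  return { vector: queryVector, topK, includeMetadata: true };
}
```

The request object can be sent with any HTTP client, and the query object has the shape accepted by the Pinecone client's `query` call. Setting `includeMetadata` to true returns the chunk metadata with each match, so the UI can show which document a result came from.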