This test requires a Pinecone account on the Standard or Enterprise plan. New users can sign up for the 21-day Standard trial, which includes $300 in credits, more than enough to cover the costs of this test. Existing users on the Starter plan can upgrade.
About this test
Semantic search enables finding relevant content based on meaning rather than exact keyword matches, making it ideal for applications like product search, content recommendation, and question-answering systems. This test simulates a production-scale semantic search workload, measuring import time, query throughput, query latency, and associated costs. The test uses the following configuration:
- Records: 10 million records from the Amazon Reviews 2023 dataset
- Embedding model: llama-text-embed-v2 (1024 dimensions)
- Similarity metric: cosine
- Total size: 48.8 GB
- Query load: 10 queries per second total (across all users)
- Concurrent users: 10 users querying simultaneously
- Test queries: 100,000 queries
- Import time target: < 30 minutes
- Query latency target: p90 latency < 100ms
1. Get an API key
To follow the steps in this guide, you’ll need an API key. Create a new API key in the Pinecone console.
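If you plan to use the key from the command line in later steps, one option is to export it as an environment variable. The variable name below is just a convention used by the examples in this guide:

```bash
# Make the API key available to later commands in this shell session
export PINECONE_API_KEY="YOUR_API_KEY"
```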
2. Create an index
This test requires AWS-based indexes and infrastructure. The sample dataset is only available from Amazon S3, and you can only import from Amazon S3 into Pinecone indexes hosted on AWS. To run the benchmark, you’ll also need to provision an AWS EC2 instance in the same region as your index.
- In the Pinecone console, go to the Indexes page.
- Click Create index.
- Check Custom settings.
- Configure the index with the following settings:
  - Name: search-10m
  - Vector type: Dense
  - Dimensions: 1024
  - Metric: cosine
  - Capacity mode: Serverless (on-demand)
  - Cloud: AWS (required for this test)
  - Region: Use an AWS region appropriate for your use case (for example, us-east-1)
- Click Create index.
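If you prefer to create the index from code instead of the console, the request below is a rough sketch using Pinecone’s REST API. The endpoint, headers, and body shape reflect the API at the time of writing; check the Pinecone API reference for the current request format, and adjust the region to match your setup.

```bash
# Sketch: create the serverless index via the REST API.
# Verify the request shape against the current Pinecone API reference.
curl -s -X POST "https://api.pinecone.io/indexes" \
  -H "Api-Key: $PINECONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "name": "search-10m",
        "dimension": 1024,
        "metric": "cosine",
        "spec": {
          "serverless": { "cloud": "aws", "region": "us-east-1" }
        }
      }'
```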
3. Import the dataset
Pinecone’s import feature enables you to load millions of vectors from object storage in parallel. In this step, you’ll import 10 million records into a single namespace (ns_2) in your index.
Choose an import source
To import the dataset, you’ll need to use the following Amazon S3 import URL: s3://fe-customer-pocs/search/search_10M/dense/
Start and monitor the import
For this dataset, the import should take less than 30 minutes.
- In the Pinecone console, go to the Indexes page.
- Find your search-10m index and click … > Import data.
- For Storage integration, select No integration (public bucket).
- Enter the import URL: s3://fe-customer-pocs/search/search_10M/dense/
- For Error handling, select Abort on error (default).
- Click Start import.
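To start the import from code instead, the sketch below uses Pinecone’s bulk import endpoint on the index’s data-plane host. The INDEX_HOST placeholder and the exact request body are assumptions based on the current API; confirm both in the Pinecone API reference. The index host is shown on the index’s page in the console.

```bash
# Sketch: start a bulk import from the public S3 bucket.
# INDEX_HOST is the host shown for your index in the console.
# Verify the endpoint and body against the current Pinecone API reference.
curl -s -X POST "https://$INDEX_HOST/bulk/imports" \
  -H "Api-Key: $PINECONE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
        "uri": "s3://fe-customer-pocs/search/search_10M/dense/",
        "errorMode": { "onError": "abort" }
      }'
```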
While the import is running, you can continue with the next step to provision a VM and install VSB. However, wait for the import to complete before running the benchmark.
4. Run the benchmark
To simulate realistic query patterns and measure latency and throughput for your Pinecone index, use Vector Search Bench (VSB). The benchmark runs 100,000 queries at 10 queries per second, which should take just under three hours to complete.
Provision a VM
VSB reports latency as the time from when the tool issues a query to when Pinecone returns the response. To minimize the client-side latency between the tool and Pinecone, run the benchmark on a dedicated AWS EC2 instance hosted in the same AWS region as your Pinecone index. This keeps the client-side contribution to latency in the sub-millisecond range. For instructions on how to provision an EC2 instance, see the AWS documentation.
As noted in section 2, this test requires an AWS EC2 instance in the same region as your index.
Create a VM that comes with Python 3.11 or higher.
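As a rough sketch, you can launch the instance with the AWS CLI. The AMI ID, instance type, key pair, and security group below are placeholders, not recommendations from this guide; pick values appropriate for your account, and make sure the region matches your index.

```bash
# Sketch: launch an EC2 instance in the same region as the index.
# All IDs below are placeholders -- substitute your own values.
aws ec2 run-instances \
  --region us-east-1 \
  --image-id ami-xxxxxxxxxxxxxxxxx \
  --instance-type c6i.2xlarge \
  --key-name my-key-pair \
  --security-group-ids sg-xxxxxxxxxxxxxxxxx \
  --count 1
```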
Connect to the VM
Connect to the VM using SSH or the cloud provider’s console.
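For example, connecting over SSH from your local machine. The key file, user name, and host below are placeholders; the default user depends on the AMI (for example, ec2-user for Amazon Linux or ubuntu for Ubuntu).

```bash
# Connect to the benchmark VM over SSH (placeholders -- use your own values)
ssh -i ~/.ssh/my-key-pair.pem ubuntu@<vm-public-ip-or-dns>
```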
Install Vector Search Bench (VSB)
VSB (Vector Search Bench) is a benchmarking suite for testing vector database search performance across different workloads and databases. Before installing it, you’ll need a few dependencies.
Verify Python version
VSB requires Python 3.11 or higher to run. Verify your Python version. If your version is below 3.11, install Python 3.11+ using your distribution’s package manager.
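For example (the install command assumes a Debian- or Ubuntu-based VM; use your distribution’s package manager otherwise, and note that the exact package name for Python 3.11+ varies by release):

```bash
# Check the installed Python version (VSB needs 3.11 or newer)
python3 --version

# Example install on Debian/Ubuntu if the version is too old
sudo apt update && sudo apt install -y python3.11 python3.11-venv
```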
Install git
Git is required to clone the VSB repository. Check whether git is installed, and if it isn’t, install it using your system’s package manager.
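For example (apt for Debian/Ubuntu, dnf for Amazon Linux and other RPM-based systems):

```bash
# Check whether git is already installed
git --version

# Debian/Ubuntu
sudo apt update && sudo apt install -y git

# Amazon Linux / RHEL-based
sudo dnf install -y git
```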
Install pipx
pipx is required to install Poetry. Check whether pip3 is installed, and install it with your system’s package manager if it isn’t. Then do the same for pipx. After installing pipx, update the PATH in your current terminal session so that pipx-installed tools are found.
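For example, on Debian/Ubuntu (package names differ slightly on other distributions):

```bash
# Check for pip3 and install it if missing
pip3 --version || sudo apt install -y python3-pip

# Check for pipx and install it if missing
pipx --version || sudo apt install -y pipx

# Put pipx-installed tools on PATH, then reload the shell config
pipx ensurepath
source ~/.bashrc
```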
Install Poetry
Poetry is required to manage VSB’s Python dependencies and virtual environment. If Poetry is not installed, use pipx to install it. Alternatively, use the official Poetry installer.
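For example:

```bash
# Install Poetry with pipx and confirm it is available
pipx install poetry
poetry --version
```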
Clone the VSB repository
To run the benchmark, you’ll first need to clone the VSB repository and navigate into it.
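For example (the URL below assumes VSB’s public GitHub repository; use the URL from the repository’s own documentation if it differs):

```bash
# Clone VSB and change into the repository directory
git clone https://github.com/pinecone-io/VSB.git
cd VSB
```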
Configure Poetry
Since your VM has Python 3.11 or higher installed (as specified in the VM provisioning step), tell Poetry to use it.
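For example, assuming the interpreter is available as python3.11 (use python3.12 or plain python3 if that is what your VM provides):

```bash
# Tell Poetry which Python interpreter to use for the project environment
poetry env use python3.11
```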
Install dependencies
VSB requires several Python packages to run. Install all dependencies.
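From within the VSB directory:

```bash
# Install all of VSB's dependencies into the Poetry-managed virtual environment
poetry install
```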
Benchmark your Pinecone index
To test the performance of your Pinecone index, run the benchmark command from within the VSB directory. For more information about VSB, see its GitHub repository. The command simulates 10 concurrent users issuing a total of 100,000 queries at 10 queries per second (QPS). Each query performs a vector search for the top 10 most similar 1024-dimensional vectors, using cosine similarity, with query vectors selected uniformly at random. The --skip_populate flag skips the data population phase, since you’ve already imported data into your index.
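A sketch of the invocation is below. The workload name is a placeholder, and apart from --skip_populate (described above) the flag names are assumptions based on VSB’s command-line interface; run poetry run vsb --help and check the VSB repository for the workload that corresponds to this dataset and for the exact options.

```bash
# Sketch only: substitute the real workload name and confirm the flag names
# against `poetry run vsb --help` for your version of VSB.
# --users=10             -> 10 concurrent users
# --requests_per_sec=10  -> 10 queries per second in total
# --skip_populate        -> data was already imported in step 3
poetry run vsb \
  --database=pinecone \
  --workload=<workload-name> \
  --api_key="$PINECONE_API_KEY" \
  --users=10 \
  --requests_per_sec=10 \
  --skip_populate
```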
5. Analyze performance
At the end of the run, VSB prints an operation summary including the requests per second achieved and latencies at different percentiles. Compare the reported p90 latency against the target of under 100ms. For more detailed statistics, see the stats.json file identified in the output.
6. Check costs
You can check the costs for the import, queries, and storage in the Pinecone console at Settings > Usage. Cost data is delayed by up to three days, but once it’s available, compare the actual costs to the estimated costs below. For the latest pricing details, see Pricing.
| Cost type | Amount | Pricing | Estimated cost |
|---|---|---|---|
| Import | 48.8 GB | $1/GB | $48.80 |
| Queries | 100,000 queries | $16 per 1M read units | $78.08 |
| Storage | 4 hours | $0.33/GB/month | $0.09 |
| Total | | | $126.97 |
Import costs
The current price for import is $1/GB. The dataset size for this test is 48.8 GB, so the import cost should be $48.80.
Query costs
A query uses 1 read unit (RU) for every 1 GB of namespace size. The current price for queries in the us-east-1 region of AWS is $16 per 1 million read units (pricing varies by region). This test ran 100,000 queries against a namespace size of 48.8 GB. Each query uses 48.8 RUs (1 RU per GB), so the total is 4,880,000 RUs. At $16 per 1 million RUs, the cost is (4,880,000 / 1,000,000) × $16 = $78.08.
Storage costs
The current price for storage is $0.33 per GB per month. The dataset size for this test is 48.8 GB. Assuming a total storage time of 4 hours (including import, benchmark runtime, and cleanup), the storage cost is: $0.33/GB/month * 48.8 GB / 730 hours * 4 hours = $0.09.
Total costs
The total cost for the test is the sum of the import cost, query cost, and storage cost: $48.80 + $78.08 + $0.09 = $126.97.
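If you want to reproduce these estimates on the benchmark VM, the snippet below simply repeats the arithmetic above; the prices are the ones quoted in this guide and may change.

```bash
# Reproduce the cost estimates above (prices as quoted in this guide)
python3 - <<'EOF'
size_gb = 48.8
import_cost  = size_gb * 1.00                      # $1 per GB imported
query_cost   = 100_000 * size_gb / 1_000_000 * 16  # 1 RU/GB per query, $16 per 1M RUs
storage_cost = 0.33 * size_gb / 730 * 4            # $0.33/GB/month, stored for 4 hours
total = import_cost + query_cost + storage_cost
print(f"Import:  ${import_cost:.2f}")
print(f"Queries: ${query_cost:.2f}")
print(f"Storage: ${storage_cost:.2f}")
print(f"Total:   ${total:.2f}")
EOF
```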