This guide walks you through testing Pinecone at production scale. You’ll import 10 million vectors, run a benchmark, and analyze the results to verify Pinecone meets production requirements for semantic search applications.
This test requires a Pinecone account on the Standard or Enterprise plan. New users can sign up for a 21-day Standard trial with $300 in credits, more than enough to cover the cost of this test. Existing users on the Starter plan can upgrade.

About this test

Semantic search enables finding relevant content based on meaning rather than exact keyword matches, making it ideal for applications like product search, content recommendation, and question-answering systems. This test simulates a production-scale semantic search workload, measuring import time, query throughput, query latency, and associated costs. The test uses the following configuration:
  • Records: 10 million records from the Amazon Reviews 2023 dataset
  • Embedding model: llama-text-embed-v2 (1024 dimensions)
  • Similarity metric: cosine
  • Total size: 48.8 GB
  • Query load: 10 queries per second total (across all users)
  • Concurrent users: 10 users querying simultaneously
  • Test queries: 100,000 queries
  • Import time target: < 30 minutes
  • Query latency target: p90 latency < 100ms
Estimated cost: ~$127 (import: $48.80, queries: $78.08, storage: $0.09). For details, see the cost breakdown in step 6.

1. Get an API key

To follow the steps in this guide, you’ll need an API key. Create a new API key in the Pinecone console. The commands in this guide use the placeholder {{YOUR_API_KEY}}; replace it with your actual key.

2. Create an index

This test requires you to use AWS-based indexes and infrastructure. The sample dataset is only available from Amazon S3, and you can only import from Amazon S3 to Pinecone indexes hosted on AWS. To run the benchmark, you’ll need to provision an AWS EC2 instance in the same region as your index.
Create an on-demand index that matches the dimensions and similarity metric of the dataset you’ll import in later steps. You can configure it in the console as follows, or create it programmatically, as shown in the sketch after these steps.
  1. In the Pinecone console, go to the Indexes page.
  2. Click Create index.
  3. Check Custom settings.
  4. Configure the index with the following settings:
    • Name: search-10m
    • Vector type: Dense
    • Dimensions: 1024
    • Metric: cosine
    • Capacity mode: Serverless (on-demand)
    • Cloud: AWS (required for this test)
    • Region: Use an AWS region appropriate for your use case (for example, us-east-1)
  5. Click Create index.
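Alternatively, you can create the same index programmatically. A minimal sketch using the Python SDK; the region is an example, so match it to your own setup:
Python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="{{YOUR_API_KEY}}")

# Serverless (on-demand) index on AWS, matching the dataset's
# 1024-dimensional vectors and cosine similarity metric
pc.create_index(
    name="search-10m",
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # example region
)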

3. Import the dataset

Pinecone’s import feature enables you to load millions of vectors from object storage in parallel. In this step, you’ll import 10 million records into a single namespace (ns_2) in your index.

Choose an import source

To import the dataset, you’ll need to use the following Amazon S3 import URL:
s3://fe-customer-pocs/search/search_10M/dense/

Start and monitor the import

For this dataset, the import should take less than 30 minutes.
  1. In the Pinecone console, go to the Indexes page.
  2. Find your search-10m index and click … > Import data.
  3. For Storage integration, select No integration (public bucket).
  4. Enter the import URL: s3://fe-customer-pocs/search/search_10M/dense/.
  5. For Error handling, select Abort on error (default).
  6. Click Start import.
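If you prefer to script this step, you can start the import with the Python SDK instead. A minimal sketch, assuming a recent version of the pinecone package (v5.0 or later, which added import support):
Python
from pinecone import Pinecone

pc = Pinecone(api_key="{{YOUR_API_KEY}}")
index = pc.Index("search-10m")

# Start the import from the public S3 bucket, aborting on the first error
operation = index.start_import(
    uri="s3://fe-customer-pocs/search/search_10M/dense/",
    error_mode="ABORT",
)
print(operation.id)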
To monitor progress, open your index in the Pinecone console and navigate to the Imports tab. After the import completes, compare the Started time and End time timestamps to see the total time required.
While the import is running, you can continue with the next step to provision a VM and install VSB. However, wait for the import to complete before running the benchmark.
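You can also check import progress programmatically. A sketch, again assuming the Python SDK:
Python
from pinecone import Pinecone

pc = Pinecone(api_key="{{YOUR_API_KEY}}")
index = pc.Index("search-10m")

# List imports for this index and print each one's status and progress
for imp in index.list_imports():
    print(imp.id, imp.status, imp.percent_complete)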

4. Run the benchmark

To simulate realistic query patterns and measure latency and throughput for your Pinecone index, use Vector Search Bench (VSB). The benchmark runs 100,000 queries at 10 queries per second, which should take just under three hours to complete.

Step 1: Provision a VM

VSB reports latency as the time from when the tool issues a query to when Pinecone returns the response. To minimize client-side latency between the tool and Pinecone, run the benchmark on a dedicated AWS EC2 instance hosted in the same AWS region as your Pinecone index; this keeps client-side latency in the sub-millisecond range.
As noted in section 2, this test requires an AWS EC2 instance in the same region as your index. For instructions on how to provision an EC2 instance, see the AWS documentation. Create a VM with Python 3.11 or higher installed.

Step 2: Connect to the VM

Connect to the VM using SSH or the cloud provider’s console.

Step 3: Install Vector Search Bench (VSB)

VSB (Vector Search Bench) is a benchmarking suite for testing vector database search performance across different workloads and databases. To install it, you’ll first need to install various dependencies.
  1. Verify the Python version. VSB requires Python 3.11 or higher to run. Verify your Python version:
    Terminal
    python3 --version
    
    If your version is below 3.11, install Python 3.11+ using your distribution’s package manager.
  2. Install git. Git is required to clone the VSB repository. Check if git is installed:
    Terminal
    git --version
    
    If git is not installed, install it using your system’s package manager:
    Terminal
    # Adapt for your VM's package manager (apt/yum/dnf)
    sudo apt-get update && sudo apt-get install git
    
  3. Install pipx. pipx is required to install Poetry. First, check if pip3 is installed:
    Terminal
    pip3 --version
    
    If pip is not installed, install it using your system’s package manager:
    Terminal
    # Adapt for your VM's package manager (apt/yum/dnf)
    sudo apt-get update && sudo apt-get install python3-pip
    
    Then check if pipx is installed:
    Terminal
    pipx --version
    
    If pipx is not installed, install it via your system’s package manager:
    Terminal
    # Adapt for your VM's package manager (apt/yum/dnf)
    sudo apt-get update && sudo apt-get install pipx
    pipx ensurepath
    
    After installation, run this command to update the PATH in your current terminal session:
    Terminal
    source ~/.bashrc
    
  4. Install Poetry. Poetry is required to manage VSB’s Python dependencies and virtual environment. If Poetry is not installed, use pipx to install it:
    Terminal
    pipx install poetry
    
    Alternatively, use the official Poetry installer.
  5. Clone the VSB repository. To run the benchmark, you’ll first need to clone the VSB repository and navigate to it:
    Terminal
    git clone https://github.com/pinecone-io/VSB.git
    cd VSB
    
  6. Configure Poetry. Since your VM has Python 3.11 or higher installed (as specified in the VM provisioning step), tell Poetry to use it:
    Terminal
    poetry env use python3
    
  7. Install dependencies. VSB requires several Python packages to run. Install all dependencies:
    Terminal
    poetry install
    

Step 4: Benchmark your Pinecone index

To test the performance of your Pinecone index, run the following command from within the VSB directory. For more information about VSB, see its GitHub repository. The command simulates 10 concurrent users issuing a total of 100,000 queries at 10 queries per second (QPS). Each query performs a vector search for the top 10 most similar 1024-dimensional vectors, using cosine similarity, with query vectors selected uniformly at random. The --skip_populate flag skips the data population phase, since you’ve already imported data into your index.
Terminal
poetry run vsb \
    --database="pinecone" \
    --workload=synthetic-proportional \
    --pinecone_api_key="{{YOUR_API_KEY}}" \
    --pinecone_index_name="search-10m" \
    --pinecone_namespace_name="ns_2" \
    --synthetic_dimensions=1024 \
    --synthetic_metric=cosine \
    --synthetic_top_k=10 \
    --synthetic_requests=100000 \
    --users=10 \
    --requests_per_sec=10 \
    --synthetic_query_distribution=uniform \
    --synthetic_query_ratio=1 \
    --synthetic_insert_ratio=0 \
    --synthetic_delete_ratio=0 \
    --synthetic_update_ratio=0 \
    --skip_populate

5. Analyze performance

At the end of the run, VSB prints an operation summary including the requests per second achieved and latencies at different percentiles. Here’s an example output:
Terminal
2025-12-23T00:34:37 INFO     Completed Run phase, took 9940.14s

                      Operation Summary

  Operation  Requests  Failures  Requests/sec  Failures/sec
 ───────────────────────────────────────────────────────────
  Search        99000     0(0%)            10           0.0

                                                    Metrics Summary

  Operation  Metric         Min  0.1%    1%    5%   10%   25%   50%   75%   90%   95%   99%  99.9%  99.99%   Max  Mean
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Search     Latency (ms)    23    25    25    26    27    27    29    34    44    81   350    430    1300  4602    43
Confirm that the achieved throughput is around 10 QPS and that the p90 latency is under 100 ms. For more detailed statistics, analyze the stats.json file identified in the output.
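To inspect the raw statistics rather than the printed summary, you can load the file with a few lines of Python. A minimal sketch; the exact location and schema of stats.json depend on your VSB run, so treat the path and keys as assumptions to verify against your own output:
Python
import json

# Path is an assumption; use the stats.json path printed by VSB
with open("stats.json") as f:
    stats = json.load(f)

# Print the top-level structure to see which metrics are available
print(json.dumps(stats, indent=2)[:2000])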

6. Check costs

You can check the costs for the import, queries, and storage in the Pinecone console at Settings > Usage. Cost data can be delayed by up to three days; once it’s available, compare the actual costs to the estimates below.
For the latest pricing details, see Pricing.
  Cost type   Amount            Pricing                 Estimated cost
 ──────────────────────────────────────────────────────────────────────
  Import      48.8 GB           $1/GB                   $48.80
  Queries     100,000 queries   $16 per 1M read units   $78.08
  Storage     4 hours           $0.33/GB/month          $0.09
  Total                                                 $126.97

Import costs

The current price for import is $1/GB. The dataset size for this test is 48.8 GB, so the import cost should be $48.80.

Query costs

A query uses 1 read unit (RU) for every 1 GB of namespace size. The current price for queries in the us-east-1 region of AWS is $16 per 1 million read units (pricing varies by region). This test ran 100,000 queries against a namespace size of 48.8 GB. Each query uses 48.8 RUs (1 RU per GB), so the total is 4,880,000 RUs. At $16 per 1 million RUs, the cost is (4,880,000 / 1,000,000) × $16 = $78.08.

Storage costs

The current price for storage is $0.33 per GB per month. The dataset size for this test is 48.8 GB. Assuming a total storage time of 4 hours (including import, benchmark runtime, and cleanup), the storage cost is: $0.33/GB/month × 48.8 GB ÷ 730 hours/month × 4 hours ≈ $0.09.

Total costs

The total cost for the test is the sum of the import cost, query cost, and storage cost: $48.80 + $78.08 + $0.09 = $126.97.
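To reproduce the arithmetic, here is the same calculation as a short Python sketch. Prices and dataset size are taken from the estimates above; adjust them if pricing in your region differs:
Python
# Cost estimate for this test (prices from the breakdown above)
DATASET_GB = 48.8
QUERIES = 100_000
STORAGE_HOURS = 4
HOURS_PER_MONTH = 730

import_cost = DATASET_GB * 1.00              # $1 per GB imported
read_units = QUERIES * DATASET_GB            # 1 RU per GB of namespace per query
query_cost = read_units / 1_000_000 * 16.00  # $16 per 1M RUs
storage_cost = 0.33 * DATASET_GB / HOURS_PER_MONTH * STORAGE_HOURS  # $0.33/GB/month

total = import_cost + query_cost + storage_cost
print(f"import=${import_cost:.2f} queries=${query_cost:.2f} "
      f"storage=${storage_cost:.2f} total=${total:.2f}")
# import=$48.80 queries=$78.08 storage=$0.09 total=$126.97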

7. Clean up

When you no longer need your test index, delete it to avoid incurring unnecessary costs.
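You can delete the index in the Pinecone console, or programmatically. A minimal sketch with the Python SDK:
Python
from pinecone import Pinecone

pc = Pinecone(api_key="{{YOUR_API_KEY}}")

# Deleting the index stops all storage charges for it
pc.delete_index("search-10m")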