This guide walks you through testing Pinecone at production scale. You’ll import 10 million vectors, run a benchmark, and analyze the results to verify Pinecone meets production requirements for semantic search applications.
This test requires a Pinecone account on the Standard or Enterprise plan. New users can sign up for a 21-day Standard trial with $300 in credits, more than enough to cover the cost of this test. Existing users on the Starter plan can upgrade.

About this test

Semantic search enables finding relevant content based on meaning rather than exact keyword matches, making it ideal for applications like product search, content recommendation, and question-answering systems. This test simulates a production-scale semantic search workload, measuring import time, query throughput, query latency, and associated costs. The test uses the following configuration:
  • Records: 10 million records from the Amazon Reviews 2023 dataset
  • Embedding model: llama-text-embed-v2 (1024 dimensions)
  • Similarity metric: cosine
  • Total size: 48.8 GB
  • Query load: 10 queries per second total (across all users)
  • Concurrent users: 10 users querying simultaneously
  • Test queries: 100,000 queries
  • Import time target: < 30 minutes
  • Query latency target: p90 latency < 100ms
Estimated cost: ~$127 (import: $48.80, queries: $78.08, storage: $0.09). For details, see the cost breakdown in step 6.

1. Get an API key

To follow the steps in this guide, you’ll need an API key. Create a new API key in the Pinecone console. The commands in this guide use the placeholder {{YOUR_API_KEY}}; replace it with your actual key.

2. Create an index

This test requires you to use AWS-based indexes and infrastructure. The sample dataset is only available from Amazon S3, and you can only import from Amazon S3 to Pinecone indexes hosted on AWS. To run the benchmark, you’ll need to provision an AWS EC2 instance in the same region as your index.
Create an on-demand index that matches the dimensions and similarity metric of the dataset you’ll import in later steps. You can configure it in the console as follows, or create it programmatically, as shown in the sketch after these steps.
  1. In the Pinecone console, go to the Indexes page.
  2. Click Create index.
  3. Check Custom settings.
  4. Configure the index with the following settings:
    • Name: search-10m
    • Vector type: Dense
    • Dimensions: 1024
    • Metric: cosine
    • Capacity mode: Serverless (on-demand)
    • Cloud: AWS (required for this test)
    • Region: Use an AWS region appropriate for your use case (for example, us-east-1)
  5. Click Create index.
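Alternatively, you can create the same index programmatically. A minimal sketch using the Python SDK; the region is an example, so match it to your own setup:
Python
from pinecone import Pinecone, ServerlessSpec

pc = Pinecone(api_key="{{YOUR_API_KEY}}")

# Serverless (on-demand) index on AWS, matching the dataset's
# 1024-dimensional vectors and cosine similarity metric
pc.create_index(
    name="search-10m",
    dimension=1024,
    metric="cosine",
    spec=ServerlessSpec(cloud="aws", region="us-east-1"),  # example region
)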

3. Import the dataset

Pinecone’s import feature enables you to load millions of vectors from object storage in parallel. In this step, you’ll import 10 million records into a single namespace (ns_2) in your index.

Choose an import source

To import the dataset, you’ll need to use the following Amazon S3 import URL:
s3://fe-customer-pocs/search/search_10M/dense/

Start and monitor the import

For this dataset, the import should take less than 30 minutes.
  1. In the Pinecone console, go to the Indexes page.
  2. Find your search-10m index and click … > Import data.
  3. For Storage integration, select No integration (public bucket).
  4. Enter the import URL: s3://fe-customer-pocs/search/search_10M/dense/.
  5. For Error handling, select Abort on error (default).
  6. Click Start import.
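If you prefer to script this step, you can start the import with the Python SDK instead. A minimal sketch, assuming a recent version of the pinecone package (v5.0 or later, which added import support):
Python
from pinecone import Pinecone

pc = Pinecone(api_key="{{YOUR_API_KEY}}")
index = pc.Index("search-10m")

# Start the import from the public S3 bucket, aborting on the first error
operation = index.start_import(
    uri="s3://fe-customer-pocs/search/search_10M/dense/",
    error_mode="ABORT",
)
print(operation.id)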
To monitor progress, open your index in the Pinecone console and navigate to the Imports tab. After the import completes, compare the Started time and End time timestamps to see the total time required.
While the import is running, you can continue with the next step to provision a VM and install VSB. However, wait for the import to complete before running the benchmark.
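You can also check import progress programmatically. A sketch, again assuming the Python SDK:
Python
from pinecone import Pinecone

pc = Pinecone(api_key="{{YOUR_API_KEY}}")
index = pc.Index("search-10m")

# List imports for this index and print each one's status and progress
for imp in index.list_imports():
    print(imp.id, imp.status, imp.percent_complete)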

4. Run the benchmark

To simulate realistic query patterns and measure latency and throughput for your Pinecone index, use Vector Search Bench (VSB). The benchmark runs 100,000 queries at 10 queries per second, which should take just under three hours to complete.

Step 1: Provision a VM

VSB reports latency as the time from when the tool issues a query to when Pinecone returns the response. To minimize client-side latency between the tool and Pinecone, run the benchmark on a dedicated AWS EC2 instance hosted in the same AWS region as your Pinecone index; this keeps client-side latency in the sub-millisecond range.
As noted in section 2, this test requires an AWS EC2 instance in the same region as your index. For instructions on how to provision an EC2 instance, see the AWS documentation. Create a VM with Python 3.11 or higher installed.

Step 2: Connect to the VM

Connect to the VM using SSH or the cloud provider’s console.

Step 3: Install Vector Search Bench (VSB)

VSB (Vector Search Bench) is a benchmarking suite for testing vector database search performance across different workloads and databases. To install it, you’ll first need to install various dependencies.
  1. Verify the Python version. VSB requires Python 3.11 or higher to run. Verify your Python version:
    Terminal
    python3 --version
    
    If your version is below 3.11, install Python 3.11+ using your distribution’s package manager.
  2. Install git. Git is required to clone the VSB repository. Check if git is installed:
    Terminal
    git --version
    
    If git is not installed, install it using your system’s package manager:
    Terminal
    # Adapt for your VM's package manager (apt/yum/dnf)
    sudo apt-get update && sudo apt-get install git
    
  3. Install pipx. pipx is required to install Poetry. First, check if pip3 is installed:
    Terminal
    pip3 --version
    
    If pip is not installed, install it using your system’s package manager:
    Terminal
    # Adapt for your VM's package manager (apt/yum/dnf)
    sudo apt-get update && sudo apt-get install python3-pip
    
    Then check if pipx is installed:
    Terminal
    pipx --version
    
    If pipx is not installed, install it via your system’s package manager:
    Terminal
    # Adapt for your VM's package manager (apt/yum/dnf)
    sudo apt-get update && sudo apt-get install pipx
    pipx ensurepath
    
    After installation, run this command to update the PATH in your current terminal session:
    Terminal
    source ~/.bashrc
    
  4. Install Poetry. Poetry is required to manage VSB’s Python dependencies and virtual environment. If Poetry is not installed, use pipx to install it:
    Terminal
    pipx install poetry
    
    Alternatively, use the official Poetry installer.
  5. Clone the VSB repository. To run the benchmark, you’ll first need to clone the VSB repository and navigate to it:
    Terminal
    git clone https://github.com/pinecone-io/VSB.git
    cd VSB
    
  6. Configure Poetry. Since your VM has Python 3.11 or higher installed (as specified in the VM provisioning step), tell Poetry to use it:
    Terminal
    poetry env use python3
    
  7. Install dependencies. VSB requires several Python packages to run. Install all dependencies:
    Terminal
    poetry install
    

Step 4: Benchmark your Pinecone index

To test the performance of your Pinecone index, run the following command from within the VSB directory. For more information about VSB, see its GitHub repository. The command simulates 10 concurrent users issuing a total of 100,000 queries at 10 queries per second (QPS). Each query performs a vector search for the top 10 most similar 1024-dimensional vectors, using cosine similarity, with query vectors selected uniformly at random. The --skip_populate flag skips the data population phase, since you’ve already imported data into your index.
Terminal
poetry run vsb \
    --database="pinecone" \
    --workload=synthetic-proportional \
    --pinecone_api_key="{{YOUR_API_KEY}}" \
    --pinecone_index_name="search-10m" \
    --pinecone_namespace_name="ns_2" \
    --synthetic_dimensions=1024 \
    --synthetic_metric=cosine \
    --synthetic_top_k=10 \
    --synthetic_requests=100000 \
    --users=10 \
    --requests_per_sec=10 \
    --synthetic_query_distribution=uniform \
    --synthetic_query_ratio=1 \
    --synthetic_insert_ratio=0 \
    --synthetic_delete_ratio=0 \
    --synthetic_update_ratio=0 \
    --skip_populate

5. Analyze performance

At the end of the run, VSB prints an operation summary including the requests per second achieved and latencies at different percentiles. Here’s an example output:
Terminal
2025-12-23T00:34:37 INFO     Completed Run phase, took 9940.14s

                      Operation Summary

  Operation  Requests  Failures  Requests/sec  Failures/sec
 ───────────────────────────────────────────────────────────
  Search        99000     0(0%)            10           0.0

                                                    Metrics Summary

  Operation  Metric         Min  0.1%    1%    5%   10%   25%   50%   75%   90%   95%   99%  99.9%  99.99%   Max  Mean
 ──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────
  Search     Latency (ms)    23    25    25    26    27    27    29    34    44    81   350    430    1300  4602    43
Confirm that the achieved throughput is around 10 QPS and that the p90 latency is under 100 ms. For more detailed statistics, analyze the stats.json file identified in the output.
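To inspect the raw statistics rather than the printed summary, you can load the file with a few lines of Python. A minimal sketch; the exact location and schema of stats.json depend on your VSB run, so treat the path and keys as assumptions to verify against your own output:
Python
import json

# Path is an assumption; use the stats.json path printed by VSB
with open("stats.json") as f:
    stats = json.load(f)

# Print the top-level structure to see which metrics are available
print(json.dumps(stats, indent=2)[:2000])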

6. Check costs

You can check the costs for the import, queries, and storage in the Pinecone console at Settings > Usage. Cost data can be delayed by up to three days; once it’s available, compare the actual costs to the estimates below.
For the latest pricing details, see Pricing.
  Cost type   Amount            Pricing                 Estimated cost
 ──────────────────────────────────────────────────────────────────────
  Import      48.8 GB           $1/GB                   $48.80
  Queries     100,000 queries   $16 per 1M read units   $78.08
  Storage     4 hours           $0.33/GB/month          $0.09
  Total                                                 $126.97

Import costs

The current price for import is $1/GB. The dataset size for this test is 48.8 GB, so the import cost should be $48.80.

Query costs

A query uses 1 read unit (RU) for every 1 GB of namespace size. The current price for queries in the us-east-1 region of AWS is $16 per 1 million read units (pricing varies by region). This test ran 100,000 queries against a namespace size of 48.8 GB. Each query uses 48.8 RUs (1 RU per GB), so the total is 4,880,000 RUs. At $16 per 1 million RUs, the cost is (4,880,000 / 1,000,000) × $16 = $78.08.

Storage costs

The current price for storage is $0.33 per GB per month. The dataset size for this test is 48.8 GB. Assuming a total storage time of 4 hours (including import, benchmark runtime, and cleanup), the storage cost is: $0.33/GB/month × 48.8 GB ÷ 730 hours/month × 4 hours ≈ $0.09.

Total costs

The total cost for the test is the sum of the import cost, query cost, and storage cost: $48.80 + $78.08 + $0.09 = $126.97.
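To reproduce the arithmetic, here is the same calculation as a short Python sketch. Prices and dataset size are taken from the estimates above; adjust them if pricing in your region differs:
Python
# Cost estimate for this test (prices from the breakdown above)
DATASET_GB = 48.8
QUERIES = 100_000
STORAGE_HOURS = 4
HOURS_PER_MONTH = 730

import_cost = DATASET_GB * 1.00              # $1 per GB imported
read_units = QUERIES * DATASET_GB            # 1 RU per GB of namespace per query
query_cost = read_units / 1_000_000 * 16.00  # $16 per 1M RUs
storage_cost = 0.33 * DATASET_GB / HOURS_PER_MONTH * STORAGE_HOURS  # $0.33/GB/month

total = import_cost + query_cost + storage_cost
print(f"import=${import_cost:.2f} queries=${query_cost:.2f} "
      f"storage=${storage_cost:.2f} total=${total:.2f}")
# import=$48.80 queries=$78.08 storage=$0.09 total=$126.97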

7. Clean up

When you no longer need your test index, delete it to avoid incurring unnecessary costs.
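You can delete the index in the Pinecone console, or programmatically. A minimal sketch with the Python SDK:
Python
from pinecone import Pinecone

pc = Pinecone(api_key="{{YOUR_API_KEY}}")

# Deleting the index stops all storage charges for it
pc.delete_index("search-10m")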