This page describes techniques for increasing throughput for upserts, searches, and other data operations.

Import from object storage

Importing from object storage is the most efficient and cost-effective method to load large numbers of records into an index. You store your data as Parquet files in object storage, integrate your object storage with Pinecone, and then start an asynchronous, long-running operation that imports and indexes your records.
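As a minimal sketch, assuming a recent version of the Python SDK that exposes start_import and describe_import, and that an object storage integration is already configured, the import might be started like this (the index host and bucket URI are placeholders):

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(host="INDEX_HOST")  # placeholder index host

# Start an asynchronous, long-running import of Parquet files
# stored at a placeholder bucket path.
operation = index.start_import(uri="s3://example-bucket/imports/")
print(operation.id)

# Later, check on the long-running operation by its ID.
print(index.describe_import(id=operation.id))
```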

Upsert in batches

Upserting in batches is another efficient way to ingest large numbers of records (up to 1000 per batch). Batch upserting is also a good option if you cannot work around bulk import’s current limitations.
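A minimal sketch of batched upserts with the Python SDK; the index name, namespace, batch size, and example vectors below are illustrative assumptions:

```python
import random
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # placeholder index name

# Example data: 10,000 8-dimensional vectors as (id, values) tuples.
vectors = [(f"vec-{i}", [random.random()] * 8) for i in range(10_000)]

def chunks(seq, size=200):
    """Yield successive fixed-size batches from a sequence."""
    for i in range(0, len(seq), size):
        yield seq[i:i + size]

# Upsert in batches rather than one request per record.
for batch in chunks(vectors, size=200):
    index.upsert(vectors=batch, namespace="example-namespace")
```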

Upsert/search in parallel

Pinecone is thread-safe, so you can send multiple upsert requests and multiple query requests in parallel to help increase throughput.
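For example, the sketch below fans batched upserts out across a thread pool using Python's standard library; the index name, namespace, worker count, and example data are illustrative assumptions:

```python
import random
from concurrent.futures import ThreadPoolExecutor, as_completed
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # placeholder index name

# Example data split into batches of 200 records.
vectors = [(f"vec-{i}", [random.random()] * 8) for i in range(10_000)]
batches = [vectors[i:i + 200] for i in range(0, len(vectors), 200)]

# Send several upsert requests concurrently; the client is thread-safe.
with ThreadPoolExecutor(max_workers=10) as pool:
    futures = [
        pool.submit(index.upsert, vectors=batch, namespace="example-namespace")
        for batch in batches
    ]
    for future in as_completed(futures):
        future.result()  # re-raise if any request failed
```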

Python SDK options

Use gRPC

Use the Python SDK with gRPC extras to run data operations such as upserts and queries over gRPC rather than HTTP for a modest performance improvement.
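A minimal sketch, assuming the SDK is installed with the gRPC extras (for example, pip install "pinecone[grpc]") and using a placeholder index name:

```python
# Requires the gRPC extras, e.g.: pip install "pinecone[grpc]"
from pinecone.grpc import PineconeGRPC

pc = PineconeGRPC(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # placeholder index name

# Data operations now travel over gRPC rather than HTTP.
index.upsert(
    vectors=[("vec-1", [0.1] * 8), ("vec-2", [0.2] * 8)],
    namespace="example-namespace",
)
results = index.query(vector=[0.1] * 8, top_k=3, namespace="example-namespace")
```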

Upsert from a dataframe

To quickly ingest data when using the Python SDK, use the upsert_from_dataframe method. The method includes built-in retry logic and a batch_size parameter, and performs especially well with Parquet file datasets.
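A minimal sketch, assuming an SDK version that exposes upsert_from_dataframe on the index client and a Parquet file with "id" and "values" columns; the file path, index name, and namespace are placeholders:

```python
import pandas as pd
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # placeholder index name

# Load records from a placeholder Parquet file with "id" and "values" columns
# (and, optionally, a "metadata" column).
df = pd.read_parquet("vectors.parquet")

# Upsert the dataframe in batches; retries are handled by the method.
index.upsert_from_dataframe(df, namespace="example-namespace", batch_size=200)
```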

See also

Read more about high-throughput optimizations on our blog.