Increase throughput
This page describes helpful techniques for increasing throughput for upserts, searches, and other data operations.
Import from object storage
Importing from object storage is the most efficient and cost-effective method to load large numbers of records into an index. You store your data as Parquet files in object storage, integrate your object storage with Pinecone, and then start an asynchronous, long-running operation that imports and indexes your records.
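For example, here is a minimal sketch of kicking off an import with the Python SDK; the index host, bucket URI, and integration ID are placeholders:

```python
from pinecone import Pinecone, ImportErrorMode

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index(host="INDEX_HOST")  # host of the target index

# Start an asynchronous, long-running import of Parquet files
# from object storage. The URI and integration ID are placeholders.
operation = index.start_import(
    uri="s3://example-bucket/records/",            # prefix containing Parquet files
    integration_id="YOUR_STORAGE_INTEGRATION_ID",  # from your object storage integration
    error_mode=ImportErrorMode.CONTINUE,           # skip failing records rather than aborting
)

# The call returns immediately; poll the operation to track progress.
print(index.describe_import(operation.id))
```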
Upsert in batches
Upserting in batches is another efficient way to ingest large numbers of records (up to 1,000 records per batch). Batch upserting is also a good option if you cannot work around bulk import’s current limitations.
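As an illustration, here is a batching sketch using the Python SDK, with a placeholder index name and randomly generated vectors:

```python
import itertools
import random

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # placeholder index name


def chunks(iterable, batch_size=200):
    """Yield successive fixed-size batches from an iterable of vectors."""
    it = iter(iterable)
    batch = tuple(itertools.islice(it, batch_size))
    while batch:
        yield batch
        batch = tuple(itertools.islice(it, batch_size))


# Illustrative data: 10,000 random 128-dimensional vectors as (id, values) tuples.
vectors = ((f"id-{i}", [random.random() for _ in range(128)]) for i in range(10_000))

for batch in chunks(vectors, batch_size=200):
    index.upsert(vectors=list(batch))
```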
Upsert/search in parallel
Pinecone is thread-safe, so you can send multiple upsert requests and multiple query requests in parallel to help increase throughput.
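With the Python SDK, one way to parallelize is the client's async_req option, which dispatches requests over a thread pool; here is a sketch with placeholder data and index name:

```python
import random

from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
# pool_threads controls how many requests can be in flight at once.
index = pc.Index("example-index", pool_threads=30)

# Illustrative data: 10,000 random 128-dimensional vectors, split into batches.
batch_size = 200
data = [(f"id-{i}", [random.random() for _ in range(128)]) for i in range(10_000)]
batches = [data[i : i + batch_size] for i in range(0, len(data), batch_size)]

# Fire off all upserts without blocking, then wait for every response.
async_results = [index.upsert(vectors=batch, async_req=True) for batch in batches]
[result.get() for result in async_results]
```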
Python SDK options
Use gRPC
Use the Python SDK with gRPC extras to run data operations such as upserts and queries over gRPC rather than HTTP for a modest performance improvement.
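After installing the SDK with the gRPC extras (for example, pip install "pinecone[grpc]"), the gRPC client can be used as a drop-in replacement for the standard client; a minimal sketch:

```python
from pinecone.grpc import PineconeGRPC

# The gRPC client mirrors the standard client's interface,
# so existing upsert and query code works unchanged.
pc = PineconeGRPC(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # placeholder index name

index.upsert(vectors=[("id-1", [0.1] * 128)])
results = index.query(vector=[0.1] * 128, top_k=10)
```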
Upsert from a dataframe
To quickly ingest data when using the Python SDK, use the upsert_from_dataframe method. The method includes retry logic and a batch_size parameter, and is especially performant with Parquet file datasets.
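For instance, here is a sketch assuming a Parquet file whose columns match the upsert schema; the file name and index name are placeholders:

```python
import pandas as pd
from pinecone.grpc import PineconeGRPC

pc = PineconeGRPC(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # placeholder index name

# Placeholder file; the DataFrame is expected to have columns
# matching the upsert schema (e.g. id, values, metadata).
df = pd.read_parquet("records.parquet")

# Batching and retries are handled internally.
index.upsert_from_dataframe(df, batch_size=500)
```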
See also
Read more about high-throughput optimizations on our blog.