Import data
This page shows you how to use the start_import, list_imports, describe_import, and cancel_import endpoints to import data into an index and interact with the import.
To learn how to format data for importing and other concepts related to imports, see Understanding imports. To run through this guide in your browser, use the Bulk import Colab notebook.
This feature is in public preview and available only on Standard and Enterprise plans.
Before you import
Before you can import data, ensure you have the following:
- An ID for your Amazon S3 integration (not needed for importing from a public bucket). The ID is found on the Storage integrations page of the Pinecone console.
- Data formatted in a Parquet file and uploaded to the Amazon S3 bucket.
- A serverless index on AWS to import records into. The index cannot have existing namespaces with the same name as the namespaces defined in your file directory structure.
Import records into an index
Import is only available for serverless indexes on AWS.
Use the start_import operation to start an asynchronous import of vectors from object storage into an index.
To import from a private bucket, specify the Integration ID (integration) of the Amazon S3 integration you created. The ID is found on the Storage integrations page of the Pinecone console. An ID is not needed to import from a public bucket.
The operation returns an id that you can use to check the status of the import.
- Each import request can import up to 1 TB of data, or 100,000,000 records into a maximum of 100 namespaces, whichever limit is met first.
- You cannot import data into existing namespaces. For more information, see Directory structure.
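As a reference, here is a minimal sketch of starting an import with the Python SDK. The bucket path, integration ID, and index name are placeholders, and the exact parameter names (uri, integration_id, error_mode) are assumptions to verify against the SDK reference for your client version.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")  # a serverless index on AWS

# Start an asynchronous import from a Parquet directory in Amazon S3.
# integration_id is only needed for a private bucket; omit it for a public bucket.
response = index.start_import(
    uri="s3://BUCKET_NAME/PATH/TO/DIR",
    integration_id="a12b3d4c-47d2-492c-a97a-dd98c8dbefde",  # hypothetical integration ID
    error_mode="CONTINUE",  # or "ABORT" to stop the import on the first error
)

# The response includes the import's ID, which you can use to check its status.
print(response)
```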
{
"operation_id": "101"
}
Once all the data is loaded, the index builder indexes the records, which usually takes at least 10 minutes. During this indexing process, the expected job status is InProgress, even though the import reports 100.0 percent complete. Once all the imported records are indexed and fully available for querying, the import operation is set to Completed.
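If you need to wait until the imported records are queryable, one option is to poll the import with the describe_import operation (covered below) until it reaches a terminal status. A rough sketch, assuming the returned object exposes status and percent_complete fields; apart from Completed, the exact terminal status strings are assumptions.

```python
import time
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")

import_id = "101"  # the ID returned by start_import

while True:
    import_job = index.describe_import(import_id)
    print(f"{import_job.status}: {import_job.percent_complete}% complete")
    # Stop polling once the import reaches a terminal state.
    if import_job.status in ("Completed", "Failed", "Cancelled"):
        break
    time.sleep(60)  # indexing usually takes at least 10 minutes
```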
You can start a new import using the Pinecone console. Find the index you want to import into, and click the ellipsis (...) menu > Import data.
List recent and ongoing imports
Use the list_imports operation to list all of the recent and ongoing imports. Whenever there are additional imports to return, the response includes a pagination_token for fetching the next page of imports.
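For example, a minimal sketch with the Python SDK. It assumes list_imports can be iterated over and that each item exposes id, status, and percent_complete fields; whether pagination is handled for you depends on the client, so check the SDK reference.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")

# Iterate over recent and ongoing imports for this index.
for import_job in index.list_imports():
    print(import_job.id, import_job.status, import_job.percent_complete)
```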
{
"data": [
{
"id": "1",
"uri": "s3://BUCKET_NAME/PATH/TO/DIR",
"status": "Pending",
"started_at": "2024-08-19T20:49:00.754Z",
"finished_at": "2024-08-19T20:49:00.754Z",
"percent_complete": 42.2,
"records_imported": 1000000
}
],
"pagination": {
"next": "Tm90aGluZyB0byBzZWUgaGVyZQo="
}
}
You can view the list of imports for an index in the Pinecone console. Select the index and navigate to the Imports tab.
Manual pagination
When using the REST API to list recent and ongoing imports, you must manually fetch each page of results. To view the next page of results, include the paginationToken provided in the response of the previous list_imports / GET request.
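The sketch below shows one way to do this with Python's requests library. It assumes the list imports endpoint is GET /bulk/imports on your index host and that the token is passed as a paginationToken query parameter; confirm both, along with the API version header value, against the API reference.

```python
import requests

INDEX_HOST = "https://example-index-abc123.svc.example.pinecone.io"  # placeholder index host
headers = {
    "Api-Key": "YOUR_API_KEY",
    "X-Pinecone-API-Version": "2024-10",  # assumed version that supports bulk import
}

pagination_token = None
while True:
    params = {"limit": 100}
    if pagination_token:
        params["paginationToken"] = pagination_token
    page = requests.get(f"{INDEX_HOST}/bulk/imports", headers=headers, params=params).json()

    for import_job in page.get("data", []):
        print(import_job["id"], import_job["status"])

    # Stop when the response no longer includes a pagination token.
    pagination_token = (page.get("pagination") or {}).get("next")
    if not pagination_token:
        break
```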
Describe an import
Use the describe_import operation to get details about a specific import.
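For example, a minimal sketch with the Python SDK; the import ID is a placeholder for the ID returned by start_import.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")

# Look up a single import by its ID.
import_details = index.describe_import("101")
print(import_details)
```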
{
"id": "101",
"uri": "s3://BUCKET_NAME/PATH/TO/DIR",
"status": "Pending",
"created_at": "2024-08-19T20:49:00.754Z",
"finished_at": "2024-08-19T20:49:00.754Z",
"percent_complete": 42.2,
"records_imported": 1000000
}
You can view the details of your import using the Pinecone console.
Cancel an ongoing import
The cancel_import operation cancels an import if it is not yet finished. It has no effect if the import is already complete.
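For example, a minimal sketch with the Python SDK; the import ID is a placeholder for the ID returned by start_import.

```python
from pinecone import Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")

# Cancel the import if it is still in progress; this has no effect once it has completed.
index.cancel_import("101")
```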
{}
You can cancel your import using the Pinecone console. To cancel an ongoing import, select the index you are importing into and navigate to the Imports tab. Then, click the ellipsis (...) menu > Cancel.