This page shows how to use integrated inference to upsert and search data without separate steps for embedding the data and reranking the results.

This feature is in public preview.

1. Install dependencies

Install the latest Pinecone Python SDK and integrated inference plugin as follows:

pip install --upgrade pinecone pinecone-plugin-records

The pinecone-plugin-records plugin is not currently compatible with the pinecone[grpc] version of the Python SDK.

2. Create or configure an index

Integrated inference requires a serverless index configured for a specific embedding model. You can either create a new index for a model or configure an existing index for a model.

To create a serverless index with integrated embedding, use the create_for_model operation as follows:

  • Provide a name for the index.
  • Set embed.model to one of Pinecone’s hosted embedding models.
  • Set spec.cloud and spec.region to the cloud and region where the index should be deployed.
  • Set embed.field_map to the name of the field in your source document that contains the data for embedding.

Other parameters are optional. See the API reference for details.
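
The required settings listed above can be collected into a single request body. The following is a minimal Python sketch, not the SDK's own interface. The helpers set_path and build_create_for_model_body are hypothetical names, the nesting simply mirrors the dotted names in the list (embed.model, spec.cloud, and so on), and the index name, model, cloud, region, and chunk_text field are placeholder values. The "text" key inside field_map is also an assumption. Check the API reference for the exact request layout and the available hosted models.

```python
def set_path(body: dict, dotted: str, value) -> None:
    """Store value in a nested dict, creating one level per dotted segment."""
    *parents, leaf = dotted.split(".")
    node = body
    for key in parents:
        node = node.setdefault(key, {})
    node[leaf] = value


def build_create_for_model_body(name, model, cloud, region, source_field):
    """Assemble the required settings from the list above (hypothetical helper)."""
    body = {}
    set_path(body, "name", name)
    set_path(body, "embed.model", model)
    set_path(body, "spec.cloud", cloud)
    set_path(body, "spec.region", region)
    # Assumption: field_map maps the model's "text" input to the source field.
    set_path(body, "embed.field_map", {"text": source_field})
    return body


# Placeholder values for illustration only.
body = build_create_for_model_body(
    name="example-index",
    model="multilingual-e5-large",
    cloud="aws",
    region="us-east-1",
    source_field="chunk_text",
)
```

Keeping the settings in one dictionary makes it easy to compare them against the API reference before sending the request.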

3. Upsert data

Once you have an index configured for a specific embedding model, use the /records/upsert operation to convert your source data to embeddings and upsert them into a namespace in the index.

Note the following requirements for each document in the request body:

  • Each document must contain a unique _id, which will serve as the unique record identifier in the index namespace.
  • Each document must contain a field with the data for embedding. This field must match the field_map specified when creating the index.
  • Any additional fields in the document will be stored in the index and can be returned in search results or used to filter search results.
  • When using the API directly, documents are specified using the NDJSON format, also known as line-delimited JSON or JSONL, with one document per line. The Python SDK transforms the list of dictionary entries into the correct NDJSON format for you.
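
To make the requirements above concrete, here is a minimal Python sketch that checks a list of documents and serializes them as NDJSON, one JSON object per line. The function to_ndjson is a hypothetical helper, not part of the Pinecone SDK, and the sample documents and the chunk_text field name are illustrative assumptions.

```python
import json


def to_ndjson(documents, embed_field):
    """Validate documents against the requirements above and emit NDJSON."""
    seen_ids = set()
    lines = []
    for doc in documents:
        if "_id" not in doc:
            raise ValueError("each document needs an _id")
        if doc["_id"] in seen_ids:
            raise ValueError(f"duplicate _id: {doc['_id']}")
        if embed_field not in doc:
            raise ValueError(f"document {doc['_id']} lacks field {embed_field!r}")
        seen_ids.add(doc["_id"])
        # json.dumps escapes newlines, so each document stays on one line.
        lines.append(json.dumps(doc, ensure_ascii=False))
    return "".join(line + "\n" for line in lines)


# Hypothetical sample documents; chunk_text is the assumed field_map field.
docs = [
    {"_id": "rec1", "chunk_text": "Apples are a good source of fiber.", "category": "nutrition"},
    {"_id": "rec2", "chunk_text": "Apple released a new phone.", "category": "product"},
]
ndjson = to_ndjson(docs, embed_field="chunk_text")
```

The extra category field passes through unchanged, which is what allows it to be returned or used as a filter later.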

4. Search the index

Use the /records/search operation to convert a query to a vector embedding and then search your namespace for the most semantically similar records. Each returned record includes its similarity score.

Note the following:

  • The inputs field must be text.
  • The top_k parameter must specify the number of similar records to return.
  • Optionally, you can specify:
    • The fields to return. If not specified, the response will include all fields.
    • A filter to narrow down the search results.
    • rerank parameters to rerank the initial search results based on relevance to the query.
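
The rules above can be sketched as a small Python function that builds a search request body. This is an illustration under assumptions: build_search_body is a hypothetical helper, and the placement of inputs, top_k, and filter under a "query" key, with fields and rerank beside it, is assumed rather than taken from this page. Confirm the layout against the API reference.

```python
def build_search_body(text, top_k, fields=None, filter=None, rerank=None):
    """Build a search body from the required and optional parameters above."""
    if not isinstance(text, str) or not text:
        raise ValueError("inputs must be non-empty text")
    if isinstance(top_k, bool) or not isinstance(top_k, int) or top_k < 1:
        raise ValueError("top_k must be a positive integer")

    # Assumed layout: the query text is wrapped as {"text": ...}.
    query = {"inputs": {"text": text}, "top_k": top_k}
    if filter is not None:
        query["filter"] = filter

    body = {"query": query}
    if fields is not None:
        body["fields"] = list(fields)  # omit to get all fields back
    if rerank is not None:
        body["rerank"] = rerank
    return body


# The query used in the first search on this page; field names are assumed.
body = build_search_body("Disease prevention", top_k=4, fields=["category", "chunk_text"])
```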

In the previous step, you upserted 8 documents, some about Apple, the technology company, and some about apple, the fruit.

First, search for the 4 documents most semantically related to the query, “Disease prevention”:

Notice that the response includes only documents about the fruit, not the tech company.

Search with reranking

To rerank initial search results based on relevance to the query, add the rerank parameter, including the reranking model you want to use, the number of reranked results to return, and the fields to use for reranking, if different from the field used for the main query.

For example, repeat the search for the 4 documents most semantically related to the query, “Disease prevention”, but this time rerank the results and return only the 2 most relevant documents:
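
As a rough sketch of that request in Python, the snippet below attaches rerank parameters to a search body. The helper with_rerank is hypothetical, and the key names top_n and rank_fields, the model name bge-reranker-v2-m3, and the chunk_text field are assumptions not stated on this page. The check that top_n does not exceed top_k is a sanity check in this sketch, not a documented rule.

```python
def with_rerank(search_body, model, top_n, rank_fields=None):
    """Return a copy of search_body with rerank parameters attached."""
    # Sanity check for this sketch: reranking cannot return more
    # results than the initial search retrieved.
    if top_n > search_body["query"]["top_k"]:
        raise ValueError("top_n cannot exceed top_k")
    rerank = {"model": model, "top_n": top_n}
    if rank_fields is not None:
        rerank["rank_fields"] = list(rank_fields)
    return {**search_body, "rerank": rerank}


# Retrieve 4 candidates, then keep the 2 most relevant after reranking.
base = {"query": {"inputs": {"text": "Disease prevention"}, "top_k": 4}}
body = with_rerank(base, model="bge-reranker-v2-m3", top_n=2, rank_fields=["chunk_text"])
```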

Notice that the 2 returned documents are the most relevant for the query, the first relating to reducing chronic diseases, the second relating to preventing diabetes.

Search with filtering

Your upserted documents also contain a category field. Now use that field as a filter to search for the 2 documents related to Apple, the tech company, that are in the “product” category:
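
A filtered request of this kind might look like the following Python sketch. It assumes Pinecone's metadata filter syntax with the $eq operator and the same assumed body layout as a plain search. The query text, the sample documents, and the matches helper are hypothetical; matches only imitates locally what an equality filter keeps and is not how the service evaluates filters.

```python
def matches(doc, flt):
    """Local stand-in for an equality filter: each field must equal its $eq value."""
    return all(doc.get(field) == cond["$eq"] for field, cond in flt.items())


category_filter = {"category": {"$eq": "product"}}

# Placeholder query text; this page does not give the exact query.
body = {
    "query": {
        "inputs": {"text": "Apple technology"},
        "top_k": 2,
        "filter": category_filter,
    }
}

# Hypothetical documents, used only to show which ones the filter keeps.
sample = [
    {"_id": "a", "category": "product"},
    {"_id": "b", "category": "nutrition"},
]
kept = [doc["_id"] for doc in sample if matches(doc, category_filter)]
```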

Notice that the response includes only documents about Apple, the tech company, that are in the “product” category.