Before you can chat with the assistant, you need to upload files. The files provide your assistant with context and information to reference when generating responses.

Supported size and types

The maximum file size is 100MB for PDFs and 10MB for all other file types.

Pinecone Assistant supports the following file types:

  • DOCX (.docx)
  • JSON (.json)
  • Markdown (.md)
  • PDF (.pdf)
  • Text (.txt)

Scanned PDFs and text extraction from images (OCR) are not supported.

If a document contains images, the images are not processed, and the assistant generates responses based on the text content only.

File storage

Files are uploaded to Google Cloud Storage (us-central1 region) and to your organization’s Pinecone vector database. The assistant processes the files, so data is not sent outside of blob storage or Pinecone. A signed URL for the file is generated and stored in the assistant’s details, so the assistant can retrieve the file when generating responses. To view the signed URL, you can list the files in the assistant.

File metadata

You can upload a file with metadata, which allows you to store additional information about the file as key-value pairs.

File metadata can be set only when the file is uploaded. You cannot update metadata after the file is uploaded.

File metadata can be used for the following purposes:

  • Filtering chat responses: Specify filters on assistant responses so only files that match the metadata filter are referenced in the response. Chat requests without metadata filters do not consider metadata.
  • Viewing a filtered list of files: Use metadata filters to list files in an assistant that match specific criteria.

Supported metadata size and types

Pinecone Assistant supports 40KB of metadata per file.

Metadata payloads must be key-value pairs in a JSON object. Keys must be strings, and values can be one of the following data types:

  • String
  • Number (integer or floating point, gets converted to a 64 bit floating point)
  • Booleans (true, false)
  • List of strings

Null metadata values are not supported. Instead of setting a key to hold a
null value, we recommend you remove that key from the metadata payload.

For example, the following would be valid metadata payloads:

JSON
{
    "genre": "action",
    "year": 2020,
    "length_hrs": 1.5
}

{
    "color": "blue",
    "fit": "straight",
    "price": 29.99,
    "is_jeans": true
}

Metadata query language

Pinecone’s filtering query language is based on MongoDB’s query and projection operators. Pinecone currently supports a subset of those selectors:

FilterDescriptionSupported types
$eqMatches with metadata values that are equal to a specified value. Example: {"genre": {"$eq": "documentary"}}Number, string, boolean
$neMatches with metadata values that are not equal to a specified value. Example: {"genre": {"$ne": "drama"}}Number, string, boolean
$gtMatches with metadata values that are greater than a specified value. Example: {"year": {"$gt": 2019}}Number
$gteMatches with metadata values that are greater than or equal to a specified value. Example:{"year": {"$gte": 2020}}Number
$ltMatches with metadata values that are less than a specified value. Example: {"year": {"$lt": 2020}}Number
$lteMatches with metadata values that are less than or equal to a specified value. Example: {"year": {"$lte": 2020}}Number
$inMatches with metadata values that are in a specified array. Example: {"genre": {"$in": ["comedy", "documentary"]}}String, number
$ninMatches with metadata values that are not in a specified array. Example: {"genre": {"$nin": ["comedy", "documentary"]}}String, number
$existsMatches with the specified metadata field. Example: {"genre": {"$exists": true}}Boolean
$andJoins query clauses with a logical AND. Example: {"$and": [{"genre": {"$eq": "drama"}}, {"year": {"$gte": 2020}}]}-
$orJoins query clauses with a logical OR. Example: {"$or": [{"genre": {"$eq": "drama"}}, {"year": {"$gte": 2020}}]}-

For example, the following has a "genre" metadata field with a list of strings:

JSON
{ "genre": ["comedy", "documentary"] }

This means "genre" takes on both values, and requests with the following filters will match:

JSON
{"genre":"comedy"}

{"genre": {"$in":["documentary","action"]}}

{"$and": [{"genre": "comedy"}, {"genre":"documentary"}]}

However, requests with the following filter will not match:

JSON
{ "$and": [{ "genre": "comedy" }, { "genre": "drama" }] }

Additionally, requests with the following filters will not match because they are invalid. They will result in a compilation error:

# INVALID QUERY:
{"genre": ["comedy", "documentary"]}
# INVALID QUERY:
{"genre": {"$eq": ["comedy", "documentary"]}}

Limitations