Before you can chat with the assistant, you need to upload files. The files provide your assistant with context and information to reference when generating responses. Files are not shared across assistants.

Supported size and types

The maximum file size is 100MB for PDFs and 10MB for all other file types. For faster processing, keep each uploaded file to 1MB or less.

Pinecone Assistant supports the following file types:

  • DOCX (.docx)
  • JSON (.json)
  • Markdown (.md)
  • PDF (.pdf)
  • Text (.txt)

Scanned PDFs and text extraction from images (OCR) are not supported. If a document contains images, the images are not processed, and the assistant generates responses based on the text content only.
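The size and type limits above can be checked locally before an upload is attempted. The following is a minimal sketch, not part of any Pinecone SDK: the function names check_upload and is_above_recommended_size are hypothetical, and treating 1MB as 1,000,000 bytes is an assumption, since the limits above do not say whether MB is decimal or binary.

```python
from pathlib import PurePath

# Limits taken from the text above. Treating 1MB as 1,000,000 bytes is an
# assumption; the documented limits do not specify decimal vs. binary MB.
MB = 1_000_000
SUPPORTED_EXTENSIONS = {".docx", ".json", ".md", ".pdf", ".txt"}
MAX_PDF_BYTES = 100 * MB
MAX_OTHER_BYTES = 10 * MB
RECOMMENDED_BYTES = 1 * MB


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file passes both checks."""
    problems = []
    ext = PurePath(filename).suffix.lower()
    if ext not in SUPPORTED_EXTENSIONS:
        # No size limit applies to a type that cannot be uploaded at all.
        problems.append(f"unsupported file type: {ext or '(none)'}")
        return problems
    limit = MAX_PDF_BYTES if ext == ".pdf" else MAX_OTHER_BYTES
    if size_bytes > limit:
        problems.append(f"file exceeds the {limit // MB}MB limit for {ext} files")
    return problems


def is_above_recommended_size(size_bytes: int) -> bool:
    """True if the file is larger than the 1MB size recommended for faster processing."""
    return size_bytes > RECOMMENDED_BYTES
```

In practice, size_bytes could come from os.path.getsize(path). Note that this check looks only at the extension and size; it cannot detect a scanned PDF, which would pass here but yield no usable text.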

File storage

Files are uploaded to Google Cloud Storage (us-central1 region) and to your organization’s Pinecone vector database. The assistant processes the files, so data is not sent outside of blob storage or Pinecone. A signed URL for the file is generated and stored in the assistant’s details, so the assistant can retrieve the file when generating responses. To view the signed URL, you can list the files in the assistant.

File metadata

You can upload a file with metadata, which allows you to store additional information about the file as key-value pairs.

File metadata can be set only when the file is uploaded. You cannot update metadata after the file is uploaded.

File metadata can be used for the following purposes:

  • Filtering chat responses: Specify filters on assistant responses so only files that match the metadata filter are referenced in the response. Chat requests without metadata filters do not consider metadata.
  • Viewing a filtered list of files: Use metadata filters to list files in an assistant that match specific criteria.

Supported metadata size and format

Pinecone Assistant supports up to 1KB of metadata per file.

  • Metadata fields must be key-value pairs in a flat JSON object. Nested JSON objects are not supported.
  • Keys must be strings and must not start with a $.
  • Values must be one of the following data types:
    • String
    • Integer (converted to a 64-bit floating point by Pinecone)
    • Floating point
    • Boolean (true, false)
    • List of strings
  • Null metadata values aren’t supported. Instead of setting a key to null, remove the key from the metadata payload.

Examples

{
  "document_id": "document1",
  "document_title": "Introduction to Vector Databases",
  "chunk_number": 1,
  "chunk_text": "First chunk of the document content...",
  "is_public": true,
  "tags": ["beginner", "database", "vector-db"],
  "scores": ["85", "92"]
}
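The format rules above can be checked before a file is uploaded. The following is a minimal sketch under stated assumptions: validate_metadata is a hypothetical helper, not a Pinecone function, and measuring the 1KB limit as the UTF-8 length of the serialized JSON is a guess, since the rules above do not say how the size is computed.

```python
import json

# Assumption: the 1KB limit is measured on the UTF-8 encoded JSON payload.
MAX_METADATA_BYTES = 1024


def validate_metadata(metadata: dict) -> list[str]:
    """Return a list of rule violations; an empty list means the metadata looks valid."""
    errors = []
    for key, value in metadata.items():
        if not isinstance(key, str):
            errors.append(f"key {key!r} is not a string")
            continue
        if key.startswith("$"):
            errors.append(f"key {key!r} must not start with $")
        if value is None:
            errors.append(f"key {key!r} is null; remove the key instead")
        elif isinstance(value, (str, bool, int, float)):
            pass  # string, boolean, integer, and floating point are allowed
        elif isinstance(value, list):
            if not all(isinstance(item, str) for item in value):
                errors.append(f"key {key!r} must be a list of strings only")
        elif isinstance(value, dict):
            errors.append(f"key {key!r} is a nested object; metadata must be flat")
        else:
            errors.append(f"key {key!r} has unsupported type {type(value).__name__}")
    # The size is checked only once the structure is valid, so that
    # serialization cannot fail on an unsupported key or value type.
    if not errors:
        size = len(json.dumps(metadata, ensure_ascii=False).encode("utf-8"))
        if size > MAX_METADATA_BYTES:
            errors.append(f"metadata is {size} bytes; the limit is {MAX_METADATA_BYTES}")
    return errors
```

Applied to the example object above, this helper returns an empty list: every key is a string, every value is a supported type, and "scores" is valid because its items are strings rather than numbers.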

Metadata query language

Pinecone’s filtering query language is based on MongoDB’s query and projection operators. Pinecone currently supports a subset of those selectors:

| Filter | Description | Supported types |
| --- | --- | --- |
| $eq | Matches with metadata values that are equal to a specified value. Example: {"genre": {"$eq": "documentary"}} | Number, string, boolean |
| $ne | Matches with metadata values that are not equal to a specified value. Example: {"genre": {"$ne": "drama"}} | Number, string, boolean |
| $gt | Matches with metadata values that are greater than a specified value. Example: {"year": {"$gt": 2019}} | Number |
| $gte | Matches with metadata values that are greater than or equal to a specified value. Example: {"year": {"$gte": 2020}} | Number |
| $lt | Matches with metadata values that are less than a specified value. Example: {"year": {"$lt": 2020}} | Number |
| $lte | Matches with metadata values that are less than or equal to a specified value. Example: {"year": {"$lte": 2020}} | Number |
| $in | Matches with metadata values that are in a specified array. Example: {"genre": {"$in": ["comedy", "documentary"]}} | String, number |
| $nin | Matches with metadata values that are not in a specified array. Example: {"genre": {"$nin": ["comedy", "documentary"]}} | String, number |
| $exists | Matches with the specified metadata field. Example: {"genre": {"$exists": true}} | Number, string, boolean |
| $and | Joins query clauses with a logical AND. Example: {"$and": [{"genre": {"$eq": "drama"}}, {"year": {"$gte": 2020}}]} | - |
| $or | Joins query clauses with a logical OR. Example: {"$or": [{"genre": {"$eq": "drama"}}, {"year": {"$gte": 2020}}]} | - |

Only $and and $or are allowed at the top level of the query expression.

For example, the following has a "genre" metadata field with a list of strings:

JSON
{ "genre": ["comedy", "documentary"] }

This means "genre" takes on both values, and requests with the following filters will match:

JSON
{"genre":"comedy"}

{"genre": {"$in":["documentary","action"]}}

{"$and": [{"genre": "comedy"}, {"genre":"documentary"}]}

However, requests with the following filter will not match:

JSON
{ "$and": [{ "genre": "comedy" }, { "genre": "drama" }] }

Additionally, requests with the following filters will not match because they are invalid. They will result in a compilation error:

JSON
# INVALID QUERY:
{"genre": ["comedy", "documentary"]}

JSON
# INVALID QUERY:
{"genre": {"$eq": ["comedy", "documentary"]}}
{"genre": {"$eq": ["comedy", "documentary"]}}
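The list-field behavior described above can be modeled locally to make the matching rules concrete. The following is an illustrative sketch only, not Pinecone's implementation: the matches function is hypothetical, it covers just $eq, $in, $and, and $or, and it raises ValueError where the text above says the query fails to compile.

```python
def _field_matches(values: list, cond) -> bool:
    """Check one field's values against a condition (bare value or operator dict)."""
    if isinstance(cond, list):
        # Mirrors the first invalid query above: a bare list is not a filter value.
        raise ValueError("a list is not a valid filter value; use $in")
    if not isinstance(cond, dict):
        cond = {"$eq": cond}  # {"genre": "comedy"} is shorthand for $eq
    for op, operand in cond.items():
        if op == "$eq":
            if isinstance(operand, list):
                # Mirrors the second invalid query above.
                raise ValueError("$eq does not accept a list operand")
            ok = operand in values
        elif op == "$in":
            if not isinstance(operand, list):
                raise ValueError("$in requires a list operand")
            ok = any(value in operand for value in values)
        else:
            raise ValueError(f"operator {op} is not covered by this sketch")
        if not ok:
            return False
    return True


def matches(metadata: dict, flt: dict) -> bool:
    """Return True if the metadata satisfies the filter."""
    for key, cond in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in cond):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in cond):
                return False
        else:
            if key not in metadata:
                values = []
            else:
                raw = metadata[key]
                # A list-valued field "takes on" every one of its values.
                values = raw if isinstance(raw, list) else [raw]
            if not _field_matches(values, cond):
                return False
    return True
```

With metadata = {"genre": ["comedy", "documentary"]}, the three matching filters above return True, the "comedy" and "drama" filter returns False because "drama" is not among the field's values, and both invalid queries raise ValueError.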

Limitations