This page shows you how to monitor the overall usage and costs for your Pinecone organization as well as usage and performance metrics for individual indexes.

Monitor organization-level usage

You must be the organization owner to view usage across your Pinecone organization. Also, this feature is available only to organizations on the Standard or Enterprise plans.

To view and download a report of your usage and costs for your Pinecone organization, go to Settings > Usage in the Pinecone console.

All dates are given in UTC to match billing invoices.

Monitor index-level usage

You can monitor index-level usage metrics directly in the Pinecone console, or you can pull them into Prometheus. For more details, see Monitoring.

Monitor operation-level usage

Read units

Read operations like query and fetch return a usage parameter reporting the read units consumed by each request. For example, a query to an example index might return this result, including a summary of read unit usage:

from pinecone.grpc import PineconeGRPC as Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("pinecone-index")

index.query(
  vector=[0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
  top_k=3,
  include_values=True
)
# Returns:
# {
#     "matches": [
#         {
#             "id": "C",
#             "score": -1.76717265e-07,
#             "values": [0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3, 0.3],
#         },
#         {
#             "id": "B",
#             "score": 0.080000028,
#             "values": [0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2, 0.2],
#         },
#         {
#             "id": "D",
#             "score": 0.0800001323,
#             "values": [0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4, 0.4],
#         },
#     ],
#     "namespace": "",
#     "usage": {"read_units": 5}
# }

For a more in-depth demonstration of how to use read units to inspect read costs, see this notebook.
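The usage field can also be read programmatically, for example to track cumulative read unit consumption across many queries. The helper below is a hypothetical sketch, not part of the Pinecone SDK; it assumes each response is treated as a dict with the shape shown above ({"matches": [...], "usage": {"read_units": N}}):

```python
# Hypothetical helper: sum the read_units reported across several
# query responses, each shaped like the example response above.
def total_read_units(responses):
    """Return the total read units consumed by a sequence of responses."""
    return sum(r.get("usage", {}).get("read_units", 0) for r in responses)

# Sample responses using the dict shape shown above (matches omitted).
responses = [
    {"matches": [], "usage": {"read_units": 5}},
    {"matches": [], "usage": {"read_units": 7}},
]
print(total_read_units(responses))  # → 12
```

A running total like this can be useful for attributing read costs to individual workloads before the aggregated numbers appear in the console.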

Embedding tokens

Requests to one of Pinecone’s hosted embedding models, either directly via the embed operation or automatically when upserting or querying an index with integrated embedding, return a usage parameter with the total tokens generated.

For example, the following request uses the multilingual-e5-large model to generate embeddings for sentences related to the word “apple”. The response includes a summary of the embedding tokens generated:

# Import the Pinecone library
from pinecone.grpc import PineconeGRPC as Pinecone

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key="YOUR_API_KEY")

# Define a sample dataset where each item has a unique ID and piece of text
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
    {"id": "vec6", "text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}
]

# Convert the text into numerical vectors that Pinecone can index
embeddings = pc.inference.embed(
    model="multilingual-e5-large",
    inputs=[d['text'] for d in data],
    parameters={"input_type": "passage", "truncate": "END"}
)

print(embeddings)

The returned object looks like this:

EmbeddingsList(
    model='multilingual-e5-large',
    data=[
        {'values': [0.04925537109375, -0.01313018798828125, -0.0112762451171875, ...]},
        ...
    ],
    usage={'total_tokens': 130}
)
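If you send several embed requests, you can aggregate the reported token usage yourself, for example per model. The sketch below is hypothetical and not part of the Pinecone SDK; it assumes each response carries the model name and a usage dict shaped like the EmbeddingsList above:

```python
from collections import defaultdict

# Hypothetical helper: aggregate total_tokens per model across several
# embed responses, assuming each is shaped like the EmbeddingsList above.
def tokens_by_model(responses):
    """Return a dict mapping model name to total embedding tokens generated."""
    totals = defaultdict(int)
    for r in responses:
        totals[r["model"]] += r["usage"]["total_tokens"]
    return dict(totals)

# Sample responses using the shape shown above (embedding data omitted).
responses = [
    {"model": "multilingual-e5-large", "usage": {"total_tokens": 130}},
    {"model": "multilingual-e5-large", "usage": {"total_tokens": 245}},
]
print(tokens_by_model(responses))  # → {'multilingual-e5-large': 375}
```

Totals like these can be reconciled against the usage report in the console, which is also broken out by billing line item.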

See also