Monitor usage and costs

Monitor organization-level usage and costs

To view usage and costs across your Pinecone organization, you must be an organization owner. Also, this feature is available only to organizations on the Standard or Enterprise plans.

The Usage dashboard in the Pinecone console gives you a detailed report of usage and costs across your organization, broken down by each billable SKU or aggregated by project or service. You can view the report in the console or download it as a CSV file for more detailed analysis.

Go to Settings > Usage in the Pinecone console.
Select the time range to report on. This defaults to the last 30 days.
Select the scope for your report:
- SKU: The usage and cost for each billable SKU, for example, read units per cloud region, storage size per cloud region, or tokens per embedding model.
- Project: The aggregated cost for each project in your organization.
- Service: The aggregated cost for each service your organization uses, for example, database (includes serverless back up and restore), assistants, inference (embedding and reranking), and collections.
Choose the specific SKUs, projects, or services you want to report on. This defaults to all.
To download the report as a CSV file, click Download.
The CSV download provides more granular detail than the console view, including breakdowns by individual index as well as project and index tags.

Dates are shown in UTC to match billing invoices. Cost data is delayed up to three days from the actual usage date.

Monitor index-level usage

You can monitor index-level usage directly in the Pinecone console, or you can pull them into Prometheus. For more details, see Monitoring.

Monitor operation-level usage

Read units

Query, fetch, and list by ID requests return a usage parameter with the read unit consumption of each request that is made.

While Pinecone tracks read unit usage with decimal precision, the Pinecone API and SDKs round these values up to the nearest whole number in query, fetch, and list responses. For example, if a query uses 0.45 read units, the API and SDKs will report it as 1 read unit.For precise read unit reporting, see index-level metrics or the organization-wide Usage dashboard.

Indexes built on Dedicated Read Nodes are not subject to read unit limits for query, fetch, and list operations. For sizing and capacity planning guidance, see the Dedicated Read Nodes guide.

Example query request:

from pinecone.grpc import PineconeGRPC as Pinecone

pc = Pinecone(api_key="YOUR_API_KEY")
index = pc.Index("example-index")

response = index.query(
    vector=[0.22,0.43,0.16,1,...], 
    namespace='example-namespace', 
    top_k=3,
    include_values=False,
    include_metadata=False
)

print(response)

The response looks like this:

{'matches': [{'id': 'record_193027', 'score': 0.00405937387, 'values': []},
             {'id': 'record_137452', 'score': 0.00405937387, 'values': []},
             {'id': 'record_132264', 'score': 0.00405937387, 'values': []}],
 'namespace': 'example-namespace',
 'usage': {'read_units': 1}}

For a more in-depth demonstration of how to use read units to inspect read costs, see this notebook.

Embedding tokens

Requests to one of Pinecone’s hosted embedding models, either directly via the embed operation or automatically when upserting or querying an index with integrated embedding, return a usage parameter with the total tokens generated. For example, the following request to use the multilingual-e5-large model to generate embeddings for sentences related to the word “apple” might return this request and summary of embedding tokens generated:

# Import the Pinecone library
from pinecone.grpc import PineconeGRPC as Pinecone
from pinecone import ServerlessSpec
import time

# Initialize a Pinecone client with your API key
pc = Pinecone(api_key="YOUR_API_KEY")

# Define a sample dataset where each item has a unique ID and piece of text
data = [
    {"id": "vec1", "text": "Apple is a popular fruit known for its sweetness and crisp texture."},
    {"id": "vec2", "text": "The tech company Apple is known for its innovative products like the iPhone."},
    {"id": "vec3", "text": "Many people enjoy eating apples as a healthy snack."},
    {"id": "vec4", "text": "Apple Inc. has revolutionized the tech industry with its sleek designs and user-friendly interfaces."},
    {"id": "vec5", "text": "An apple a day keeps the doctor away, as the saying goes."},
    {"id": "vec6", "text": "Apple Computer Company was founded on April 1, 1976, by Steve Jobs, Steve Wozniak, and Ronald Wayne as a partnership."}
]

# Convert the text into numerical vectors that Pinecone can index
embeddings = pc.inference.embed(
    model="llama-text-embed-v2",
    inputs=[d['text'] for d in data],
    parameters={"input_type": "passage", "truncate": "END"}
)

print(embeddings)

The returned object looks like this:

EmbeddingsList(
    model='llama-text-embed-v2',
    data=[
        {'values': [0.04925537109375, -0.01313018798828125, -0.0112762451171875, ...]},
        ...
    ],
    usage={'total_tokens': 130}
)

Get started

Index data

Search

Optimize

Manage data

Manage cost

Move to production

Admin

Operations

Using pods

Monitor organization-level usage and costs

Monitor index-level usage

Monitor operation-level usage

Read units

Embedding tokens

See also

Get started

Index data

Search

Optimize

Manage data

Manage cost

Move to production

Admin

Operations

Using pods

​Monitor organization-level usage and costs

​Monitor index-level usage

​Monitor operation-level usage

​Read units

​Embedding tokens

​See also

Monitor organization-level usage and costs

Monitor index-level usage

Monitor operation-level usage

Read units

Embedding tokens

See also