How it all fits together
The SDK’s observability support is designed to be flexible. You don’t need to adopt the entire observability stack at once; start simple and add layers as your needs grow. Here are the components involved and how they relate to each other:
- Pinecone Java SDK: Exposes a `ResponseMetadataListener` callback, a plain Java interface with no external dependencies. At its simplest, you can log the metadata to the console. No additional tools required.
- OpenTelemetry (OTel): An open standard and SDK for producing structured telemetry data (metrics, traces, logs). If you want standardized metrics that follow semantic conventions, you add the OTel SDK and wire it to the listener. This is optional.
- OTel Collector: A vendor-neutral service that receives telemetry from your app and forwards it to a storage backend. Optional — many setups export directly from the app to a backend.
- Prometheus: A time-series database that stores metrics, making them queryable over time. One popular storage option.
- Grafana: A visualization and dashboarding tool that queries Prometheus (or other backends) and displays charts and alerts. One popular visualization option.
This is just one example pipeline. You can substitute Datadog, New Relic, or any OTel-compatible backend. You can also skip OTel entirely and use Micrometer, custom logging, or any approach that suits your stack.
Response metadata listener
The Java SDK captures response metadata through a `ResponseMetadataListener`, a functional interface you provide when building the Pinecone client. The listener is called after each data plane operation completes (whether it succeeds or fails) and receives a `ResponseMetadata` object containing timing, status, and context information.
The SDK itself has no OpenTelemetry dependency. You bring your own observability library and decide what to do with the metadata.
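As a rough illustration of the callback shape, the sketch below formats one log line per operation. The `ResponseMetadata` and `ResponseMetadataListener` types declared here are local stand-ins so the example compiles on its own. The listener method name `onResponse` and the `long`/`Long` return types are assumptions, so check them against the SDK's own types before copying anything.

```java
// Local stand-in for the SDK's metadata type; only the getters used here are declared.
interface ResponseMetadata {
    String getOperationName();
    String getIndexName();
    long getClientDurationMs();   // assumed primitive: always available
    Long getServerDurationMs();   // assumed boxed: may be null
    String getStatus();
}

// Local stand-in for the SDK's listener; the method name is an assumption.
@FunctionalInterface
interface ResponseMetadataListener {
    void onResponse(ResponseMetadata metadata);
}

final class LoggingListenerSketch {
    // Formats one log line; a missing server duration is printed as "n/a".
    static String format(ResponseMetadata m) {
        String server = m.getServerDurationMs() == null
                ? "n/a"
                : m.getServerDurationMs() + "ms";
        return String.format("op=%s index=%s client=%dms server=%s status=%s",
                m.getOperationName(), m.getIndexName(),
                m.getClientDurationMs(), server, m.getStatus());
    }

    // A listener that writes each formatted line to the console.
    static ResponseMetadataListener consoleListener() {
        return m -> System.out.println(format(m));
    }
}
```

Because the listener is a functional interface, a lambda like the one returned by `consoleListener()` is usually all you need for simple logging.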
Supported operations
The following data plane operations are instrumented, for both synchronous (Index) and asynchronous (AsyncIndex) usage:
| Operation | Description |
|---|---|
| upsert | Insert or update vectors |
| query | Search for similar vectors |
| fetch | Retrieve vectors by ID |
| update | Update vector metadata |
| delete | Delete vectors |
Available metadata
Each `ResponseMetadata` object provides the following fields:
| Method | Description | OTel attribute |
|---|---|---|
| getOperationName() | Operation type (e.g., upsert, query) | db.operation.name |
| getIndexName() | Pinecone index name | pinecone.index_name |
| getNamespace() | Namespace (empty string if default) | db.namespace |
| getServerAddress() | Pinecone server host | server.address |
| getClientDurationMs() | Total round-trip time in ms (always available) | n/a |
| getServerDurationMs() | Server processing time in ms (may be null) | n/a |
| getNetworkOverheadMs() | Client minus server duration in ms (may be null) | n/a |
| getStatus() | "success" or "error" | status |
| getGrpcStatusCode() | Raw gRPC status code (e.g., OK, UNAVAILABLE) | db.response.status_code |
| getErrorType() | Error category, or null if successful | error.type |
errorType values: validation, connection, server, rate_limit, timeout, auth, not_found, unknown.
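The mapping in the table above can be sketched as a plain function from metadata to attribute key/value pairs. This is an illustrative sketch, not the SDK's own code: `ResponseMetadata` is a local stand-in, `AttributeMapper` is a hypothetical helper, and all getters are assumed to return `String`. With the OTel SDK you would typically build an `Attributes` object instead of a `Map`.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Local stand-in for the SDK's metadata type, limited to the getters used below.
interface ResponseMetadata {
    String getOperationName();
    String getIndexName();
    String getNamespace();
    String getServerAddress();
    String getStatus();
    String getGrpcStatusCode();
    String getErrorType(); // null when the operation succeeded
}

// Hypothetical helper: maps metadata to the attribute names listed in the table.
final class AttributeMapper {
    static Map<String, String> toAttributes(ResponseMetadata m) {
        Map<String, String> attrs = new LinkedHashMap<>();
        attrs.put("db.operation.name", m.getOperationName());
        attrs.put("pinecone.index_name", m.getIndexName());
        attrs.put("db.namespace", m.getNamespace());
        attrs.put("server.address", m.getServerAddress());
        attrs.put("status", m.getStatus());
        attrs.put("db.response.status_code", m.getGrpcStatusCode());
        // error.type is only attached to failed operations.
        if (m.getErrorType() != null) {
            attrs.put("error.type", m.getErrorType());
        }
        return attrs;
    }
}
```

Leaving `error.type` off successful operations keeps the attribute meaningful as a filter for failures only.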
Recommended metrics
If you’re recording OTel metrics, the SDK example project uses these metric names, which follow OTel semantic conventions for database clients:
| Metric | Type | Unit | Description |
|---|---|---|---|
| db.client.operation.duration | Histogram | ms | Client-measured round-trip time |
| pinecone.server.processing.duration | Histogram | ms | Server processing time |
| db.client.operation.count | Counter | n/a | Total number of operations |
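To show which metadata value feeds which metric, here is a dependency-free sketch that records the three metrics into in-memory collections. The collections are stand-ins for OTel histogram and counter instruments, `InMemoryMetrics` is a hypothetical class, and `ResponseMetadata` is a local stand-in with assumed return types. The point it illustrates is that the server duration should only be recorded when it is present.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.concurrent.atomic.LongAdder;

// Local stand-in for the SDK's metadata type; return types are assumptions.
interface ResponseMetadata {
    long getClientDurationMs();
    Long getServerDurationMs(); // may be null
}

// Hypothetical in-memory recorder; real code would use OTel instruments instead.
final class InMemoryMetrics {
    // Stands in for the db.client.operation.count counter.
    final LongAdder operationCount = new LongAdder();
    // Stands in for the db.client.operation.duration histogram.
    final List<Long> clientDurationsMs =
            Collections.synchronizedList(new ArrayList<Long>());
    // Stands in for the pinecone.server.processing.duration histogram.
    final List<Long> serverDurationsMs =
            Collections.synchronizedList(new ArrayList<Long>());

    void record(ResponseMetadata m) {
        operationCount.increment();
        clientDurationsMs.add(m.getClientDurationMs());
        // Skip the server histogram when the duration is not available.
        Long server = m.getServerDurationMs();
        if (server != null) {
            serverDurationsMs.add(server);
        }
    }
}
```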
Quick start: Simple logging
The simplest way to use the listener is to log the metadata directly. This requires no additional dependencies beyond the Pinecone SDK:
Quick start: OpenTelemetry integration
To record structured metrics with OpenTelemetry, add the OTel SDK dependencies and wire a metrics recorder to the listener.
1. Add dependencies
Add the following to your `pom.xml`:
2. Create a metrics recorder
The SDK’s example project includes a reusable `PineconeMetricsRecorder` class you can copy into your project. It implements `ResponseMetadataListener` and records all three recommended metrics with proper OTel attributes:
3. Wire it into the Pinecone client
Initialize the OTel SDK, create the recorder, and pass it to the Pinecone client builder:
Example: Micrometer/Prometheus
If your application uses Micrometer (common in Spring Boot), you can wire the listener to Micrometer instead of the OTel SDK:
Visualizing metrics
Once your metrics are flowing to a backend, you can build dashboards to monitor your Pinecone operations. If you’re using Prometheus and Grafana, here are some useful queries:
P50 and P95 client latency:
Understanding the latency breakdown
The `ResponseMetadata` object provides three timing values that help you pinpoint the source of latency issues:
| Component | Method | What it measures |
|---|---|---|
| Client duration | getClientDurationMs() | Total round-trip time from request start to response completion. Always available. |
| Server duration | getServerDurationMs() | Time the Pinecone backend spent processing the request. Extracted from the x-pinecone-response-duration-ms response header. May be null. |
| Network overhead | getNetworkOverheadMs() | The difference: client duration minus server duration. Includes network latency, serialization, and deserialization. May be null. |
- High server duration: The bottleneck is on the Pinecone backend. Consider optimizing your query (e.g., reducing `topK`, using metadata filters), or check the Pinecone status page.
- High network overhead: The bottleneck is in the network path between your application and Pinecone. Consider deploying your application closer to your index’s cloud region, or check for network issues.
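The arithmetic behind this breakdown is small enough to sketch directly. The `LatencyBreakdown` helper below is hypothetical and not part of the SDK. It treats whichever component is larger as the dominant one, which is a simplification chosen for illustration rather than a rule from the SDK.

```java
// Hypothetical helper illustrating the client/server/network split.
final class LatencyBreakdown {
    // Network overhead is client minus server duration; null when the
    // server duration is unknown, mirroring the "may be null" behavior.
    static Long networkOverheadMs(long clientMs, Long serverMs) {
        if (serverMs == null) {
            return null;
        }
        return clientMs - serverMs;
    }

    // Names the larger of the two components, or "unknown" when the
    // server duration is unavailable.
    static String dominantComponent(long clientMs, Long serverMs) {
        Long overhead = networkOverheadMs(clientMs, serverMs);
        if (overhead == null) {
            return "unknown";
        }
        return serverMs >= overhead ? "server" : "network";
    }
}
```

For example, a 120 ms round trip with 90 ms of server time leaves 30 ms of overhead, so the server side dominates. With only 20 ms of server time, the remaining 100 ms points at the network path.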
Limitations
- Data plane operations only. Control plane operations (e.g., creating or deleting indexes) are not currently instrumented.
- Bulk import operations are not yet instrumented.
- Server duration may be unavailable. The `getServerDurationMs()` method returns `null` if the `x-pinecone-response-duration-ms` header is not present in the response.
- Synchronous callback. The listener is called synchronously after the gRPC response is received. Keep implementations lightweight and non-blocking to avoid adding latency to your operations. For heavy processing, queue the metadata for async handling.
- Exceptions are swallowed. Exceptions thrown by the listener are logged but do not affect the operation result.
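One way to keep a synchronous callback cheap is to hand each event to a bounded queue and do the heavy work on another thread. The sketch below assumes a listener method named `onResponse` and uses local stand-in types. `QueueingListener` is a hypothetical class, and dropping events when the queue is full is a design choice made here for illustration.

```java
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.atomic.AtomicLong;

// Local stand-ins for the SDK types; the listener method name is an assumption.
interface ResponseMetadata {
    String getOperationName();
}

interface ResponseMetadataListener {
    void onResponse(ResponseMetadata metadata);
}

// Hypothetical listener that enqueues events instead of processing them inline.
final class QueueingListener implements ResponseMetadataListener {
    private final BlockingQueue<ResponseMetadata> queue;
    private final AtomicLong dropped = new AtomicLong();

    QueueingListener(int capacity) {
        this.queue = new ArrayBlockingQueue<>(capacity);
    }

    @Override
    public void onResponse(ResponseMetadata metadata) {
        // offer() never blocks; when the queue is full the event is dropped and counted.
        if (!queue.offer(metadata)) {
            dropped.incrementAndGet();
        }
    }

    // Intended for a worker thread; blocks until an event is available.
    ResponseMetadata take() throws InterruptedException {
        return queue.take();
    }

    int pending() {
        return queue.size();
    }

    long droppedCount() {
        return dropped.get();
    }
}
```

A worker thread would call `take()` in a loop and perform the expensive processing there, so the calling thread only pays for a non-blocking `offer()`.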
Best practices
- Keep listeners lightweight. Record metrics or enqueue work — don’t do I/O or heavy computation in the callback.
- Follow OTel semantic conventions. Use the attribute names shown in the available metadata table and the metric names shown in the recommended metrics table for interoperability with standard dashboards and tooling.
- Monitor both client and server duration. Tracking both lets you separate Pinecone backend performance from network conditions.
- Set alerts on error rates. Use the `status` and `error.type` attributes to build alerts for elevated error rates across operations.