Understanding the Evaluation API
The Evaluation API provides a way to evaluate the correctness and completeness of a response from a RAG system.
This feature is in public preview.
Use cases
The Evaluation API is useful when performing tasks like the following:
- Understanding how well the Pinecone Assistant captures the facts of the ground truth answer.
- Comparing the Pinecone Assistant’s answers to those of another RAG system.
- Comparing the answers of your own RAG system to those of the Pinecone Assistant or another RAG system.
Install the Pinecone Assistant Python plugin
To interact with Pinecone Assistant using the Python SDK, upgrade the client and install the pinecone-plugin-assistant package, for example with pip install --upgrade pinecone pinecone-plugin-assistant.
Request
The request body requires the following fields:
Field | Description |
---|---|
question | The question asked to the RAG system. |
answer | The answer provided by the assistant being evaluated. |
ground_truth_answer | The expected answer. |
For example:
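A minimal sketch of assembling the request body in Python. The three field names come from the table above; the helper name `build_evaluation_request` and the sample question and answers are hypothetical. Sending the request (endpoint, authentication) is left out because it is not described on this page.

```python
import json

# The three required fields listed in the request table.
REQUIRED_FIELDS = ("question", "answer", "ground_truth_answer")


def build_evaluation_request(question: str, answer: str, ground_truth_answer: str) -> dict:
    """Return a request body with the required fields (hypothetical helper)."""
    body = {
        "question": question,
        "answer": answer,
        "ground_truth_answer": ground_truth_answer,
    }
    # Reject missing or blank values before anything is sent.
    for field in REQUIRED_FIELDS:
        value = body[field]
        if not isinstance(value, str) or not value.strip():
            raise ValueError(f"'{field}' must be a non-empty string")
    return body


# Sample values are illustrative only.
request_body = build_evaluation_request(
    question="What is the capital of France?",
    answer="Paris is the capital of France.",
    ground_truth_answer="The capital of France is Paris.",
)
print(json.dumps(request_body, indent=2))
```

Validating the fields locally keeps malformed evaluations from consuming tokens.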
Response
Metrics
Calculated scores between 0 and 1 are returned for the following metrics:
Metric | Description |
---|---|
correctness | Correctness of the RAG system’s answer compared to the ground truth answer. |
completeness | Completeness of the RAG system’s answer compared to the ground truth answer. |
alignment | A combined score of the correctness and completeness scores. |
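The sketch below shows one way to sanity-check returned scores in Python. It assumes the metrics arrive as a dictionary keyed by the metric names in the table; the function names and sample values are hypothetical. This page does not state how alignment combines the other two scores, so the harmonic mean shown here is only an assumption used for illustration, not the documented formula.

```python
METRIC_NAMES = ("correctness", "completeness", "alignment")


def validate_metrics(metrics: dict) -> dict:
    """Check that each expected metric is present and lies between 0 and 1."""
    for name in METRIC_NAMES:
        score = metrics[name]  # KeyError if a metric is missing
        if not 0.0 <= score <= 1.0:
            raise ValueError(f"{name}={score} is outside the range 0 to 1")
    return metrics


def harmonic_mean(correctness: float, completeness: float) -> float:
    """One possible way to combine two scores (assumption, not the documented formula)."""
    if correctness + completeness == 0:
        return 0.0
    return 2 * correctness * completeness / (correctness + completeness)


# Illustrative values, not real API output.
sample_metrics = {"correctness": 1.0, "completeness": 0.5, "alignment": 0.667}
validate_metrics(sample_metrics)
print(round(harmonic_mean(1.0, 0.5), 3))
```

A harmonic-style combination penalizes an answer that is correct but incomplete (or the reverse) more heavily than a simple average would.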
Reasoning
The response includes explanations for the reasoning behind each metric’s score. This includes a list of evaluated facts with their entailment status:
Status | Description |
---|---|
entailed | The fact is supported by the ground truth answer. |
contradicted | The fact contradicts the ground truth answer. |
neutral | The fact is neither supported nor contradicted by the ground truth answer. |
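As a rough illustration, the evaluated facts can be tallied by entailment status. The three status values come from the table above; the structure of each item (the `fact` and `entailment` keys), the helper name, and the sample facts are assumptions made for this sketch.

```python
from collections import Counter

# The three entailment statuses listed in the table.
VALID_STATUSES = ("entailed", "contradicted", "neutral")

# Hypothetical evaluated facts; the "fact" and "entailment" keys are assumed.
evaluated_facts = [
    {"fact": "Paris is the capital of France.", "entailment": "entailed"},
    {"fact": "Paris hosts the national parliament.", "entailment": "neutral"},
    {"fact": "The capital of France is Lyon.", "entailment": "contradicted"},
]


def count_statuses(facts: list) -> Counter:
    """Count how many facts fall under each entailment status."""
    counts = Counter()
    for item in facts:
        status = item["entailment"]
        if status not in VALID_STATUSES:
            raise ValueError(f"unexpected entailment status: {status!r}")
        counts[status] += 1
    return counts


status_counts = count_statuses(evaluated_facts)
for status in VALID_STATUSES:
    print(f"{status}: {status_counts[status]}")
```

Contradicted facts are usually the first place to look when a score is lower than expected.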
Usage
The response reports the number of tokens used to calculate the metrics, broken down into prompt tokens and completion tokens.
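When running many evaluations, it can help to aggregate token usage across responses. The sketch below assumes each usage record is a dictionary with `prompt_tokens` and `completion_tokens` keys; those key names, the helper name, and the sample numbers are hypothetical.

```python
def sum_usage(usage_records: list) -> dict:
    """Add up prompt and completion tokens across several evaluations."""
    prompt = sum(record["prompt_tokens"] for record in usage_records)
    completion = sum(record["completion_tokens"] for record in usage_records)
    return {
        "prompt_tokens": prompt,
        "completion_tokens": completion,
        "total_tokens": prompt + completion,
    }


# Illustrative numbers, not real API output.
usage_records = [
    {"prompt_tokens": 1200, "completion_tokens": 150},
    {"prompt_tokens": 900, "completion_tokens": 100},
]
usage_totals = sum_usage(usage_records)
print(usage_totals)
```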
Pricing
Cost is calculated by token usage. See Pricing for up-to-date pricing information.
The Evaluation API is only available for Standard and Enterprise plans.