Hallucination
The Hallucination Metric evaluates whether an agent has produced fabricated or inaccurate responses. It is run as a trace-level evaluation, flagging outputs that deviate from expected or factual information. The steps below explain how to configure and run the metric.
About Hallucination Metric
Type: Trace-level Metric
Evaluation Level: Can be run as a trace evaluation.
Purpose: Identifies instances where the agent generates responses that deviate from the expected or factual outputs.
How to Configure the Hallucination Metric
Step 1: Prerequisites
To configure the hallucination metric, you need the following:
Vertex AI Service Account:
Ensure you have a Vertex AI-enabled project in Google Cloud.
Refer to Create Service Accounts for setup guidance.
Service Account Role:
Assign the role "Vertex AI Administrator" to the service account.
Service Account Key:
Create a key for the service account. Refer to Create and Manage Service Account Keys.
You will receive a .json file as the key.
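The key file returned by Google Cloud is a standard service-account JSON. As a quick sanity check before pasting it into Catalyst, a minimal Python sketch (the function name is illustrative, not part of any SDK) that confirms a downloaded key has the fields every service-account key contains:

```python
import json

# Fields present in every Google Cloud service-account key file.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email", "token_uri"}

def validate_service_account_key(path: str) -> dict:
    """Load a service-account key file and confirm it looks well-formed."""
    with open(path) as f:
        key = json.load(f)
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"Key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError("File is not a service-account key")
    return key
```

If the function raises, re-download the key rather than proceeding, since an incomplete key will fail silently later during evaluation.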
Step 2: Adding Configuration to RagaAI Catalyst
Access Settings:
Navigate to Settings > API Key > Create New Parameter.
Add Keys:
Key: GOOGLE_APPLICATION_CREDENTIALS
Value: Paste the content of the .json key file.
Key: vertex_location
Value: The region of the Vertex AI project.
Key: vertex_project
Value: The Project ID where Vertex AI is enabled.
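The three parameters above can also be assembled programmatically. A small sketch in plain Python (the helper name is hypothetical; only the three parameter names come from the steps above) that reads the key file and derives `vertex_project` from the key itself, so the two stay consistent:

```python
import json

def build_vertex_parameters(key_path: str, location: str) -> dict:
    """Build the three Catalyst parameters from a service-account key file.

    GOOGLE_APPLICATION_CREDENTIALS holds the full JSON content of the key,
    vertex_project is taken from the key's own project_id field, and
    vertex_location is the region where Vertex AI is enabled.
    """
    with open(key_path) as f:
        key_content = f.read()
    project_id = json.loads(key_content)["project_id"]
    return {
        "GOOGLE_APPLICATION_CREDENTIALS": key_content,
        "vertex_location": location,
        "vertex_project": project_id,
    }
```

Deriving the project ID from the key file avoids the common misconfiguration where the pasted credentials belong to one project and `vertex_project` names another.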
How to Run the Hallucination Metric
Steps to Execute
Access the Dataset:
Navigate to the dataset and click on Evaluate.
Select the Metric:
Choose Hallucination-Alteryx from the available metric options.
You can rename the metric if needed.
Set Evaluation Type:
Select the evaluation type: Trace Evaluation or Conversation Evaluation.
Define the Schema:
Specify the schema as _trace.
Model Configuration:
Choose the model configuration for the evaluation.
Passing Criteria:
Define pass/fail thresholds to set success criteria.
Run Evaluation:
Click on Run to initiate the hallucination metric evaluation.
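The passing criteria in the steps above amount to a threshold check over the metric's per-trace scores. A sketch of that logic (the 0–1 score scale, where higher means more hallucination, and the function name are assumptions for illustration, not the platform's internals):

```python
def apply_passing_criteria(scores: dict, threshold: float = 0.5) -> list:
    """Mark each trace's hallucination score as Pass or Fail.

    Assumes scores in [0, 1] where higher indicates more hallucination;
    a trace passes when its score stays strictly below the threshold.
    """
    return [
        {"trace_id": trace_id, "score": score,
         "result": "Pass" if score < threshold else "Fail"}
        for trace_id, score in scores.items()
    ]
```

Tightening the threshold makes the criterion stricter: more traces are flagged as Fail, which is useful when even mild deviations from factual output are unacceptable.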
When to Run the Hallucination Metric?
Run the hallucination metric:
Trace Evaluation: When you need to evaluate individual traces for inaccurate or fabricated responses.
This metric is most effective when you want to identify and analyse instances where the agent's output deviates from expected or factual information.