Hallucination

The Hallucination Metric helps users evaluate whether an agent has produced fabricated or inaccurate responses. It runs as a trace-level evaluation. The steps below describe how to configure and execute the hallucination metric.


About Hallucination Metric

  • Type: Trace-level Metric

  • Evaluation Level: Can be run as a trace evaluation.

  • Purpose: Identifies instances where the agent generates responses that deviate from the expected or factual outputs.


How to Configure the Hallucination Metric

Step 1: Prerequisites

To configure the hallucination metric, you need the following:

  1. Vertex AI Service Account:

    • Create a service account in the Google Cloud project where Vertex AI is enabled.

  2. Service Account Role:

    • Assign the role "Vertex AI Administrator" to the service account.

  3. Service Account Key:

    • Generate a key for the service account and download it as a .json file.

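Before pasting the key into Catalyst, it can help to sanity-check that the downloaded file is a well-formed service-account key. A minimal sketch (the field list reflects standard Google Cloud service-account key files; the file path is a placeholder):

```python
import json

# Fields a Google Cloud service-account key file always contains.
REQUIRED_FIELDS = {"type", "project_id", "private_key", "client_email"}

def validate_key_file(path):
    """Parse the .json key file and check it looks like a service-account key."""
    with open(path) as f:
        key = json.load(f)
    missing = REQUIRED_FIELDS - key.keys()
    if missing:
        raise ValueError(f"key file is missing fields: {sorted(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"unexpected key type: {key['type']!r}")
    return key
```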

Step 2: Adding Configuration to RagaAI Catalyst

  1. Access Settings:

    • Navigate to Settings > API Key > Create New Parameter.

  2. Add Keys:

    • Key: GOOGLE_APPLICATION_CREDENTIALS

      • Value: Paste the content of the .json file.

    • Key: vertex_location

      • Value: The location (region) of the Vertex AI project.

    • Key: vertex_project

      • Value: The Project ID where Vertex AI is enabled.
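
If you assemble these values in code before pasting them into the UI, the three parameters can be mirrored in a small mapping. A sketch, where the key file path, region, and project ID are placeholders; note that the GOOGLE_APPLICATION_CREDENTIALS value is the *content* of the .json file, not its path:

```python
import json
from pathlib import Path

def build_catalyst_params(key_path, location, project_id):
    """Assemble the three parameters in the shape the Catalyst UI expects."""
    credentials = Path(key_path).read_text()
    json.loads(credentials)  # fail early if the key file is not valid JSON
    return {
        "GOOGLE_APPLICATION_CREDENTIALS": credentials,  # pasted file content
        "vertex_location": location,    # region of the Vertex AI project
        "vertex_project": project_id,   # project ID where Vertex AI is enabled
    }
```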


How to Run the Hallucination Metric

Steps to Execute

  1. Access the Dataset:

    • Navigate to the dataset and click on Evaluate.

  2. Select the Metric:

    • Choose Hallucination-Alteryx from the available metric options.

    • You can rename the metric if needed.

  3. Set Evaluation Type:

    • Select the evaluation type:

      • Trace Evaluation or

      • Conversation Evaluation.

  4. Define the Schema:

    • Specify the schema as _trace.

  5. Model Configuration:

    • Choose the model configuration for the evaluation.

  6. Passing Criteria:

    • Define pass/fail thresholds to set success criteria.

  7. Run Evaluation:

    • Click on Run to initiate the hallucination metric evaluation.
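
The passing criteria in step 6 amount to comparing each trace's hallucination score against a threshold. A sketch of that logic (the score scale, the default threshold, and the "lower score = less hallucination" convention are assumptions; check your metric's actual output range):

```python
def apply_passing_criteria(scores, threshold=0.5):
    """Mark each trace as passed when its hallucination score is at or
    below the threshold (assuming lower scores mean less hallucination)."""
    return [
        {"trace_id": trace_id, "score": score, "passed": score <= threshold}
        for trace_id, score in scores.items()
    ]
```

For example, with a threshold of 0.5, a trace scoring 0.2 passes and one scoring 0.9 fails.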


When to Run the Hallucination Metric?

Run the hallucination metric:

  • Trace Evaluation: When you need to evaluate individual traces for inaccurate or fabricated responses.

This metric is most effective when you want to identify and analyse instances where the agent's output deviates from expected or factual information.
