Response Correctness

Objective: This metric measures how accurate and factually grounded the entire response is compared with the expected response (ground truth).

Parameters:

  • Response
  • Expected Response

Interpretation: A higher score indicates that the model response is correct for the prompt. A failed result indicates that the response is not factually consistent with the expected response.
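
As a rough illustration, a result in the shape shown in the example at the bottom of this page (a numeric score plus a reason) can be gated into a pass/fail check. This is a minimal sketch; the threshold value is an assumption, not a platform default:

# Illustrative only: assumes the metric payload is a dict with a numeric
# "score" and a textual "reason", as in the example output on this page.
def passed(metric_output: dict, threshold: float = 0.5) -> bool:
    """Treat scores at or above the (assumed) threshold as a pass."""
    return metric_output.get("score", 0) >= threshold

result = {"score": 0, "reason": "Neil Armstrong was the first person to walk on the moon."}
print(passed(result))  # False: the response contradicted the ground truth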

Code Execution:

# Import path assumed for the evaluation SDK; adjust if your package differs.
from ragaai_catalyst import Experiment

# Point the experiment at an existing project and dataset.
experiment_manager = Experiment(project_name="project_name",
                                experiment_name="experiment_name",
                                dataset_name="dataset_name")

# Attach the Response Correctness metric with its evaluation settings.
response = experiment_manager.add_metrics(
    metrics=[
        {"name": "Response Correctness", "config": {"reason": True, "model": "gpt-4o-mini", "batch_size": 5, "provider": "OpenAI"}}
    ]
)
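
In this call, reason: True requests a textual explanation alongside the score, model and provider select the judge LLM, and batch_size controls how many dataset rows are evaluated per batch (these descriptions are inferred from the parameter names; see the Executing tests page for the authoritative definitions). A variation with alternate, purely illustrative settings might look like:

# Same documented call with alternate (illustrative) settings.
response = experiment_manager.add_metrics(
    metrics=[
        {
            "name": "Response Correctness",
            "config": {
                "reason": False,    # skip the textual explanation, keep the score
                "model": "gpt-4o",  # assumed: any judge model your provider supports
                "batch_size": 10,   # evaluate ten rows per batch
                "provider": "OpenAI",
            },
        }
    ]
)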

Refer to the Executing tests page to learn more about metric configurations.

Example:

  • Prompt: Who was the first person to walk on the moon and when did it happen?

  • Expected Response (Ground Truth): The first person to walk on the moon was Neil Armstrong, and it happened on July 20, 1969.

  • Response: The first person to walk on the moon was Buzz Aldrin, and it happened on July 20, 1970.

  • Metric Output: {'score': 0, 'reason': 'Neil Armstrong was the first person to walk on the moon, on July 20, 1969.'}
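
The sketch below illustrates how a judge-model check can produce an output like the one above. It is a minimal, assumed implementation, not the platform's actual metric code: the prompt wording, the OpenAI client usage, and the JSON shape are all illustrative.

import json
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

JUDGE_PROMPT = (
    "Compare the response to the expected response (ground truth). "
    'Return a JSON object with an integer "score" (1 if the response is '
    'factually consistent with the ground truth, 0 otherwise) and a short "reason".\n'
    "Expected response: {expected}\n"
    "Response: {response}"
)

def judge_correctness(response: str, expected: str, model: str = "gpt-4o-mini") -> dict:
    """Score one response against its ground truth with a judge LLM."""
    completion = client.chat.completions.create(
        model=model,
        messages=[{"role": "user",
                   "content": JUDGE_PROMPT.format(expected=expected, response=response)}],
        response_format={"type": "json_object"},  # ask the model for JSON output
    )
    return json.loads(completion.choices[0].message.content)

print(judge_correctness(
    "The first person to walk on the moon was Buzz Aldrin, and it happened on July 20, 1970.",
    "The first person to walk on the moon was Neil Armstrong, and it happened on July 20, 1969.",
))
# Expected shape: {"score": 0, "reason": "..."}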
