Correctness

Objective: This metric checks the correctness of your LLM's response, i.e. whether the submission is factually accurate and free from errors when compared to the expected response.

Parameters:

  • prompt, response, expected_response (see the sample row below)
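
For reference, a dataset row supplying these parameters might look like the sketch below; the values are illustrative, and the exact ingestion format depends on how your dataset is defined.

row = {
    "prompt": "When did the first Moon landing take place?",   # input sent to the model
    "response": "The first Moon landing took place in 1969.",  # model output under evaluation
    "expected_response": "1969",                               # ground-truth reference answer
}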

Interpretation: A higher score indicates that the model's response is correct for the given prompt. A failed result indicates that the response is not factually correct when compared to the expected response.
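
To make the pass/fail semantics concrete, here is a minimal, hypothetical sketch of how a score could map to an outcome; the threshold value is an assumption for illustration, not part of the SDK.

score = 0.92          # hypothetical correctness score for one dataset row
PASS_THRESHOLD = 0.5  # assumed cutoff; the platform defines the actual pass/fail logic
outcome = "passed" if score >= PASS_THRESHOLD else "failed"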

Code Execution:

# Assumes `Experiment` has already been imported from the evaluation SDK;
# the exact module path depends on your installation.
experiment_manager = Experiment(project_name="project_name",
                                experiment_name="experiment_name",
                                dataset_name="dataset_name")

# Register the correctness metric once per judge model, so the same
# dataset is scored by each model independently.
response = experiment_manager.add_metrics(
    metrics=[
        {"name": "correctness_gt", "config": {"model": "gpt-4o"}},
        {"name": "correctness_gt", "config": {"model": "gpt-4"}},
        {"name": "correctness_gt", "config": {"model": "gpt-3.5-turbo"}}
    ]
)
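
Passing the same metric name with multiple model configurations scores every dataset row with each judge model, which makes it straightforward to compare how gpt-4o, gpt-4, and gpt-3.5-turbo grade the same responses.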
