Hallucination
Objective: This metric measures the extent to which the model hallucinates, i.e. fabricates a response that is not grounded in the provided context.
Required Parameters: Response, Context
Interpretation: A higher score indicates that the model's response is more likely hallucinated.
Code Example:
# Hallucination Test
prompt = "What was the blond doing?"
# A response grounded in the context below.
response = "A blond drinking water in public."
# A response that contradicts the context ("woman" vs. the context's "man").
contradiction = "A blond woman is drinking water in public."
context = [
    "A man with blond-hair, and a brown shirt drinking out of a public water fountain."
]

# Run the hallucination test on both the grounded and the contradictory response.
evaluator.add_test(
    test_names="hallucination_test",
    data={
        "prompt": prompt,
        "response": response,
        "context": context,
    },
    arguments={"model": "gpt-4", "threshold": 0.6},
).add_test(
    test_names="hallucination_test",
    data={
        "prompt": prompt,
        "response": contradiction,
        "context": context,
    },
    arguments={"model": "gpt-4", "threshold": 0.6},
).run()

evaluator.print_results()
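As a rough sketch of how the threshold can be read (the helper below is purely illustrative and not part of the library's API), the score returned by the test is compared against the threshold passed in arguments: scores above it mark the response as hallucinated.

```python
# Illustrative only: a hypothetical helper showing how a hallucination
# score might be compared against the threshold from `arguments`.
def is_hallucinated(score: float, threshold: float = 0.6) -> bool:
    """Flag a response as hallucinated when its score exceeds the threshold."""
    return score > threshold

# A grounded response should score low; a contradictory one should score high.
print(is_hallucinated(0.2))  # False: below the threshold, likely faithful
print(is_hallucinated(0.8))  # True: above the threshold, likely hallucinated
```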