Hallucination
Objective: This metric measures the extent to which the model hallucinates, i.e. fabricates a response that is not grounded in the provided context.
Required Parameters: Response, Context
Interpretation: A higher score indicates that the model's response is more likely hallucinated.
Code Example:
# Hallucination Test
prompt = "What was the blond doing?"
# A response grounded in the context below.
response = "A blond drinking water in public."
# A response that contradicts the context ("woman" vs. the context's "man").
contradiction = "A blond woman is drinking water in public."
context = [
    "A man with blond-hair, and a brown shirt drinking out of a public water fountain."
]

# Run the hallucination test on both the grounded and the contradictory response.
evaluator.add_test(
    test_names="hallucination_test",
    data={
        "prompt": prompt,
        "response": response,
        "context": context,
    },
    arguments={"model": "gpt-4", "threshold": 0.6},
).add_test(
    test_names="hallucination_test",
    data={
        "prompt": prompt,
        "response": contradiction,
        "context": context,
    },
    arguments={"model": "gpt-4", "threshold": 0.6},
).run()

evaluator.print_results()
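As a rough sketch of how the threshold can be read (the helper below is purely illustrative and not part of the library's API), the score returned by the test is compared against the threshold passed in arguments: scores above it mark the response as hallucinated.

```python
# Illustrative only: a hypothetical helper showing how a hallucination
# score might be compared against the threshold from `arguments`.
def is_hallucinated(score: float, threshold: float = 0.6) -> bool:
    """Flag a response as hallucinated when its score exceeds the threshold."""
    return score > threshold

# A grounded response should score low; a contradictory one should score high.
print(is_hallucinated(0.2))  # False: below the threshold, likely faithful
print(is_hallucinated(0.8))  # True: above the threshold, likely hallucinated
```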