Overall
"response" : "The quick brown fox jumps over the lazy dog.",
"expected_response" : "The quick brown fox beats the dog.",
"concept_set" : ["quick_A", "brown_A", "fox_N", "jump_V", "lazy_A", "dog_N"]
Here, winner_test returns a score of 1 because response is better than expected_response. Pos_test and Cover_test also return 1, because all concepts are covered and every part of speech used in response is correct. The final output is therefore 1*1*1 = 1.
# Example with higher score.
# Make sure you pass all words in concept_set in their root form.
evaluator.add_test(
    test_names=["overall_test"],
    data={
        "response": "The quick brown fox jumps over the lazy dog.",
        "expected_response": "The quick brown fox beats the dog.",
        "concept_set": ["quick_A", "brown_A", "fox_N", "jump_V", "lazy_A", "dog_N"],
    },
    arguments={"model": "gpt-4"},
).run()
evaluator.print_results()
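The "1*1*1 = 1" arithmetic in the explanation above suggests that the overall score is the product of the three sub-test scores. The sketch below illustrates only that arithmetic. The function name `combine_scores` and its parameters are hypothetical and are not part of the evaluator's API; the library's actual aggregation may differ.

```python
from math import prod


def combine_scores(winner: float, pos: float, cover: float) -> float:
    """Hypothetical helper: multiply the three sub-test scores.

    Assumption: the overall score is the plain product of the winner,
    PoS, and cover scores, as the 1*1*1 = 1 example suggests.
    """
    return prod((winner, pos, cover))


# All three sub-tests pass, so the product is 1.
print(combine_scores(1, 1, 1))

# Under this assumption, a single failing sub-test zeroes the overall score.
print(combine_scores(1, 0, 1))
```

If the score really is a product, a response has to pass every sub-test to score well, since a high score on one test cannot compensate for a zero on another.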

