Summarization
Objective: This test checks whether the LLM generates factually correct summaries that retain the necessary details of the source text.
Required Parameters: Prompt, Response
Interpretation: A higher score signifies better summary quality.
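The examples below assume an already-initialized evaluator. A minimal setup sketch, assuming the raga_llm_hub package and its RagaLLMEval constructor with an api_keys mapping (adjust to your installation):

# Setup sketch: assumes the raga_llm_hub package; the RagaLLMEval
# constructor and its api_keys argument follow that library's usage.
from raga_llm_hub import RagaLLMEval

evaluator = RagaLLMEval(
    api_keys={"OPENAI_API_KEY": "your-openai-api-key"}  # placeholder key
)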
# Here the response (summary) is factually correct and contains the necessary details, so the test passes.
evaluator.add_test(
    test_names=["summarisation_test"],
    data={
        "prompt": ["Summarize: In the realm of sports, athleticism intertwines with passion and competition. From the roar of stadiums to the grit of training grounds, athletes inspire with their feats, uniting diverse communities worldwide."],
        "response": ["Sports epitomize human passion and unity, showcasing athleticism's prowess and communal bonds across global arenas."]
    },
    arguments={"model": "gpt-4", "threshold": 0.5},
).run()
evaluator.print_results()
# Here the response (summary) is not factually correct, so the test fails.
evaluator.add_test(
    test_names=["summarisation_test"],
    data={
        "prompt": ["Summarize: World War I was triggered by a complex web of political, economic, and social factors. The assassination of Archduke Franz Ferdinand, militarism, alliances, and imperialistic ambitions were political causes. Economically, competition and colonial rivalries played a role. Social tensions also contributed. Consequences included the Treaty of Versailles, redrawing of borders, and the League of Nations."],
        "response": ["World War I was triggered because of environmental factors."]
    },
    arguments={"model": "gpt-4", "threshold": 0.5},
).run()
evaluator.print_results()
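A note on the threshold argument: the grading model ("gpt-4" here) assigns the summary a quality score, and the test is treated as passed when that score meets or exceeds the threshold (0.5 in these examples); raising the threshold makes the check stricter.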