Objective: The test determines whether the LLM generates factually correct summaries that retain the necessary details.
Required Parameters: Prompt, Response
Interpretation: A higher score signifies better summary quality; the test passes when the score meets or exceeds the configured threshold.
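Both examples below assume an `evaluator` has already been initialized. A minimal setup sketch, assuming the `raga_llm_hub` package is installed and an OpenAI API key is available (the key value is a placeholder):

```python
# Setup sketch (assumption: raga-llm-hub is installed and the key below
# is replaced with a real OpenAI API key).
from raga_llm_hub import RagaLLMEval

evaluator = RagaLLMEval(api_keys={"OPENAI_API_KEY": "your-openai-api-key"})
```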
```python
# Here the response (summary) is factually correct and contains the key details, so the test passes.
evaluator.add_test(
    test_names=["summarisation_test"],
    data={
        "prompt": ["Summarize: In the realm of sports, athleticism intertwines with passion and competition. From the roar of stadiums to the grit of training grounds, athletes inspire with their feats, uniting diverse communities worldwide."],
        "response": ["Sports epitomize human passion and unity, showcasing athleticism's prowess and communal bonds across global arenas."],
    },
    arguments={"model": "gpt-4", "threshold": 0.5},
).run()
evaluator.print_results()
```
```python
# Here the response (summary) is not factually correct, so the test fails.
evaluator.add_test(
    test_names=["summarisation_test"],
    data={
        "prompt": ["Summarize: World War I was triggered by a complex web of political, economic, and social factors. The assassination of Archduke Franz Ferdinand, militarism, alliances, and imperialistic ambitions were political causes. Economically, competition and colonial rivalries played a role. Social tensions also contributed. Consequences included the Treaty of Versailles, redrawing of borders, and the League of Nations."],
        "response": ["World War I was triggered because of environmental factors."],
    },
    arguments={"model": "gpt-4", "threshold": 0.5},
).run()
evaluator.print_results()
```