Objective: The test determines whether the LLM generates factually correct summaries that retain the necessary details.
Required Parameters: Prompt, Response
Interpretation: A higher score signifies better summary quality; the test passes when the score meets or exceeds the configured threshold.
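Both examples below assume an `evaluator` has already been initialized. A minimal setup sketch, assuming the `raga_llm_hub` package is installed and an OpenAI API key is available (the key value is a placeholder):

```python
# Setup sketch (assumption: raga-llm-hub is installed and the key below
# is replaced with a real OpenAI API key).
from raga_llm_hub import RagaLLMEval

evaluator = RagaLLMEval(api_keys={"OPENAI_API_KEY": "your-openai-api-key"})
```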
```python
# Here the response (summary) is factually correct and contains the key details, so the test passes.
evaluator.add_test(
    test_names=["summarisation_test"],
    data={
        "prompt": ["Summarize: In the realm of sports, athleticism intertwines with passion and competition. From the roar of stadiums to the grit of training grounds, athletes inspire with their feats, uniting diverse communities worldwide."],
        "response": ["Sports epitomize human passion and unity, showcasing athleticism's prowess and communal bonds across global arenas."],
    },
    arguments={"model": "gpt-4", "threshold": 0.5},
).run()
evaluator.print_results()
```
```python
# Here the response (summary) is not factually correct, so the test fails.
evaluator.add_test(
    test_names=["summarisation_test"],
    data={
        "prompt": ["Summarize: World War I was triggered by a complex web of political, economic, and social factors. The assassination of Archduke Franz Ferdinand, militarism, alliances, and imperialistic ambitions were political causes. Economically, competition and colonial rivalries played a role. Social tensions also contributed. Consequences included the Treaty of Versailles, redrawing of borders, and the League of Nations."],
        "response": ["World War I was triggered because of environmental factors."],
    },
    arguments={"model": "gpt-4", "threshold": 0.5},
).run()
evaluator.print_results()
```