Winner

Objective: This test checks which of two responses is better with respect to the provided concept_set. It can be used to compare two model outputs, or a model output against a human-written one.

Required Parameters:

  • Response: The sentence generated by the model using the words in concept_set.

  • Expected_Response: The sentence to compare against, either the output of a second model or the ground truth.

  • Concept_set: A list of words in their root form, each tagged with its corresponding part of speech (e.g., "_V" for verb, "_N" for noun, "_A" for adjective).
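
The tagging convention above can be checked before a test is submitted. The following is a minimal sketch and not part of the library: `parse_concept` and `VALID_TAGS` are hypothetical names, and the tag set is limited to the three examples listed above (the library may accept others).

```python
from typing import Tuple

# Hypothetical helper, not part of the evaluator library.
# Assumes only the three tags named above; extend VALID_TAGS if more are accepted.
VALID_TAGS = {"V": "verb", "N": "noun", "A": "adjective"}


def parse_concept(entry: str) -> Tuple[str, str]:
    """Split an entry such as "sit_V" into (word, tag) and validate the tag."""
    word, sep, tag = entry.rpartition("_")
    if not sep or not word:
        raise ValueError(f"expected '<word>_<TAG>', got {entry!r}")
    if tag not in VALID_TAGS:
        raise ValueError(f"unknown part-of-speech tag {tag!r} in {entry!r}")
    return word, tag


concept_set = ["food_N", "front_N", "sit_V", "table_N"]
parsed = [parse_concept(entry) for entry in concept_set]
```

An entry with no tag (for example "sit") or with a tag outside `VALID_TAGS` raises `ValueError`, so a malformed concept set is caught before the evaluating model is called.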

Optional Parameters:

  • model (str, optional): The name of the language model used to evaluate the responses. Defaults to "gpt-3.5-turbo".

  • temperature (float, optional): Adjusts the randomness of the response generated by the specified model.

  • max_tokens (int, optional): Specifies the maximum length of the generated response.
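
The optional parameters are presumably passed through the same `arguments` dictionary that carries `model` in the examples on this page. A minimal sketch, assuming `temperature` and `max_tokens` are accepted as keys alongside `model`; the values are illustrative, not recommended defaults.

```python
# Assumption: temperature and max_tokens go in the same `arguments` dict as `model`.
# The values below are illustrative only.
arguments = {
    "model": "gpt-4",     # evaluating model; the documented default is "gpt-3.5-turbo"
    "temperature": 0.0,   # lower values make the generated response less random
    "max_tokens": 256,    # upper bound on the length of the generated response
}
```

Under that assumption, the dictionary would be passed as `arguments=arguments` in the `evaluator.add_test(...)` call.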

Interpretation: A score of 1 indicates that the model response is better than the ground truth or the other model's response.

Note: Always use words in their root form in the concept set, so that the evaluation focuses on how each part of speech is used rather than on word conjugation or inflection.

# Example where the model response is better than the ground truth.
# Assumes `evaluator` has already been created and configured.
# Make sure you pass all words in concept_set in their root form.

evaluator.add_test(
    test_names=["winner_test"],
    data={
        "response" : "The family sits at the table with delicious food placed in front of them.",
        "expected_response" : "I sit at the front of the table and enjoy my food.",
        "concept_set" : ["food_N", "front_N", "sit_V", "table_N"]
    },
    arguments={"model": "gpt-4"},
).run()

evaluator.print_results()

# Example where the ground truth is better than the model response.
# Make sure you pass all words in concept_set in their root form.

evaluator.add_test(
    test_names=["winner_test"],
    data={
        "response" : "A brown fox quickly jumps over a sleeping dog.",
        "expected_response" : "The quick brown fox jumps over the lazy dog.",
        "concept_set" : ["quick_A", "brown_A", "fox_N", "jump_V", "lazy_A", "dog_N"]
    },
    arguments={"model": "gpt-4"},
).run()

evaluator.print_results()
