Winner
Objective: This test checks which of two responses is better according to the provided concept_set. It can be used to compare two model outputs, or to compare a model output against a human-written (ground truth) output.
Required Parameters:
Response: The sentence generated by the model under evaluation using the words in the concept_set.
Expected_Response: The sentence generated by the model you want to compare against, or the ground-truth sentence.
Concept_set: A list of words in their root form, each tagged with its corresponding part of speech (e.g., "_V" for verb, "_N" for noun, "_A" for adjective).
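For illustration, a concept_set following this tagging convention might look like the sketch below (the variable name is just an example, not part of the API):

```python
# Each entry is a root-form word tagged with its part of speech:
# "_V" = verb, "_N" = noun, "_A" = adjective.
concept_set = ["dog_N", "run_V", "fast_A"]
```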
Optional Parameters:
model (str, optional): The name of the language model to be used for evaluating the responses. Defaults to "gpt-3.5-turbo".
temperature (float, optional): Adjusts the randomness of the response generated by the specified model.
max_tokens (int, optional): Specifies the maximum length of the generated response.
Interpretation: A score of 1 indicates that the model response is better than the ground truth / other model's response.
Note: Always use words in their root form in the concept_set so that the evaluation focuses on part-of-speech application rather than word conjugation or inflection.
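As a minimal sketch of how these parameters fit together, the snippet below assumes a hypothetical winner_test function; the actual import path and function signature are not documented on this page and should be checked against the library's API:

```python
# Hypothetical wrapper around an LLM judge that decides which sentence
# uses the concept set better; not the library's actual API.
def winner_test(response, expected_response, concept_set,
                model="gpt-3.5-turbo", temperature=0.0, max_tokens=256):
    """Placeholder: the real test would prompt `model` to judge which sentence
    applies the concept_set words (in the tagged parts of speech) better,
    returning 1 when `response` wins."""
    return 1  # dummy value so the sketch runs; replace with the real evaluator


score = winner_test(
    response="The fast dog runs across the field.",         # model output
    expected_response="A dog runs fast through the park.",  # ground truth / other model
    concept_set=["dog_N", "run_V", "fast_A"],               # root forms with POS tags
    model="gpt-3.5-turbo",  # LLM used as the judge
    temperature=0.0,        # lower values give more deterministic judgments
    max_tokens=256,         # cap on the judge's generated output
)
# score == 1 means the model response was judged better.
```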