Correctness
Objective: This metric checks the correctness of your LLM response, i.e. whether the submission is factually accurate and free from errors when compared to the expected response.
Parameters:
data:
  prompt (str): The prompt for the response.
  response (str): The actual response to be evaluated.
  expected_response (str): The expected response for comparison.
  context (str): The ground truth for comparison.
arguments:
  model (str, optional): The model to be used for evaluation (default is "gpt-3.5-turbo").
  threshold (float, optional): The threshold for the correctness score (default is 0.5).
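For illustration, here is a minimal sketch of how a metric with these inputs might be invoked. The function name `evaluate_correctness`, the `CorrectnessResult` class, and the token-overlap scorer are hypothetical placeholders, not this library's API; only the field names (`prompt`, `response`, `expected_response`, `context`) and the arguments (`model`, `threshold`) come from the parameter list above. A real implementation would prompt the configured `model` to grade the response rather than compare tokens.

```python
import string
from dataclasses import dataclass


@dataclass
class CorrectnessResult:
    score: float   # 1.0 = fully correct, 0.0 = incorrect
    passed: bool   # True when score >= threshold


def _tokens(text: str) -> set[str]:
    # Lowercase and strip punctuation before splitting into tokens.
    return set(
        text.lower().translate(str.maketrans("", "", string.punctuation)).split()
    )


def evaluate_correctness(
    data: dict,
    model: str = "gpt-3.5-turbo",
    threshold: float = 0.5,
) -> CorrectnessResult:
    """Stand-in scorer. A real evaluator would ask `model` to judge
    data["response"] against data["expected_response"] and data["context"];
    here, token overlap with the expected response is used as a placeholder."""
    expected = _tokens(data["expected_response"])
    actual = _tokens(data["response"])
    score = len(expected & actual) / max(len(expected), 1)
    return CorrectnessResult(score=score, passed=score >= threshold)


result = evaluate_correctness(
    data={
        "prompt": "What is the capital of France?",
        "response": "The capital of France is Paris.",
        "expected_response": "Paris is the capital of France.",
        "context": "France's capital city is Paris.",
    },
    threshold=0.5,
)
print(result)  # CorrectnessResult(score=1.0, passed=True)
```

The data dictionary carries all four fields described above even though this toy scorer only reads two of them; an LLM-judged implementation would also use `prompt` and `context` to ground its verdict.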
Interpretation: A higher score indicates that the model response was correct for the prompt. A failed result indicates the response is not factually correct compared to the expected response.