Cosine Similarity

Objective: Measures the cosine similarity between a prompt and its corresponding response. Both texts are represented as multi-dimensional vectors, and the score is the cosine of the angle between those vectors. Because it depends on the angle rather than on vector magnitude, the score reflects how much content the two texts share, not how long they are. It serves as a proxy for how relevant the response is to the prompt.
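The definition above reduces to a short formula: the dot product of the two vectors divided by the product of their lengths. The sketch below is a minimal plain-Python version. It assumes the texts have already been turned into numeric vectors, because this page does not say how the test vectorizes text. `cosine_similarity` is a hypothetical helper, not the evaluator's API, and returning 0.0 for an all-zero vector is also an assumption.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between vectors a and b: (a . b) / (|a| * |b|)."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        # Assumption: similarity involving an all-zero vector is defined as 0.0
        return 0.0
    return dot / (norm_a * norm_b)


print(cosine_similarity([1, 2, 0], [2, 4, 0]))  # same direction, different length
print(cosine_similarity([1, 0], [0, 1]))        # orthogonal: nothing in common
print(cosine_similarity([1, 0], [-1, 0]))       # opposite direction
```

The first call scores 1 even though the vectors differ in length, which is why the metric ignores text length. The last call shows that cosine similarity can be negative for signed vectors such as neural embeddings. It stays within 0 to 1 only when all vector components are non-negative, as with term counts or TF-IDF weights.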

Required Parameters:

  • Prompt (str): The initial question or statement provided to the model.

  • Response (str): The model's generated answer or output for the prompt.

Interpretation:

  • A higher score indicates greater similarity between the prompt and response, suggesting that the response is relevant to and closely aligned with the prompt's subject matter.

  • For non-negative text vectors (for example, term counts or TF-IDF weights), the score ranges from 0 (no shared content) to 1 (vectors pointing in the same direction, as with identical text). With signed representations such as neural embeddings, negative values are also possible.

Result Interpretation:

  • The test result is determined by comparing the cosine similarity score against a predefined threshold. Scores above this threshold indicate that the prompt and response are sufficiently similar ("passed"), while scores below it indicate a lack of similarity ("failed").

  • The threshold controls how strict the test is: a higher threshold demands closer similarity. A common starting point is 0.6 (set through the "threshold" argument), but it may be adjusted to suit specific requirements.
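The pass/fail rule above amounts to a single comparison. The sketch below assumes the score has already been computed. `similarity_verdict` is a hypothetical helper. It also treats a score exactly equal to the threshold as "passed", which is an assumption, because the rule above only covers scores above or below it.

```python
def similarity_verdict(score: float, threshold: float = 0.6) -> str:
    """Map a cosine similarity score to "passed" or "failed".

    Assumption: a score equal to the threshold counts as "passed".
    """
    return "passed" if score >= threshold else "failed"


for s in (0.85, 0.6, 0.3):
    print(s, similarity_verdict(s))

# A stricter threshold turns a borderline score into a failure
print(similarity_verdict(0.7, threshold=0.8))
```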

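Example:

```python
# Test indicating low similarity (expected to fail).
# Assumes `evaluator` is an evaluator instance created earlier; its setup is not shown on this page.
evaluator.add_test(
    test_names=["cosine_similarity_test"],
    data={
        "prompt": "What is the capital of France?",
        "response": "Artificial intelligence (AI) is the simulation of human intelligence processes by machines, especially computer systems.",
    },
    arguments={"threshold": 0.6},
).run()

# Print the pass/fail outcome of the tests that were run
evaluator.print_results()
```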
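The expected failure in this example can be reproduced offline with a simple stand-in. The sketch below uses a lowercase bag-of-words (raw term count) representation, which is an assumption and may differ from what cosine_similarity_test uses internally. `bag_of_words` and `text_cosine` are hypothetical helpers. Only "is", "the", and "of" appear in both texts, so the score is 3/√102 ≈ 0.297, well below the 0.6 threshold.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    # Lowercase and keep alphanumeric runs; punctuation like "?" or "(" is dropped
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def text_cosine(prompt: str, response: str) -> float:
    p, r = bag_of_words(prompt), bag_of_words(response)
    # Missing words in a Counter count as 0, so only shared words add to the dot product
    dot = sum(count * r[word] for word, count in p.items())
    norm = math.sqrt(sum(c * c for c in p.values())) * math.sqrt(sum(c * c for c in r.values()))
    return dot / norm if norm else 0.0


prompt = "What is the capital of France?"
response = ("Artificial intelligence (AI) is the simulation of human intelligence "
            "processes by machines, especially computer systems.")
score = text_cosine(prompt, response)
print(f"{score:.3f}", "passed" if score >= 0.6 else "failed")  # 0.297 failed
```

A count-based score rewards shared function words such as "the" and "of". A production implementation might remove stop words or use TF-IDF or embeddings instead, so treat the exact number as illustrative.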
