SQL Response Correctness

Objective: This metric assesses the correctness of the SQL query generated by the model. It compares the Response (the SQL query generated by the model) with the Expected Response (the correct SQL query). An LLM acts as a judge to determine whether the generated query is logically correct and returns the same results as the expected one.

Required Columns in Dataset:

  • Prompt: The query from the user.

  • Response: The SQL query generated by the model.

  • Expected Response: The correct SQL query that should have been generated.
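A dataset row therefore needs all three columns. The sketch below shows what such rows might look like as plain Python dicts; the column names follow the docs, but the exact ingestion format (CSV, DataFrame, etc.) is an assumption and may differ on your platform.

```python
# Illustrative dataset rows for the SQL Response Correctness metric.
# Column names are taken from the docs; the container format is hypothetical.
rows = [
    {
        "Prompt": "List the names and ages of students with grade A.",
        "Response": "SELECT name, age FROM students WHERE grade = 'A';",
        "Expected Response": (
            "SELECT student_name, student_age FROM students "
            "WHERE grade = 'A' AND is_enrolled = true;"
        ),
    },
]

# Every row must supply all three required columns.
required = {"Prompt", "Response", "Expected Response"}
assert all(required <= set(row) for row in rows)
```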

Interpretation: A higher score indicates that the model-generated SQL query is more accurate and aligns closely with the expected SQL query.

Code Execution:

experiment_manager = Experiment(project_name="project_name",
                                experiment_name="experiment_name",
                                dataset_name="dataset_name")

# SQL Response Correctness Test
response = experiment_manager.add_metrics(
    metrics=[
        {"name": "SQL Response Correctness", "config": {"reason": True, "model": "gpt-4o-mini", "batch_size": 5, "provider": "OpenAI"}}
    ]
)

Refer to the Executing tests page to learn about metric configurations.

Example:

  • Response: SELECT name, age FROM students WHERE grade = 'A';

  • Expected Response: SELECT student_name, student_age FROM students WHERE grade = 'A' AND is_enrolled = true;

  • Metric Score: 0.2

  • Reasoning: The model-generated response uses name and age instead of the correct column names student_name and student_age. The generated SQL query does not include the AND is_enrolled = true condition, which filters the results to only include currently enrolled students. The model correctly identified the table (students) and partially matched the WHERE clause with the grade = 'A' condition, but it failed to fully replicate the expected logic.
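The platform scores this comparison with an LLM judge, but the "returns the same results" criterion can also be checked mechanically. The sketch below is a minimal, execution-based illustration of that idea (not the platform's implementation): run both queries against a small SQLite database and compare the result sets order-insensitively. The table schema and sample data are invented for the demo.

```python
import sqlite3

def results_match(db, generated_sql, expected_sql):
    """Return True if both queries yield the same rows (order-insensitive)."""
    try:
        got = db.execute(generated_sql).fetchall()
        want = db.execute(expected_sql).fetchall()
    except sqlite3.Error:
        # A query that fails to execute cannot match the expected results.
        return False
    return sorted(got) == sorted(want)

# Hypothetical sample data mirroring the example above.
db = sqlite3.connect(":memory:")
db.execute(
    "CREATE TABLE students (student_name TEXT, student_age INTEGER, "
    "grade TEXT, is_enrolled INTEGER)"
)
db.executemany(
    "INSERT INTO students VALUES (?, ?, ?, ?)",
    [("Ada", 20, "A", 1), ("Bob", 21, "A", 0)],
)

# Missing the is_enrolled filter returns an extra row, so the results differ.
print(results_match(
    db,
    "SELECT student_name, student_age FROM students WHERE grade = 'A'",
    "SELECT student_name, student_age FROM students "
    "WHERE grade = 'A' AND is_enrolled = 1",
))
```

Unlike this binary check, the LLM judge returns a graded score (0.2 in the example), crediting the partially correct table and WHERE clause.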
