SQL Response Correctness
Last updated
Last updated
Objective: This metric assesses the correctness of the SQL response generated by the model. It compares the Response (SQL query generated by the model) with the Expected Response (correct SQL query). An LLM is used as a judge to determine if the generated SQL response is logically correct and returns the same results as the expected response.
Required Columns in Dataset:
Prompt: Query from the user
Response: The SQL query generated by the model.
Expected Response: The correct SQL query that should have been generated.
Interpretation: A higher score indicates that the model-generated SQL query is more accurate and aligns closely with the expected SQL query.
Metric Execution via UI:
Code Execution:
Example:
Response: SELECT name, age FROM students WHERE grade = 'A';
Expected Response: SELECT student_name, student_age FROM students WHERE grade = 'A' AND is_enrolled = true;
Metric Score: 0.2
Reasoning: The model-generated response uses name
and age
instead of the correct column names student_name
and student_age.
The generated SQL query does not include the AND is_enrolled = true
condition, which filters the results to only include currently enrolled students. The model correctly identified the table (students
) and partially matched the WHERE
clause with the grade = 'A'
condition, but it failed to fully replicate the expected logic.