SQL Response Correctness
Objective: This metric assesses the correctness of the SQL response generated by the model. It compares the Response (SQL query generated by the model) with the Expected Response (correct SQL query). An LLM is used as a judge to determine if the generated SQL response is logically correct and returns the same results as the expected response.
Required Columns in Dataset:
Prompt: Query from the user
Response: The SQL query generated by the model.
Expected Response: The correct SQL query that should have been generated.
Interpretation: A higher score indicates that the model-generated SQL query is more accurate and aligns closely with the expected SQL query.
Metric Execution via UI:
Code Execution:
Example:
Response:
SELECT name, age FROM students WHERE grade = 'A';
Expected Response:
SELECT student_name, student_age FROM students WHERE grade = 'A' AND is_enrolled = true;
Metric Score:
0.2
Reasoning: The model-generated response uses
name
andage
instead of the correct column namesstudent_name
andstudent_age.
The generated SQL query does not include theAND is_enrolled = true
condition, which filters the results to only include currently enrolled students. The model correctly identified the table (students
) and partially matched theWHERE
clause with thegrade = 'A'
condition, but it failed to fully replicate the expected logic.
Last updated