Functional Correctness
Last updated
Objective:
Functional Correctness assesses the accuracy of code generation models by running the generated code against a set of predefined test cases. The metric checks whether the generated program meets its expected functional requirements by recording a pass/fail result for each test case, and reports correctness both as a binary outcome and as the percentage of test cases passed.
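The scoring described above can be illustrated with a minimal sketch. It assumes each test case is a pair of call arguments and an expected return value, and that the generated program defines a function whose name (`entry_point`) is known in advance. The names `Case`, `functional_correctness`, and `entry_point` are hypothetical and do not come from any particular SDK. Note that `exec()` runs the generated code in-process with no sandbox or timeout; a production harness would isolate untrusted code.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Case:
    """One hypothetical test case: positional arguments and the expected return value."""
    args: tuple
    expected: Any


def functional_correctness(program_src: str, entry_point: str, cases: list[Case]) -> dict:
    """Run a generated program against test cases and report pass/fail results.

    WARNING: exec() runs untrusted code in-process with no sandbox or timeout.
    """
    namespace: dict = {}
    try:
        exec(program_src, namespace)   # define the generated function(s)
        fn = namespace[entry_point]    # look up the function under test
    except Exception:
        fn = None                      # program failed to load: every case fails

    per_test = []
    for case in cases:
        try:
            passed = fn is not None and bool(fn(*case.args) == case.expected)
        except Exception:
            passed = False             # a runtime error counts as a failure
        per_test.append(passed)

    pass_rate = 100.0 * sum(per_test) / len(per_test) if per_test else 0.0
    return {
        "per_test": per_test,                            # pass/fail per test case
        "all_passed": all(per_test) and bool(per_test),  # binary outcome
        "pass_rate": round(pass_rate, 2),                # percentage of cases passed
    }


# A hypothetical generated program with a subtle bug: floor division instead of true division.
generated = "def mean(xs):\n    return sum(xs) // len(xs)\n"
cases = [Case(([2, 4],), 3), Case(([1, 2],), 1.5), Case(([5],), 5)]
report = functional_correctness(generated, "mean", cases)
print(report)
# {'per_test': [True, False, True], 'all_passed': False, 'pass_rate': 66.67}
```

Here the program passes two of three test cases, so the binary result is a failure while the percentage-based score is 66.67.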
Required Columns in Dataset:
Generated Program, Set of Test Cases
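Each dataset row pairs a Generated Program with its Set of Test Cases. The following is a minimal sketch of how such rows could be scored and aggregated. It assumes test cases are stored as Python `assert` statements (a common convention, but an assumption here) and uses hypothetical column names `generated_program` and `test_cases` rather than any fixed schema. Because `assert` statements are skipped when Python runs with `-O`, this style of check only works in the default mode.

```python
def run_row(program: str, test_cases: list[str]) -> list[bool]:
    """Execute each assert-style test case against a fresh copy of the program."""
    results = []
    for test in test_cases:
        namespace: dict = {}
        try:
            exec(program, namespace)   # reload the program so tests cannot leak state
            exec(test, namespace)      # raises AssertionError if the check fails
            results.append(True)
        except Exception:
            results.append(False)      # failed assertion, runtime error, or broken program
    return results


# Hypothetical dataset rows; the column names are illustrative, not a fixed schema.
dataset = [
    {"generated_program": "def square(x):\n    return x * x\n",
     "test_cases": ["assert square(2) == 4", "assert square(-3) == 9"]},
    {"generated_program": "def square(x):\n    return x + x\n",
     "test_cases": ["assert square(2) == 4", "assert square(-3) == 9"]},
]

row_results = [run_row(r["generated_program"], r["test_cases"]) for r in dataset]
row_pass_rates = [100.0 * sum(res) / len(res) for res in row_results]
fully_correct = sum(all(res) for res in row_results) / len(row_results)

print(row_results)     # [[True, True], [True, False]]
print(row_pass_rates)  # [100.0, 50.0]
print(fully_correct)   # 0.5
```

The second program happens to pass `square(2) == 4` (since 2 + 2 = 4) but fails on `-3`, which shows why a single passing test case is weak evidence of correctness.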
Interpretation:
High Functional Correctness: Indicates that the generated code passes most or all test cases, demonstrating functional accuracy.
Low Functional Correctness: Indicates that the generated code fails some or many test cases, exposing functional discrepancies between the program and its requirements.
Execution via UI:
Execution via SDK: