Functional Correctness
Objective:
Functional Correctness assesses the accuracy of code generation models by running generated code against a set of predefined test cases. The metric evaluates whether the generated program meets the expected functional requirements by checking pass/fail results on individual test cases, offering a binary and percentage-based measure of correctness.
Required Columns in Dataset:
Generated Program
, Set of Test Cases
Interpretation:
High Functional Correctness: Indicates that the generated code passes most or all test cases, demonstrating functional accuracy.
Low Functional Correctness: Suggests that the generated code fails one or more test cases, highlighting functional discrepancies.
Execution via UI:

Execution via SDK:
metrics=[
{"name": "Functional Correctness", "schema_mapping": {"generated_program": "Generated Program", "test_cases": "Set of Test Cases"}}
]
Last updated
Was this helpful?