Functional Correctness

Objective:

Functional Correctness assesses the accuracy of code generation models by running the generated code against a set of predefined test cases. Each test case produces a pass/fail result, and these results give both a binary and a percentage-based measure of whether the generated program meets its expected functional requirements.
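To make the idea concrete, here is a minimal sketch of how such a score could be computed. It is not the platform's implementation: the function name `functional_correctness`, the `entry_point` argument, and the test-case format (a tuple of arguments paired with an expected return value) are all assumptions made for illustration. It also runs the program with `exec` and no sandboxing, which is only acceptable for trusted code.

```python
from typing import Any


def functional_correctness(
    program_source: str,
    entry_point: str,
    test_cases: list[tuple[tuple, Any]],
) -> dict:
    """Run a generated program against test cases and summarise the results.

    Hypothetical helper: each test case is (args, expected_return_value).
    """
    namespace: dict = {}
    try:
        # Assumption: the program is trusted. Real evaluators isolate this step.
        exec(program_source, namespace)
        func = namespace[entry_point]
    except Exception:
        # The program failed to load or does not define the entry point,
        # so every test case counts as a failure.
        results = [False] * len(test_cases)
    else:
        results = []
        for args, expected in test_cases:
            try:
                results.append(func(*args) == expected)
            except Exception:
                # A runtime error in the generated code is a failed test.
                results.append(False)

    passed = sum(results)
    total = len(results)
    return {
        "results": results,                                  # per-test pass/fail
        "pass_rate": passed / total if total else 0.0,       # percentage-based view
        "all_passed": total > 0 and passed == total,         # binary view
    }


# Usage: one correct and one buggy candidate for the same task.
cases = [((1, 2), 3), ((0, 0), 0), ((-1, 1), 0)]
good = functional_correctness("def add(a, b):\n    return a + b\n", "add", cases)
buggy = functional_correctness("def add(a, b):\n    return a - b\n", "add", cases)
```

In this sketch the correct candidate passes every case, while the buggy one passes only the `(0, 0)` case, so its pass rate is one third and its binary result is a failure.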

Required Columns in Dataset:

Generated Program, Set of Test Cases

Interpretation:

  • High Functional Correctness: Indicates that the generated code passes most or all test cases, demonstrating functional accuracy.

  • Low Functional Correctness: Suggests that the generated code fails many or most test cases, highlighting functional discrepancies.

Execution via UI:

Execution via SDK:

metrics=[
    {"name": "Functional Correctness", "schema_mapping": {"generated_program": "Generated Program", "test_cases": "Set of Test Cases"}}
]
