Robust Pass@k

Objective:

Robust Pass@k measures whether the code a model generates still passes its test cases when the input prompt is perturbed. It shows how stable the model’s output is under prompt variations, which helps you judge whether the model is reliable enough for critical tasks.

Required Columns in Dataset:

Original Prompt, Perturbed Prompts, Generated Code

Interpretation:

  • High Robust Pass@k: Suggests that the model-generated code maintains functional accuracy across varied prompts, indicating robustness.

  • Low Robust Pass@k: Reveals potential instability in code generation, as output varies in functional quality across perturbations.
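To make the high and low readings concrete, here is a minimal sketch of one common worst-case formulation of a robust pass@k score. It is an assumption for illustration, not necessarily the computation this metric performs internally: a sample slot counts as correct only if the generated code passes the tests under every perturbed prompt, and the usual unbiased pass@k estimator is then applied to that count. The function names and the `results` layout are hypothetical.

```python
from math import comb


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k)."""
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    if not 0 <= c <= n:
        raise ValueError("c must satisfy 0 <= c <= n")
    # math.comb returns 0 when k > n - c, so the estimate becomes 1.0
    return 1.0 - comb(n - c, k) / comb(n, k)


def robust_pass_at_k(results: list[list[bool]], k: int) -> float:
    """Worst-case robust pass@k for a single problem (illustrative only).

    results[j][i] is True if sample i generated from perturbed prompt j
    passed all test cases. Every row must contain the same number of samples.
    """
    if not results or not results[0]:
        raise ValueError("results must contain at least one sample")
    n = len(results[0])
    if any(len(row) != n for row in results):
        raise ValueError("every perturbation needs the same number of samples")
    # A sample slot is robustly correct only if it passes under every perturbation.
    robust_correct = sum(all(row[i] for row in results) for i in range(n))
    return pass_at_k(n, robust_correct, k)


# Three perturbed prompts, four samples each (made-up pass/fail outcomes).
example = [
    [True, True, False, True],
    [True, False, False, True],
    [True, True, True, True],
]
print(robust_pass_at_k(example, k=1))  # 0.5
```

In this made-up example only two of the four sample slots pass under all three perturbations, so the score at k=1 is 0.5 even though most individual runs pass. A single failing perturbation is enough to pull the score down, which is the behavior the low reading above describes.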

Execution via UI:

Execution via SDK:

metrics=[
    {"name": "Robust Pass@k", "schema_mapping": {"original_prompt": "Original Prompt", "perturbed_prompts": "Perturbed Prompts", "generated_code": "Generated Code"}}
]
