Robust Pass@k
Objective:
Robust Pass@k assesses model robustness by checking whether generated code still passes its test cases when the input prompt is perturbed. The metric indicates how stable a model's output remains under prompt variations, which matters when the model is relied on for critical tasks.
Required Columns in Dataset:
Original Prompt, Perturbed Prompt, Generated Code
Interpretation:
High Robust Pass@k: Suggests that the model-generated code maintains functional accuracy across varied prompts, indicating robustness.
Low Robust Pass@k: Reveals potential instability in code generation, since the functional correctness of the output changes from one perturbation of the prompt to another.
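To make the high/low distinction concrete, here is a minimal sketch of how such a score could be computed. This page does not state the exact formula, so the sketch assumes a worst-case formulation: a generated sample counts as correct only if it passes the tests under every prompt variant, and the count of such samples is fed into the standard unbiased pass@k estimator. The function names `pass_at_k` and `robust_pass_at_k`, the layout of the `results` grid, and the pairing of samples by index across variants are all illustrative assumptions, not part of any SDK.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate: 1 - C(n - c, k) / C(n, k).

    n: samples generated, c: samples that are correct, k: budget.
    """
    if not 0 < k <= n:
        raise ValueError("k must satisfy 0 < k <= n")
    # math.comb returns 0 when fewer than k incorrect samples exist,
    # so the estimate becomes 1.0 in that case.
    return 1.0 - comb(n - c, k) / comb(n, k)


def robust_pass_at_k(results: Sequence[Sequence[bool]], k: int) -> float:
    """Worst-case pass@k for one task (assumed formulation).

    results[j][i] is True if sample i generated for prompt variant j
    (the original prompt or one of its perturbations) passed all tests.
    """
    if not results:
        raise ValueError("results must contain at least one prompt variant")
    n = len(results[0])
    if any(len(row) != n for row in results):
        raise ValueError("every prompt variant needs the same number of samples")
    # A sample index is robustly correct only if it passed under every variant.
    robust_correct = sum(all(column) for column in zip(*results))
    return pass_at_k(n, robust_correct, k)


# Hypothetical pass/fail grids: 3 prompt variants x 4 samples each.
stable = [
    [True, True, True, False],
    [True, True, True, False],
    [True, True, True, False],
]
unstable = [
    [True, True, True, False],
    [False, True, False, False],
    [True, False, False, True],
]

print(robust_pass_at_k(stable, k=1))    # 0.75
print(robust_pass_at_k(unstable, k=1))  # 0.0
```

In the `stable` grid the same three samples pass under every variant, so the score stays high. In the `unstable` grid each variant has passing samples, but none passes under all variants, so the worst-case score drops to zero even though ordinary pass@k on the original prompt alone would be 0.75.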
Execution via UI:
Execution via SDK: