Robust Drop@k
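Robust Drop@k tracks how much the success rate of LLM code attempts falls when the original prompt is replaced by perturbed variants of it. Use it to identify instability in LLM coding performance.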
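
As a minimal sketch, this example assumes the metric follows the ReCode-style definition, which this page does not confirm. Pass@k is the unbiased pass@k estimate on the original prompts. Robust Pass@k uses the same estimate, but a sample index counts as correct only if it passes on every perturbed prompt (the worst case). Robust Drop@k is then the relative drop, (Pass@k - Robust Pass@k) / Pass@k. The function names pass_at_k and robust_drop_at_k are hypothetical, and so is the per-sample pass/fail input layout.

```python
from math import comb
from typing import Sequence


def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimate from n samples, c of which are correct."""
    if n - c < k:
        return 1.0  # every size-k subset contains at least one correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)


def robust_drop_at_k(
    original: Sequence[Sequence[bool]],
    perturbed: Sequence[Sequence[Sequence[bool]]],
    k: int,
) -> float:
    """Hypothetical helper: relative drop from Pass@k to worst-case Robust Pass@k.

    original[p][i]     -> did sample i pass for problem p's original prompt?
    perturbed[p][j][i] -> did sample i pass for problem p's j-th perturbed prompt?
    Assumes the same number of samples n per prompt (ReCode-style definition).
    """
    nominal, robust = [], []
    for orig_runs, pert_runs in zip(original, perturbed):
        n = len(orig_runs)
        nominal.append(pass_at_k(n, sum(orig_runs), k))
        # Worst case: sample i counts only if it passes on every perturbed prompt.
        robust_c = sum(all(run[i] for run in pert_runs) for i in range(n))
        robust.append(pass_at_k(n, robust_c, k))
    pass_k = sum(nominal) / len(nominal)
    robust_pass_k = sum(robust) / len(robust)
    if pass_k == 0:
        return 0.0  # nothing passed on the original prompts, so nothing can drop
    return (pass_k - robust_pass_k) / pass_k


# Two problems, n = 2 samples each, two perturbed prompts per problem.
drop = robust_drop_at_k(
    original=[[True, True], [True, False]],
    perturbed=[
        [[True, False], [True, True]],
        [[False, False], [True, False]],
    ],
    k=1,
)
print(round(drop, 4))  # → 0.6667
```

In this example, Pass@1 averages to 0.75 and worst-case Robust Pass@1 averages to 0.25. That gives a relative drop of 0.5 / 0.75, about 0.67. A larger value means the model's success depends more heavily on the exact wording of the prompt.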

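Example configuration:

metrics = [
    {
        "name": "Robust Drop@k",
        # Map each metric input (key) to the column that holds it in your data (value).
        "schema_mapping": {
            "original_prompt": "Original Prompt",
            "perturbed_prompts": "Perturbed Prompts",
            "generated_code": "Generated Code",
        },
    }
]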
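
To illustrate how the schema_mapping might be resolved, the sketch below looks up each metric input in a dataset row by its mapped column name. It assumes that the keys are the metric's input names and the values are column names in your data. The resolve_inputs helper is hypothetical and not part of any documented API. The example row is also invented for illustration, including the assumption that "Perturbed Prompts" holds a list.

```python
from typing import Any, Dict

# Same mapping as the configuration on this page (keys: metric inputs,
# values: dataset column names; this reading is an assumption).
SCHEMA_MAPPING: Dict[str, str] = {
    "original_prompt": "Original Prompt",
    "perturbed_prompts": "Perturbed Prompts",
    "generated_code": "Generated Code",
}


def resolve_inputs(row: Dict[str, Any], schema_mapping: Dict[str, str]) -> Dict[str, Any]:
    """Hypothetical helper: fetch each metric input from the row via its mapped column."""
    missing = [col for col in schema_mapping.values() if col not in row]
    if missing:
        raise KeyError(f"row is missing mapped columns: {missing}")
    return {field: row[col] for field, col in schema_mapping.items()}


# Hypothetical dataset row; the values are illustrative only.
row = {
    "Original Prompt": "Write a function that reverses a string.",
    "Perturbed Prompts": [
        "Write a function which reverses a string.",
        "write a function that reverses a strng",
    ],
    "Generated Code": "def reverse(s):\n    return s[::-1]",
}

inputs = resolve_inputs(row, SCHEMA_MAPPING)
print(sorted(inputs))  # → ['generated_code', 'original_prompt', 'perturbed_prompts']
```

The helper fails fast when a mapped column is absent. This surfaces a misspelled column name before any scoring runs, rather than silently passing an empty input to the metric.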