ROUGE

Recall-Oriented Understudy for Gisting Evaluation

Objective:

ROUGE measures the overlap of n-grams, word sequences, or individual words between a machine-generated summary and a reference summary. Its variants include ROUGE-N (n-gram overlap), ROUGE-L (longest common subsequence, LCS), and ROUGE-W (weighted LCS, which favors consecutive matches). ROUGE is primarily used to evaluate summarization and is recall-oriented: it measures how much of the reference summary's content is captured in the generated output, although precision and F1 are also commonly reported. The metric is widely used because it is simple to compute and can compare text at several granularities, from single words to n-grams to longest common subsequences.
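The ROUGE-N and ROUGE-L variants described above can be sketched in plain Python. This is a minimal, illustrative implementation under simplifying assumptions (lowercased whitespace tokenization, no stemming or punctuation handling, F1 as the plain harmonic mean of precision and recall). The function names are hypothetical, ROUGE-W is omitted, and this is not the implementation the platform uses, so its scores may differ from reference packages.

```python
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Simplifying assumption: lowercase and split on whitespace only
    return text.lower().split()


def ngrams(tokens: list[str], n: int) -> Counter:
    # Multiset of all contiguous n-grams in the token list
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def prf(overlap: int, cand_total: int, ref_total: int) -> dict:
    # Recall divides by the reference size, precision by the candidate size
    recall = overlap / ref_total if ref_total else 0.0
    precision = overlap / cand_total if cand_total else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"precision": precision, "recall": recall, "f1": f1}


def rouge_n(candidate: str, reference: str, n: int = 1) -> dict:
    cand = ngrams(tokenize(candidate), n)
    ref = ngrams(tokenize(reference), n)
    # Clipped overlap: an n-gram counts at most as often as it appears in both texts
    overlap = sum((cand & ref).values())
    return prf(overlap, sum(cand.values()), sum(ref.values()))


def lcs_length(a: list[str], b: list[str]) -> int:
    # Standard O(len(a) * len(b)) dynamic program, keeping only one previous row
    prev = [0] * (len(b) + 1)
    for x in a:
        curr = [0]
        for j, y in enumerate(b, start=1):
            curr.append(prev[j - 1] + 1 if x == y else max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l(candidate: str, reference: str) -> dict:
    # ROUGE-L treats the LCS length as the overlap count (matches need not be contiguous)
    cand, ref = tokenize(candidate), tokenize(reference)
    return prf(lcs_length(cand, ref), len(cand), len(ref))
```

For example, comparing the candidate "the cat lay on the mat" with the reference "the cat sat on the mat", `rouge_n(..., n=2)` gives a recall of 0.6 (3 of the 5 reference bigrams match), while `rouge_l` gives a recall of 5/6 because "the cat on the mat" is a common subsequence even though it is interrupted.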

Required Columns in Dataset:

LLM Summary, Reference Document (GT)
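Which column plays which role matters, because ROUGE recall is asymmetric. The sketch below is illustrative only: it assumes rows are plain dictionaries keyed by the two required column names, uses a hypothetical `rouge1_recall` helper with lowercased whitespace tokenization, and the row contents are made up. The LLM Summary is the candidate and the Reference Document (GT) is the reference; swapping them changes the score.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    # Illustrative ROUGE-1 recall; lowercased whitespace tokens are a simplifying assumption
    cand, ref = Counter(candidate.lower().split()), Counter(reference.lower().split())
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0


# Hypothetical row; the keys mirror the required column names
rows = [
    {"LLM Summary": "revenue grew", "Reference Document (GT)": "revenue grew by ten percent"},
]

for row in rows:
    # Intended orientation: generated summary vs. ground-truth reference
    correct = rouge1_recall(row["LLM Summary"], row["Reference Document (GT)"])
    # Swapped orientation, shown only to illustrate the asymmetry
    swapped = rouge1_recall(row["Reference Document (GT)"], row["LLM Summary"])
    print(correct, swapped)  # prints "0.4 1.0"
```

With the intended orientation the short summary recovers only 2 of 5 reference words (0.4); swapped, it would misleadingly score 1.0.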

Interpretation:

High ROUGE: Indicates strong overlap between the generated text and the reference text, suggesting the system captures relevant content.
Low ROUGE: Suggests poor alignment with the reference text. The output may be missing key information, or it may express the same content in different words, since ROUGE matches surface tokens rather than meaning.
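A small worked example of the interpretation above, using a hypothetical `rouge1_recall` helper (lowercased whitespace tokenization is a simplifying assumption, and the sentences are invented for illustration). A near-copy that gets a fact wrong can outscore a faithful paraphrase, because ROUGE rewards shared wording rather than shared meaning.

```python
from collections import Counter


def rouge1_recall(candidate: str, reference: str) -> float:
    # Share of reference unigrams (with counts) that also appear in the candidate
    cand, ref = Counter(candidate.lower().split()), Counter(reference.lower().split())
    total = sum(ref.values())
    return sum((cand & ref).values()) / total if total else 0.0


reference = "the company reported higher profits this quarter"
near_copy = "the company reported higher profits this year"   # same words, wrong time period
paraphrase = "earnings rose for the firm in recent months"     # similar meaning, different words

print(round(rouge1_recall(near_copy, reference), 2))   # 0.86
print(round(rouge1_recall(paraphrase, reference), 2))  # 0.14
```

This is why a low ROUGE score is worth inspecting manually before concluding that the summary missed content.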

Execution via UI:

ROUGE is computed from token overlap alone, so it does not require an LLM.

Execution via SDK:

metrics=[
    {"name": "ROUGE", "column_name": "your-text", "schema_mapping": schema_mapping}
]
