METEOR
Metric for Evaluation of Translation with Explicit ORdering
Objective:
METEOR was designed to address some shortcomings of BLEU, particularly for machine translation. It evaluates machine-generated text by matching unigrams on exact form, stems, and synonyms, so meaning and linguistic structure carry more weight than surface overlap alone. The score combines unigram precision and recall into a harmonic mean weighted toward recall, then applies a fragmentation penalty based on word alignment and ordering, making it more flexible for tasks that require nuanced linguistic evaluation. It is most popular for translation evaluation but extends readily to other text generation tasks.
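For reference, the original METEOR formulation (Banerjee & Lavie, 2005) combines unigram precision (P) and recall (R) into a recall-weighted harmonic mean, then discounts fragmented matches with a penalty based on the number of contiguous chunks (c) formed by the matched unigrams (m):

F_mean = 10 · P · R / (R + 9 · P)
Penalty = 0.5 · (c / m)^3
METEOR = F_mean · (1 − Penalty)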
Required Columns in Dataset:
LLM Summary, Reference Document (GT)
Interpretation:
High METEOR: Reflects strong alignment with the reference, accounting for synonyms, stemming, and word order; the generated output is both semantically and syntactically close to the reference.
Low METEOR: Implies limited semantic matching, with possible differences in vocabulary or word order, or a failure to capture linguistic variations.
Execution via UI:
METEOR does not require an LLM for computation.
Execution via SDK:
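The SDK details for this section are not shown here. As a minimal sketch, METEOR can be computed offline with NLTK's `meteor_score`; the platform-specific client and exact method names are assumptions, and the variable names simply mirror the dataset columns above.

```python
# Minimal sketch: computing METEOR with NLTK (not the platform SDK itself).
import nltk
from nltk.translate.meteor_score import meteor_score

nltk.download("wordnet")  # WordNet data drives METEOR's synonym matching

llm_summary = "The cat sat on the mat."          # "LLM Summary" column
reference_doc = "A cat was sitting on the mat."  # "Reference Document (GT)" column

# Recent NLTK versions expect pre-tokenized input: a list of reference
# token lists plus one hypothesis token list. Whitespace splitting keeps
# the sketch dependency-free; swap in a real tokenizer for production use.
score = meteor_score(
    [reference_doc.lower().split()],
    llm_summary.lower().split(),
)
print(f"METEOR: {score:.3f}")  # 1.0 = perfect alignment, 0.0 = no match
```

Because the computation is purely lexical (no LLM call), it can be run in a batch loop over every row of the dataset and the scores attached as a new column.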