BERTScore
Objective:
BERTScore leverages pre-trained BERT embeddings to compare the similarity between reference and generated text at the token level. Instead of using n-gram matching, it calculates cosine similarity between the embeddings of words, thus capturing deeper semantic meaning and context. This approach provides more flexibility in assessing meaning rather than relying purely on exact token overlap. BERTScore is increasingly used in generative tasks where semantic accuracy is prioritized over exact string matching, such as summarization, dialogue generation, and paraphrasing.
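To make the mechanism concrete, the sketch below shows how BERTScore-style precision, recall, and F1 can be derived from pre-computed token embeddings using greedy cosine-similarity matching. This is a minimal illustration of the underlying computation, not the platform's implementation; the function name and array shapes are assumptions for the example.

```python
import numpy as np

def bertscore_from_embeddings(cand_emb: np.ndarray, ref_emb: np.ndarray):
    """Illustrative greedy-matching BERTScore from token embeddings.

    cand_emb: (num_candidate_tokens, dim) embeddings of the generated text.
    ref_emb:  (num_reference_tokens, dim) embeddings of the reference text.
    """
    # L2-normalize so the dot product equals cosine similarity.
    cand = cand_emb / np.linalg.norm(cand_emb, axis=1, keepdims=True)
    ref = ref_emb / np.linalg.norm(ref_emb, axis=1, keepdims=True)

    # Pairwise cosine similarity between every candidate and reference token.
    sim = cand @ ref.T  # shape: (num_candidate_tokens, num_reference_tokens)

    # Each candidate token is matched to its most similar reference token (precision);
    # each reference token is matched to its most similar candidate token (recall).
    precision = sim.max(axis=1).mean()
    recall = sim.max(axis=0).mean()
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```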
Required Columns in Dataset:
LLM Summary (Embeddings), GT Reference (Embeddings)
Interpretation:
High BERTScore: Indicates high semantic similarity between the generated and reference texts, suggesting that the model captures deeper meaning beyond exact word matches.
Low BERTScore: Indicates weak semantic alignment, suggesting that the generated text diverges significantly from the intended meaning of the reference text.
Execution via UI:
Execution via SDK:
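The exact SDK call for this metric is not shown here. As a generic, hedged illustration, the open-source `bert_score` Python package can compute the same metric from raw candidate and reference strings; the example texts below are hypothetical.

```python
from bert_score import score  # pip install bert-score

candidates = ["The model summarizes the quarterly report accurately."]
references = ["The quarterly report is summarized correctly by the model."]

# Returns per-example precision, recall, and F1 tensors.
P, R, F1 = score(candidates, references, lang="en", verbose=False)
print(f"BERTScore F1: {F1.mean().item():.4f}")
```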