Metric Glossary
Quick definitions and explanations of the various metrics used within the RagaAI Testing Platform
F1 Score:
The F1 score is the harmonic mean of precision and recall. It provides a balanced assessment of a model's performance by considering both false positives and false negatives.
Averaging Method: Computes a global average from the summed true positives (TP), false negatives (FN), and false positives (FP) in the confusion matrix.
Precision:
Precision is a measure of the accuracy of positive predictions made by the model.
Averaging Method: Computes a global average from the summed true positives (TP) and false positives (FP) in the confusion matrix.
Recall:
Recall, also known as sensitivity, measures the ability of the model to correctly identify all relevant instances.
Averaging Method: Computes a global average from the summed true positives (TP) and false negatives (FN) in the confusion matrix.
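The sketch below is illustrative only (the helper name micro_prf1 is hypothetical, not part of the platform SDK); it shows how the global averaging method described for precision, recall, and F1 turns summed TP, FP, and FN counts into the three scores.

```python
# Minimal sketch, assuming precision/recall/F1 are computed from globally
# summed confusion-matrix counts (micro averaging); not the platform's code.
def micro_prf1(tp: int, fp: int, fn: int) -> dict:
    """Compute precision, recall, and F1 from summed TP/FP/FN counts."""
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"precision": precision, "recall": recall, "f1": f1}

# Example: TP=80, FP=20, FN=10 -> precision 0.80, recall ~0.889, F1 ~0.842
print(micro_prf1(tp=80, fp=20, fn=10))
```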
Pixel Accuracy:
Pixel accuracy is a metric that measures the percentage of correctly classified pixels in the segmentation output.
Averaging Method: Computes a global average from the summed true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP) in the confusion matrix.
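A minimal sketch of pixel accuracy, assuming the prediction and ground truth are integer class maps of the same shape (the helper name is hypothetical, not a platform API):

```python
import numpy as np

# Pixel accuracy: fraction of pixels whose predicted class matches the
# ground-truth class (multiply by 100 for a percentage).
def pixel_accuracy(pred: np.ndarray, gt: np.ndarray) -> float:
    assert pred.shape == gt.shape
    return float((pred == gt).mean())

pred = np.array([[0, 1], [1, 2]])
gt   = np.array([[0, 1], [2, 2]])
print(pixel_accuracy(pred, gt))  # 3 of 4 pixels correct -> 0.75
```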
mIoU:
mIoU (Mean Intersection over Union) is a popular metric for semantic segmentation that measures the average intersection over union (IoU) across all classes.
Averaging Method (IoU): Computes a global average from the summed true positives (TP), false negatives (FN), and false positives (FP) in the confusion matrix.
Averaging Method (mIoU): Mean of IoU values for all classes.
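The following sketch illustrates the two averaging methods above: per-class IoU from TP/FP/FN counts, then mIoU as the mean over classes. The helper names are hypothetical and not part of the platform SDK.

```python
import numpy as np

# Per-class IoU = TP / (TP + FP + FN), computed from class maps.
def iou_per_class(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> np.ndarray:
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        tp = np.sum((pred == c) & (gt == c))
        fp = np.sum((pred == c) & (gt != c))
        fn = np.sum((pred != c) & (gt == c))
        denom = tp + fp + fn
        if denom > 0:
            ious[c] = tp / denom
    return ious

# mIoU: unweighted mean of the per-class IoU values (classes absent from
# both masks are ignored here).
def mean_iou(pred: np.ndarray, gt: np.ndarray, num_classes: int) -> float:
    return float(np.nanmean(iou_per_class(pred, gt, num_classes)))
```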
wIoU:
wIoU (Weighted Intersection over Union) is a variation of mIoU that assigns different weights to different classes, giving more importance to certain classes in the evaluation.
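A minimal sketch of the weighting idea, assuming per-class IoU values are already available and that the weights (e.g. pixel frequency or user-supplied importance) are an assumption here rather than the platform's fixed scheme:

```python
import numpy as np

# Weighted IoU: weighted mean of per-class IoU values, with the weights
# normalized to sum to 1. The choice of weights is an assumption.
def weighted_iou(ious: np.ndarray, weights: np.ndarray) -> float:
    weights = weights / weights.sum()
    return float(np.sum(ious * weights))

ious = np.array([0.9, 0.5, 0.7])
weights = np.array([1.0, 3.0, 1.0])  # give the second class more importance
print(weighted_iou(ious, weights))   # 0.62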
Mahalanobis Distance:
Mahalanobis distance is a measure of the distance between a data point and a distribution. It is used to assess the similarity of test data to the training data distribution. A higher Mahalanobis distance may indicate that a data point is "out of distribution."
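A minimal sketch of the Mahalanobis distance, assuming the reference distribution is summarized by a mean vector and covariance matrix estimated from training data (the feature space the platform uses is not specified here):

```python
import numpy as np

# Mahalanobis distance: sqrt((x - mu)^T  Sigma^{-1}  (x - mu)).
def mahalanobis(x: np.ndarray, mu: np.ndarray, cov: np.ndarray) -> float:
    diff = x - mu
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))

# Estimate mu and Sigma from (synthetic) training features, then score a point.
train = np.random.default_rng(0).normal(size=(500, 3))
mu, cov = train.mean(axis=0), np.cov(train, rowvar=False)
print(mahalanobis(np.array([4.0, 4.0, 4.0]), mu, cov))  # large value -> likely out of distribution
```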
Mistake Score:
Mistake Score is a metric designed to evaluate the quality and accuracy of labelled data within your semantic segmentation datasets. This score serves as a quantitative measure of the labelling errors or inaccuracies that may exist in your dataset.
Distance Score:
Distance Score is used in Drift Detection (ROI) multi-class cases. It is a metric designed to evaluate correctness and how far a sample has drifted from its respective class. This score serves as a quantitative measure of the classification errors or inaccuracies that may exist in your dataset.
Similarity Score:
The similarity score is used to determine the resemblance between two data entities. In the context of super-resolution, it measures how closely a high-resolution image matches its low-resolution counterpart. Scores closer to 1 indicate high similarity, while scores nearing 0 suggest less similarity.
Area Percentage:
The fraction of the area covered by a label in an image, used in semantic segmentation.
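A minimal sketch of area percentage, assuming the input is a segmentation mask of integer class IDs (the helper name is hypothetical):

```python
import numpy as np

# Area percentage: share of image pixels assigned to a given class.
def area_percentage(mask: np.ndarray, class_id: int) -> float:
    return float((mask == class_id).mean() * 100.0)

mask = np.array([[0, 1], [1, 1]])
print(area_percentage(mask, class_id=1))  # 75.0 (% of pixels labelled class 1)
```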
BLEU (Bilingual Evaluation Understudy):
BLEU is a metric for evaluating a translated text against one or more reference translations. It measures the similarity of machine translations to human translations, focusing on the precision of n-grams (word sequences of n words) in the translated text.
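A hedged sketch using NLTK's sentence-level BLEU; the platform's own BLEU implementation and n-gram settings are not shown here and may differ:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()

# sentence_bleu expects a list of tokenized references and one tokenized
# candidate; smoothing avoids a zero score when a higher-order n-gram is absent.
score = sentence_bleu([reference], candidate,
                      smoothing_function=SmoothingFunction().method1)
print(round(score, 3))
```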
Cosine Similarity:
Cosine Similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. In text analysis, it is often used to determine the similarity between two text documents.
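A minimal sketch of cosine similarity between two vectors (e.g. text embeddings; how the platform produces the vectors is not shown here):

```python
import numpy as np

# Cosine similarity: dot product of the vectors divided by the product of
# their norms; 1 means same direction, 0 means orthogonal.
def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

a = np.array([1.0, 2.0, 3.0])
b = np.array([2.0, 4.0, 6.0])
print(cosine_similarity(a, b))  # 1.0: the vectors point in the same direction
```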
METEOR (Metric for Evaluation of Translation with Explicit Ordering):
METEOR is a metric for evaluating machine translation output by considering the alignment between the candidate translation and a reference translation. It extends beyond precision by incorporating synonyms and stemming, and it aims for high recall by penalizing omissions.
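A hedged sketch using NLTK's METEOR implementation; it assumes the WordNet corpus has been downloaded via nltk.download, and the platform's METEOR configuration may differ:

```python
from nltk.translate.meteor_score import meteor_score

reference = "the cat is on the mat".split()
candidate = "the cat sat on the mat".split()

# meteor_score takes a list of tokenized references and one tokenized candidate.
print(round(meteor_score([reference], candidate), 3))
```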
ROUGE (Recall-Oriented Understudy for Gisting Evaluation):
ROUGE is a set of metrics for evaluating automatic summarization and machine translation. It measures the overlap between the candidate text and reference texts using n-grams, word sequences, and word pair matches.
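A hedged sketch using the third-party rouge-score package (an assumption; the platform may compute ROUGE differently):

```python
from rouge_score import rouge_scorer

# ROUGE-1/2 count unigram/bigram overlap; ROUGE-L uses the longest common subsequence.
scorer = rouge_scorer.RougeScorer(["rouge1", "rouge2", "rougeL"], use_stemmer=True)
scores = scorer.score(
    target="the cat is on the mat",       # reference text
    prediction="the cat sat on the mat",  # candidate text
)
for name, result in scores.items():
    print(name, round(result.fmeasure, 3))
```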