Metric Glossary
Quick definitions and explanations of the various metrics used within the RagaAI Testing Platform
F1 Score:
The F1 Score is the harmonic mean of precision and recall. It provides a balanced assessment of a model's performance by considering both false positives and false negatives.
Averaging Method: Computes a global average from the total counts of true positives (TP), false negatives (FN), and false positives (FP) in the confusion matrix.
Precision:
Precision is a measure of the accuracy of positive predictions made by the model.
Averaging Method: Computes a global average from the total counts of true positives (TP) and false positives (FP) in the confusion matrix.
Recall:
Recall, also known as sensitivity, measures the ability of the model to correctly identify all relevant instances.
Averaging Method: Computes a global average from the total counts of true positives (TP) and false negatives (FN) in the confusion matrix.
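The three metrics above share the same confusion-matrix inputs. A minimal sketch of the micro-averaged computation, using hypothetical placeholder counts:

```python
# Micro-averaged precision, recall, and F1 from global confusion-matrix
# totals. The counts below are hypothetical placeholders.
tp = 90   # true positives summed over all classes
fp = 10   # false positives summed over all classes
fn = 30   # false negatives summed over all classes

precision = tp / (tp + fp)                          # 0.90
recall = tp / (tp + fn)                             # 0.75
f1 = 2 * precision * recall / (precision + recall)  # harmonic mean, ~0.818

print(f"precision={precision:.3f} recall={recall:.3f} f1={f1:.3f}")
```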
Pixel Accuracy:
Pixel accuracy is a metric that measures the percentage of correctly classified pixels in the segmentation output.
Averaging Method: Computes a global average from the total counts of true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP) in the confusion matrix.
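A minimal sketch of pixel accuracy, assuming ground-truth and predicted label maps as NumPy integer arrays:

```python
import numpy as np

# Hypothetical 2x3 ground-truth and predicted label maps (integer class IDs).
gt = np.array([[0, 1, 1], [2, 2, 0]])
pred = np.array([[0, 1, 2], [2, 2, 0]])

# Fraction of pixels whose predicted class matches the ground truth;
# equivalent to (TP + TN) / (TP + TN + FP + FN) over all pixels.
pixel_accuracy = (gt == pred).mean()
print(pixel_accuracy)  # 5 of 6 pixels correct -> ~0.833
```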
mIoU:
mIoU (Mean Intersection over Union) is a popular metric for semantic segmentation that measures the average intersection over union (IoU) across all classes.
Averaging Method (IoU): Computes a global average from the total counts of true positives (TP), false negatives (FN), and false positives (FP) in the confusion matrix.
Averaging Method (mIoU): The mean of the per-class IoU values.
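A minimal sketch of per-class IoU and mIoU from label maps, assuming NumPy arrays and that classes absent from both masks are skipped:

```python
import numpy as np

def miou(gt, pred, num_classes):
    """Mean of per-class IoU = TP / (TP + FP + FN), computed from masks."""
    ious = []
    for c in range(num_classes):
        gt_c, pred_c = (gt == c), (pred == c)
        union = np.logical_or(gt_c, pred_c).sum()
        if union == 0:            # class absent from both masks: skip it
            continue
        intersection = np.logical_and(gt_c, pred_c).sum()
        ious.append(intersection / union)
    return np.mean(ious)

gt = np.array([[0, 1, 1], [2, 2, 0]])
pred = np.array([[0, 1, 2], [2, 2, 0]])
print(miou(gt, pred, num_classes=3))  # (1.0 + 0.5 + 0.667) / 3 -> ~0.722
```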
wIoU:
wIoU (Weighted Intersection over Union) is a variation of mIoU that assigns different weights to different classes, giving more importance to certain classes in the evaluation.
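The platform's weighting scheme is not spelled out here; a common choice weights each class's IoU by its ground-truth pixel frequency. A sketch under that assumption, reusing the per-class IoU logic above:

```python
import numpy as np

def wiou(gt, pred, num_classes):
    """Per-class IoU, weighted by each class's ground-truth pixel count.
    Frequency weighting is an illustrative assumption; other weighting
    schemes can be substituted."""
    ious, weights = [], []
    for c in range(num_classes):
        gt_c, pred_c = (gt == c), (pred == c)
        union = np.logical_or(gt_c, pred_c).sum()
        if union == 0:
            continue
        ious.append(np.logical_and(gt_c, pred_c).sum() / union)
        weights.append(gt_c.sum())   # weight = ground-truth pixel count
    return np.average(ious, weights=weights)
```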
Mahalanobis Distance:
Mahalanobis distance is a measure of the distance between a data point and a distribution. It is used to assess the similarity of test data to the training data distribution. A higher Mahalanobis distance may indicate that a data point is "out of distribution."
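A minimal sketch, assuming feature vectors as NumPy arrays and an invertible covariance estimated from the training data:

```python
import numpy as np

def mahalanobis_distance(x, train_data):
    """D(x) = sqrt((x - mu)^T * Sigma^-1 * (x - mu)), where mu and Sigma
    are the mean and covariance of the training distribution."""
    mu = train_data.mean(axis=0)
    cov = np.cov(train_data, rowvar=False)
    delta = x - mu
    return np.sqrt(delta @ np.linalg.inv(cov) @ delta)

# Hypothetical 2-D feature vectors: 100 training points, one test point.
rng = np.random.default_rng(0)
train = rng.normal(size=(100, 2))
print(mahalanobis_distance(np.array([3.0, 3.0]), train))  # large -> likely OOD
```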
Mistake Score:
Mistake Score is a metric designed to evaluate the quality and accuracy of labelled data within your semantic segmentation datasets. This score serves as a quantitative measure of the labelling errors or inaccuracies that may exist in your dataset.
Distance Score:
Distance Score is used in Drift Detection (ROI) multi-class cases. It evaluates how far a data point has drifted from its respective class, serving as a quantitative measure of the classification errors or inaccuracies that may exist in your dataset.
Similarity Score:
The similarity score is used to determine the resemblance between two data entities. In the context of super-resolution, it measures how closely a high-resolution image matches its low-resolution counterpart. Scores closer to 1 indicate high similarity, while scores nearing 0 suggest less similarity.
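The exact formula behind this score is not specified here; SSIM (structural similarity) is one widely used image-similarity measure in super-resolution with the same behaviour (values near 1 for near-identical images). A sketch using scikit-image:

```python
import numpy as np
from skimage.metrics import structural_similarity

# Hypothetical float images in [0, 1]: a reference high-resolution image
# and a super-resolved candidate (here, the reference plus mild noise).
rng = np.random.default_rng(0)
reference = rng.random((64, 64))
candidate = np.clip(reference + rng.normal(scale=0.02, size=(64, 64)), 0, 1)

score = structural_similarity(reference, candidate, data_range=1.0)
print(score)  # close to 1.0 for highly similar images
```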
Area Percentage:
The fraction of the area covered by a label in an image, used in semantic segmentation.
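A minimal sketch, assuming a NumPy label map and a hypothetical class ID of interest:

```python
import numpy as np

# Hypothetical label map: percentage of the image area covered by class 1.
labels = np.array([[0, 1, 1], [2, 1, 0]])
area_pct = (labels == 1).mean() * 100
print(area_pct)  # 3 of 6 pixels -> 50.0
```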
BLEU (Bilingual Evaluation Understudy):
BLEU is a metric for evaluating a translated text against one or more reference translations. It measures the similarity of machine translations to human translations, focusing on the precision of n-grams (word sequences of n words) in the translated text.
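A minimal example using NLTK's sentence-level BLEU; the sentences are hypothetical, and smoothing is applied so short texts with missing higher-order n-grams do not score zero:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

reference = [["the", "cat", "sits", "on", "the", "mat"]]  # one reference
candidate = ["the", "cat", "sat", "on", "the", "mat"]

# Smoothing avoids a zero score when some n-gram order has no match.
score = sentence_bleu(reference, candidate,
                      smoothing_function=SmoothingFunction().method1)
print(score)
```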
Cosine Similarity:
Cosine Similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. In text analysis, it is often used to determine the similarity between two text documents.
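A minimal sketch over hypothetical term-count vectors:

```python
import numpy as np

def cosine_similarity(a, b):
    """cos(theta) = (a . b) / (||a|| * ||b||)."""
    return np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b))

# Hypothetical term-count vectors for two short documents.
doc1 = np.array([1, 2, 0, 1])
doc2 = np.array([1, 1, 1, 0])
print(cosine_similarity(doc1, doc2))
```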
METEOR (Metric for Evaluation of Translation with Explicit Ordering):
METEOR is a metric for evaluating machine translation output by considering the alignment between the candidate translation and a reference translation. It extends beyond precision by incorporating synonyms and stemming, and it aims for high recall by penalizing omissions.
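A minimal example using NLTK's implementation, which needs the WordNet data for synonym matching; recent NLTK versions expect pre-tokenized input:

```python
import nltk
from nltk.translate.meteor_score import meteor_score

# WordNet data is needed for METEOR's synonym matching (one-time download).
nltk.download("wordnet", quiet=True)

reference = ["the", "cat", "sits", "on", "the", "mat"]
candidate = ["the", "cat", "sat", "on", "the", "mat"]

# References and hypothesis are passed as token lists.
print(meteor_score([reference], candidate))
```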
ROUGE (Recall-Oriented Understudy for Gisting Evaluation):
ROUGE is a set of metrics for evaluating automatic summarization and machine translation. It measures the overlap between the candidate text and reference texts using n-grams, word sequences, and word pair matches.
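A minimal example using Google's rouge-score package (a common implementation choice, not necessarily the platform's):

```python
# Uses the `rouge-score` package (pip install rouge-score).
from rouge_score import rouge_scorer

scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score("the cat sits on the mat",   # reference
                      "the cat sat on the mat")    # candidate
for name, s in scores.items():
    print(name, f"P={s.precision:.2f} R={s.recall:.2f} F={s.fmeasure:.2f}")
```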