Metric Glossary

Quick definitions and explanations of the various metrics used within the RagaAI Testing Platform

  1. F1 Score:

The F1 score is the harmonic mean of precision and recall. It provides a balanced assessment of a model's performance by considering both false positives and false negatives.

Averaging Method: Computes a global (micro) average by summing true positives (TP), false negatives (FN), and false positives (FP) across all classes of the confusion matrix.
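
As an illustrative sketch (not the platform's internal implementation), the micro-averaged F1 can be computed from a confusion matrix as follows, assuming rows are ground-truth classes and columns are predictions:

```python
import numpy as np

def micro_f1(cm: np.ndarray) -> float:
    """Micro-averaged F1 from a square confusion matrix
    (rows = ground truth, columns = predictions)."""
    tp = np.diag(cm).sum()                        # correctly classified samples, all classes
    fp = (cm.sum(axis=0) - np.diag(cm)).sum()     # predicted as a class but wrong
    fn = (cm.sum(axis=1) - np.diag(cm)).sum()     # belonging to a class but missed
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return 2 * precision * recall / (precision + recall) if (precision + recall) else 0.0
```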


  2. Precision:

Precision measures the accuracy of the model's positive predictions, i.e. the fraction of predicted positives that are truly positive.

Averaging Method: Computes a global (micro) average by summing true positives (TP) and false positives (FP) across all classes of the confusion matrix.
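
A minimal sketch of this global (micro) precision, under the same row/column convention as above:

```python
import numpy as np

def micro_precision(cm: np.ndarray) -> float:
    """Total TP over total predicted positives (TP + FP), summed over all classes."""
    tp = np.diag(cm).sum()
    fp = (cm.sum(axis=0) - np.diag(cm)).sum()   # column totals minus the diagonal
    return tp / (tp + fp) if (tp + fp) else 0.0
```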


  3. Recall:

Recall, also known as sensitivity, measures the model's ability to identify all relevant instances, i.e. the fraction of actual positives that are correctly detected.

Averaging Method: Computes a global (micro) average by summing true positives (TP) and false negatives (FN) across all classes of the confusion matrix.
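
And the matching sketch for global (micro) recall:

```python
import numpy as np

def micro_recall(cm: np.ndarray) -> float:
    """Total TP over total actual positives (TP + FN), summed over all classes."""
    tp = np.diag(cm).sum()
    fn = (cm.sum(axis=1) - np.diag(cm)).sum()   # row totals minus the diagonal
    return tp / (tp + fn) if (tp + fn) else 0.0
```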


  4. Pixel Accuracy:

Pixel accuracy is a metric that measures the percentage of correctly classified pixels in the segmentation output.

Averaging Method: Computes a global average by summing true positives (TP), true negatives (TN), false negatives (FN), and false positives (FP) across the confusion matrix.
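
A minimal sketch, assuming the predicted and ground-truth masks are integer label arrays of the same shape; this is equivalent to dividing the confusion-matrix diagonal sum by the total pixel count:

```python
import numpy as np

def pixel_accuracy(pred_mask: np.ndarray, gt_mask: np.ndarray) -> float:
    """Fraction of pixels whose predicted label matches the ground-truth label."""
    return float((pred_mask == gt_mask).mean())
```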


  5. mIoU:

mIoU (Mean Intersection over Union) is a popular metric for semantic segmentation that measures the average intersection over union (IoU) across all classes.

Averaging Method (IoU): Computes a global average by summing true positives (TP), false negatives (FN), and false positives (FP) across the confusion matrix.

Averaging Method (mIoU): Mean of IoU values for all classes.
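
An illustrative sketch of both steps from a confusion matrix, using the same row/column convention as above; classes absent from both prediction and ground truth are skipped in the mean:

```python
import numpy as np

def miou(cm: np.ndarray) -> float:
    """Per-class IoU = TP / (TP + FP + FN); mIoU = mean of IoU over classes."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    denom = tp + fp + fn
    iou = np.where(denom > 0, tp / np.maximum(denom, 1), np.nan)   # NaN marks absent classes
    return float(np.nanmean(iou))
```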


  6. wIoU:

wIoU (Weighted Intersection over Union) is a variation of mIoU that assigns different weights to different classes, giving more importance to certain classes in the evaluation.
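
The weighting scheme is not spelled out here; one common choice, used in this hypothetical sketch, is to weight each class's IoU by its ground-truth pixel frequency:

```python
import numpy as np

def weighted_iou(cm: np.ndarray, weights=None) -> float:
    """Weighted mean of per-class IoU; default weights are ground-truth class frequencies."""
    tp = np.diag(cm).astype(float)
    fp = cm.sum(axis=0) - tp
    fn = cm.sum(axis=1) - tp
    iou = tp / np.maximum(tp + fp + fn, 1)
    if weights is None:
        weights = cm.sum(axis=1) / max(cm.sum(), 1)   # pixels per class / total pixels
    return float((weights * iou).sum() / max(weights.sum(), 1e-12))
```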


  7. Mahalanobis Distance:

Mahalanobis distance is a measure of the distance between a data point and a distribution. It is used to assess the similarity of test data to the training data distribution. A higher Mahalanobis distance may indicate that a data point is "out of distribution."
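
A minimal sketch, assuming the training distribution is summarized by the sample mean and covariance of a reference feature matrix (the platform may derive its features differently):

```python
import numpy as np

def mahalanobis_distance(x: np.ndarray, reference: np.ndarray) -> float:
    """Distance of point `x` from the distribution of the rows of `reference`
    (shape: n_samples x n_features)."""
    mu = reference.mean(axis=0)
    cov_inv = np.linalg.pinv(np.cov(reference, rowvar=False))  # pseudo-inverse for numerical stability
    diff = x - mu
    return float(np.sqrt(diff @ cov_inv @ diff))
```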


  8. Mistake Score:

Mistake Score is a metric designed to evaluate the quality and accuracy of labelled data within your semantic segmentation datasets. This score serves as a quantitative measure of the labelling errors or inaccuracies that may exist in your dataset.


  9. Distance Score:

Distance Score is used in Drift Detection (ROI) multi-class cases. It is a metric designed to evaluate correctness and the degree of drift of a data point from its respective class. This score serves as a quantitative measure of the classification errors or inaccuracies that may exist in your dataset.


  10. Similarity Score:

The similarity score is used to determine the resemblance between two data entities. In the context of super-resolution, it measures how closely a high-resolution image matches its low-resolution counterpart. Scores closer to 1 indicate high similarity, while scores nearing 0 suggest less similarity.
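
The exact formulation is platform-specific; as one hedged illustration, a cosine similarity between two equally sized image tensors yields a score in roughly the described 0-to-1 range for non-negative pixel intensities:

```python
import numpy as np

def image_similarity(img_a: np.ndarray, img_b: np.ndarray) -> float:
    """Cosine similarity between two images of identical shape;
    1.0 means identical up to a constant intensity scale."""
    a, b = img_a.astype(float).ravel(), img_b.astype(float).ravel()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0
```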


  11. Area Percentage:

The fraction of the area covered by a label in an image, used in semantic segmentation.
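
A minimal sketch over an integer label mask:

```python
import numpy as np

def area_fraction(mask: np.ndarray, label: int) -> float:
    """Fraction of the image's pixels assigned to `label` (multiply by 100 for a percentage)."""
    return float((mask == label).sum() / mask.size)
```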


  12. BLEU (Bilingual Evaluation Understudy):

BLEU is a metric for evaluating a translated text against one or more reference translations. It measures the similarity of machine translations to human translations, focusing on the precision of n-grams (sequences of n words) in the translated text.
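
A simplified, single-reference sketch of the idea (clipped n-gram precision plus a brevity penalty, without the smoothing used by production implementations):

```python
import math
from collections import Counter

def ngrams(tokens, n):
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def bleu(candidate, reference, max_n=4):
    """Geometric mean of clipped n-gram precisions times a brevity penalty."""
    log_precisions = []
    for n in range(1, max_n + 1):
        cand, ref = Counter(ngrams(candidate, n)), Counter(ngrams(reference, n))
        overlap = sum(min(count, ref[g]) for g, count in cand.items())  # clip to reference counts
        if overlap == 0:
            return 0.0
        log_precisions.append(math.log(overlap / sum(cand.values())))
    brevity = 1.0 if len(candidate) > len(reference) else math.exp(1 - len(reference) / max(len(candidate), 1))
    return brevity * math.exp(sum(log_precisions) / max_n)

# e.g. bleu("the cat sat on the mat".split(), "the cat is on the mat".split())
```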


  13. Cosine Similarity:

Cosine Similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. In text analysis, it is often used to determine the similarity between two text documents.
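
A minimal bag-of-words sketch for two text documents:

```python
import math
from collections import Counter

def cosine_similarity(doc_a: str, doc_b: str) -> float:
    """Cosine of the angle between simple term-frequency vectors of two documents."""
    a, b = Counter(doc_a.lower().split()), Counter(doc_b.lower().split())
    dot = sum(count * b[word] for word, count in a.items())
    norm = math.sqrt(sum(c * c for c in a.values())) * math.sqrt(sum(c * c for c in b.values()))
    return dot / norm if norm else 0.0
```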


  14. METEOR (Metric for Evaluation of Translation with Explicit Ordering):

METEOR is a metric for evaluating machine translation output by considering the alignment between the candidate translation and a reference translation. It extends beyond precision by incorporating synonyms and stemming, and it aims for high recall by penalizing omissions.
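
Full METEOR relies on stemming, synonym matching, and a fragmentation penalty; the heavily simplified sketch below keeps only the recall-weighted harmonic mean of exact unigram precision and recall over token lists:

```python
from collections import Counter

def simplified_meteor(candidate, reference, alpha=0.9):
    """Recall-weighted harmonic mean of exact unigram precision and recall
    (omits METEOR's stemming, synonymy, and fragmentation penalty)."""
    cand, ref = Counter(candidate), Counter(reference)
    matches = sum(min(count, ref[w]) for w, count in cand.items())
    if matches == 0:
        return 0.0
    precision = matches / len(candidate)
    recall = matches / len(reference)
    return precision * recall / (alpha * precision + (1 - alpha) * recall)
```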


  15. ROUGE (Recall-Oriented Understudy for Gisting Evaluation):

ROUGE is a set of metrics for evaluating automatic summarization and machine translation. It measures the overlap between the candidate text and reference texts using n-grams, word sequences, and word pair matches.
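
As an illustration of the n-gram overlap, here is a sketch of ROUGE-N recall against a single tokenized reference (real ROUGE also reports precision/F-measure and variants such as ROUGE-L):

```python
from collections import Counter

def rouge_n(candidate, reference, n=1):
    """Fraction of the reference's n-grams that also occur in the candidate (counts clipped)."""
    cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
    ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
    overlap = sum(min(count, cand[g]) for g, count in ref.items())
    total = sum(ref.values())
    return overlap / total if total else 0.0
```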

