Metric Glossary

Quick definitions and explanations of the various metrics used within the RagaAI Testing Platform


  1. F1 Score:

The F1 score is the harmonic mean of precision and recall. It provides a balanced assessment of a model's performance by accounting for both false positives and false negatives.

Averaging Method: Computes a global average by summing true positives (TP), false negatives (FN) and false positives (FP) from the confusion matrix.
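
For reference, with precision P and recall R pooled over all classes, the F1 score can be written as:

$$
F_1 = \frac{2 \cdot P \cdot R}{P + R} = \frac{TP}{TP + \tfrac{1}{2}(FP + FN)}
$$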


  2. Precision:

Precision is a measure of the accuracy of the positive predictions made by the model.

Averaging Method: Computes a global average by summing true positives (TP) and false positives (FP) from the confusion matrix.
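
In the micro-averaged form used here, the pooled counts give:

$$
Precision = \frac{TP}{TP + FP}
$$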


  3. Recall:

Recall, also known as sensitivity, measures the model's ability to correctly identify all relevant instances.

Averaging Method: Computes a global average by summing true positives (TP) and false negatives (FN) from the confusion matrix.
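
Micro-averaged recall pools the counts the same way, R = TP / (TP + FN). As an illustration only (a minimal sketch using scikit-learn, not necessarily the platform's internal implementation), the three scores above can be reproduced with micro averaging; the labels are placeholders:

```python
from sklearn.metrics import precision_recall_fscore_support

# Placeholder multi-class labels; substitute your own ground truth and predictions.
y_true = [0, 1, 2, 2, 1, 0, 2]
y_pred = [0, 1, 2, 1, 1, 0, 0]

# average="micro" pools TP/FP/FN over all classes, matching the
# "global average from the confusion matrix" described above.
precision, recall, f1, _ = precision_recall_fscore_support(
    y_true, y_pred, average="micro"
)
print(precision, recall, f1)
```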


  4. Pixel Accuracy:

Pixel accuracy measures the percentage of correctly classified pixels in the segmentation output.

Averaging Method: Computes a global average by summing true positives (TP), true negatives (TN), false negatives (FN) and false positives (FP) from the confusion matrix.
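
Equivalently, it is the share of correctly classified pixels over all pixels:

$$
Pixel\ Accuracy = \frac{TP + TN}{TP + TN + FP + FN}
$$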


  5. mIoU:

mIoU (Mean Intersection over Union) is a popular metric for semantic segmentation that measures the average intersection over union (IoU) across all classes.

Averaging Method (IoU): Computes a global average by summing true positives (TP), false negatives (FN) and false positives (FP) from the confusion matrix.

Averaging Method (mIoU): Mean of the IoU values across all classes.
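
For reference, the per-class IoU and its mean over C classes are:

$$
IoU_c = \frac{TP_c}{TP_c + FP_c + FN_c}, \qquad mIoU = \frac{1}{C}\sum_{c=1}^{C} IoU_c
$$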


  6. wIoU:

wIoU (Weighted Intersection over Union) is a variation of mIoU that assigns different weights to different classes, giving more importance to certain classes in the evaluation.
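
A common weighted form (the exact weighting scheme, e.g. class pixel frequency, may differ in practice) assigns each class a weight w_c:

$$
wIoU = \frac{\sum_{c=1}^{C} w_c \cdot IoU_c}{\sum_{c=1}^{C} w_c}
$$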


  7. Mahalanobis Distance:

Mahalanobis distance is a measure of the distance between a data point and a distribution. It is used to assess the similarity of test data to the training data distribution. A higher Mahalanobis distance may indicate that a data point is "out of distribution."
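
For a point x and a reference distribution with mean μ and covariance matrix Σ:

$$
D_M(x) = \sqrt{(x - \mu)^{\top} \Sigma^{-1} (x - \mu)}
$$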


  8. Mistake Score:

Mistake Score is a metric designed to evaluate the quality and accuracy of labelled data within your semantic segmentation datasets. This score serves as a quantitative measure of the labelling errors or inaccuracies that may exist in your dataset.


  9. Distance Score:

Distance Score is used in multi-class Drift Detection (ROI) cases. It is a metric designed to evaluate correctness and the degree of drift of a data point from its respective class. This score serves as a quantitative measure of the classification errors or inaccuracies that may exist in your dataset.


  10. Similarity Score:

The similarity score is used to determine the resemblance between two data entities. In the context of super-resolution, it measures how closely a high-resolution image matches its low-resolution counterpart. Scores closer to 1 indicate high similarity, while scores nearing 0 suggest less similarity.


  11. Area Percentage:

The fraction of the area covered by a label in an image, used in semantic segmentation.
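
For a given label, this reduces to a ratio of pixel counts:

$$
Area\ Percentage = \frac{\text{pixels assigned to the label}}{\text{total pixels in the image}} \times 100
$$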


  12. BLEU (Bilingual Evaluation Understudy): BLEU is a metric for evaluating a translated text against one or more reference translations. It measures the similarity of the machine translations to human translations, focusing on the precision of n-grams (word sequences of n words) in the translated text.
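
As an illustration only (a sketch using NLTK's implementation, not necessarily the one used by the platform), sentence-level BLEU can be computed as follows; the tokenised sentences are placeholders:

```python
from nltk.translate.bleu_score import sentence_bleu, SmoothingFunction

# Placeholder tokenised sentences; substitute your own reference(s) and candidate.
reference = ["the", "cat", "sat", "on", "the", "mat"]
candidate = ["the", "cat", "is", "on", "the", "mat"]

# Smoothing avoids zero scores when some higher-order n-grams have no overlap.
score = sentence_bleu(
    [reference], candidate, smoothing_function=SmoothingFunction().method1
)
print(f"BLEU: {score:.3f}")
```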


  13. Cosine Similarity: Cosine Similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. In text analysis, it is often used to determine the similarity between two text documents.
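
For two vectors A and B:

$$
\cos(\theta) = \frac{A \cdot B}{\lVert A \rVert \, \lVert B \rVert}
$$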


  14. METEOR (Metric for Evaluation of Translation with Explicit Ordering): METEOR is a metric for evaluating machine translation output by considering the alignment between the candidate translation and a reference translation. It extends beyond precision by incorporating synonyms and stemming, and it aims for high recall by penalizing omissions.
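
In the original formulation (parameter values vary across implementations), METEOR combines a recall-weighted harmonic mean of unigram precision P and recall R with a fragmentation penalty based on how many contiguous chunks the matched unigrams form:

$$
F_{mean} = \frac{10PR}{R + 9P}, \qquad Penalty = 0.5\left(\frac{\#chunks}{\#matched\ unigrams}\right)^{3}, \qquad METEOR = F_{mean}\,(1 - Penalty)
$$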


  15. ROUGE (Recall-Oriented Understudy for Gisting Evaluation): ROUGE is a set of metrics for evaluating automatic summarization and machine translation. It measures the overlap between the candidate text and reference texts using n-grams, word sequences, and word pair matches.
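
As an illustration only (a sketch using the open-source rouge-score package, not necessarily the platform's implementation), ROUGE-1 and ROUGE-L can be computed like this; the strings are placeholders:

```python
from rouge_score import rouge_scorer

# Placeholder texts; substitute your own reference summary and candidate summary.
reference = "The cat sat on the mat."
candidate = "A cat was sitting on the mat."

# Each score exposes precision, recall and F-measure for the chosen variant.
scorer = rouge_scorer.RougeScorer(["rouge1", "rougeL"], use_stemmer=True)
scores = scorer.score(reference, candidate)
print(scores["rouge1"].fmeasure, scores["rougeL"].fmeasure)
```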