# Metric Glossary

Quick definitions and explanations of the various metrics used within the RagaAI Testing Platform

**F1 Score:**

The F1 score is the harmonic mean of precision and recall. It provides a balanced assessment of a model's performance by considering both false positives and false negatives.

Averaging Method: Computes a global (micro) average from the summed true positive (TP), false negative (FN), and false positive (FP) counts in the confusion matrix.

**Precision:**

Precision is a measure of the accuracy of positive predictions made by the model.

Averaging Method: Computes a global (micro) average from the summed true positive (TP) and false positive (FP) counts in the confusion matrix.

**Recall:**

Recall, also known as sensitivity, measures the ability of the model to correctly identify all relevant instances.

Averaging Method: Computes a global (micro) average from the summed true positive (TP) and false negative (FN) counts in the confusion matrix.
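
The micro-averaged variants of these three metrics can be sketched as below. The function name and example labels are illustrative, not part of the platform's API; TP, FP, and FN are summed across all classes before the ratios are taken, which is what makes the average "global."

```python
import numpy as np

def micro_prf1(y_true, y_pred, num_classes):
    """Micro-averaged precision, recall and F1 from global TP/FP/FN sums."""
    tp = fp = fn = 0
    for c in range(num_classes):
        tp += np.sum((y_pred == c) & (y_true == c))  # predicted c, truly c
        fp += np.sum((y_pred == c) & (y_true != c))  # predicted c, truly other
        fn += np.sum((y_pred != c) & (y_true == c))  # truly c, predicted other
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    return float(precision), float(recall), float(f1)
```

Note that for single-label multi-class problems, every false positive for one class is a false negative for another, so micro precision, recall, and F1 all coincide with overall accuracy; they differ in multi-label or binary settings.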

**Pixel Accuracy:**

Pixel accuracy is a metric that measures the percentage of correctly classified pixels in the segmentation output.

Averaging Method: Computes a global (micro) average from the summed true positive (TP), true negative (TN), false negative (FN), and false positive (FP) counts in the confusion matrix.
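
Because (TP + TN) / (TP + TN + FP + FN) is simply the fraction of correctly classified pixels, pixel accuracy reduces to an element-wise comparison of the two masks. A minimal sketch (function name is illustrative):

```python
import numpy as np

def pixel_accuracy(gt_mask, pred_mask):
    """Fraction of pixels whose predicted class matches the ground truth."""
    return float(np.mean(gt_mask == pred_mask))
```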

**mIoU:**

mIoU (Mean Intersection over Union) is a popular metric for semantic segmentation that measures the average intersection over union (IoU) across all classes.

Averaging Method (IoU): Computes a global average from the summed true positive (TP), false negative (FN), and false positive (FP) counts in the confusion matrix.

Averaging Method (mIoU): Mean of IoU values for all classes.
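
The two averaging steps can be sketched as below: per-class IoU is TP / (TP + FP + FN), and mIoU is the unweighted mean over classes that actually occur. The function name and example masks are illustrative.

```python
import numpy as np

def mean_iou(gt, pred, num_classes):
    """Per-class IoU = TP / (TP + FP + FN); mIoU = mean over present classes."""
    ious = []
    for c in range(num_classes):
        tp = np.sum((gt == c) & (pred == c))
        fp = np.sum((gt != c) & (pred == c))
        fn = np.sum((gt == c) & (pred != c))
        denom = tp + fp + fn
        if denom > 0:  # skip classes absent from both masks
            ious.append(tp / denom)
    return float(np.mean(ious))
```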

**wIoU:**

wIoU (Weighted Intersection over Union) is a variation of mIoU that assigns different weights to different classes, giving more importance to certain classes in the evaluation.
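
Assuming the weights are supplied per class (for example, class pixel frequencies), the weighted variant replaces the plain mean with a normalized weighted mean. A minimal sketch:

```python
import numpy as np

def weighted_iou(ious, weights):
    """Weighted mean of per-class IoU values, normalized by the weight sum."""
    w = np.asarray(weights, dtype=float)
    return float(np.sum(np.asarray(ious, dtype=float) * w) / np.sum(w))
```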

**Mahalanobis Distance:**

Mahalanobis distance is a measure of the distance between a data point and a distribution. It is used to assess the similarity of test data to the training data distribution. A higher Mahalanobis distance may indicate that a data point is "out of distribution."
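
The distance is computed from the mean and covariance of the reference (training) distribution; a minimal sketch, with illustrative names:

```python
import numpy as np

def mahalanobis(x, mean, cov):
    """Distance of point x from a distribution with the given mean and covariance."""
    diff = np.asarray(x, dtype=float) - np.asarray(mean, dtype=float)
    return float(np.sqrt(diff @ np.linalg.inv(cov) @ diff))
```

With an identity covariance this reduces to ordinary Euclidean distance; a non-diagonal covariance stretches the space so that directions of high training-set variance count for less, which is why a large value suggests an out-of-distribution point.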

**Mistake Score:**

Mistake Score is a metric designed to evaluate the quality and accuracy of labelled data within your semantic segmentation datasets. This score serves as a quantitative measure of the labelling errors or inaccuracies that may exist in your dataset.

**Distance Score:**

Distance Score is used in Drift Detection (ROI) multi-class cases. It is a metric designed to evaluate how far a sample drifts from its respective class distribution. This score serves as a quantitative measure of the classification errors or inaccuracies that may exist in your dataset.

**Similarity Score:**

The similarity score is used to determine the resemblance between two data entities. In the context of super-resolution, it measures how closely a high-resolution image matches its low-resolution counterpart. Scores closer to 1 indicate high similarity, while scores nearing 0 suggest less similarity.

**Area Percentage:**

The fraction of the area covered by a label in an image, used in semantic segmentation.
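
For a segmentation mask this is just the share of pixels carrying the given class label. A minimal sketch (names are illustrative):

```python
import numpy as np

def area_fraction(mask, class_id):
    """Fraction of the mask's pixels assigned to class_id (multiply by 100 for a percentage)."""
    return float(np.mean(mask == class_id))
```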

**BLEU (Bilingual Evaluation Understudy)**: BLEU is a metric for evaluating a translated text against one or more reference translations. It measures the similarity of the machine translations to human translations, focusing on the precision of n-grams (word sequences of n words) in the translated text.
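
A simplified single-reference sketch of the idea: clipped n-gram precisions (n = 1..4) combined by a geometric mean, times a brevity penalty for short candidates. Production implementations (e.g. NLTK's `sentence_bleu` or SacreBLEU) additionally handle smoothing and multiple references.

```python
import math
from collections import Counter

def bleu(candidate, reference, max_n=4):
    """Simplified sentence-level BLEU for one tokenized candidate and reference."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand = Counter(tuple(candidate[i:i + n]) for i in range(len(candidate) - n + 1))
        ref = Counter(tuple(reference[i:i + n]) for i in range(len(reference) - n + 1))
        # clip each candidate n-gram count by its count in the reference
        clipped = sum(min(c, ref[g]) for g, c in cand.items())
        total = max(sum(cand.values()), 1)
        if clipped == 0:
            return 0.0
        log_precisions.append(math.log(clipped / total))
    # brevity penalty: punish candidates shorter than the reference
    bp = 1.0 if len(candidate) >= len(reference) else math.exp(1 - len(reference) / len(candidate))
    return bp * math.exp(sum(log_precisions) / max_n)
```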

**Cosine Similarity:** Cosine Similarity measures the cosine of the angle between two non-zero vectors in a multi-dimensional space. In text analysis, it is often used to determine the similarity between two text documents.
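
In vector form it is the dot product of the two vectors divided by the product of their norms, so parallel vectors score 1 and orthogonal vectors score 0. A minimal sketch:

```python
import numpy as np

def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))
```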

**METEOR (Metric for Evaluation of Translation with Explicit Ordering):** METEOR is a metric for evaluating machine translation output by considering the alignment between the candidate translation and a reference translation. It extends beyond precision by incorporating synonyms and stemming, and it aims for high recall by penalizing omissions.

**ROUGE (Recall-Oriented Understudy for Gisting Evaluation):** ROUGE is a set of metrics for evaluating automatic summarization and machine translation. It measures the overlap between the candidate text and reference texts using n-grams, word sequences, and word pair matches.
