Outlier Detection

The Outlier Detection Test in RagaAI identifies anomalies in a dataset. For super resolution use cases, it is run separately on the low-resolution and high-resolution datasets.

Execute Test:

The code snippet below sets up and runs an Outlier Detection Test in RagaAI, which flags data points that deviate significantly from the majority of your dataset.

Step 1: Define the Outlier Detection Rules

Begin by establishing the criteria for detecting outliers in your dataset.

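from raga import *  # RagaAI SDK; test_session is an existing TestSession for your project

# Define the outlier (anomaly) detection rule
rules = DriftDetectionRules()
rules.add(type="anomaly_detection", dist_metric="Mahalanobis", _class="ALL", threshold=0.6)

# Configure the outlier detection test on the embedding column
edge_case_detection = data_drift_detection(test_session=test_session,
                                           test_name="Outlier-detection-test",
                                           dataset_name="super_resolution_data_v3",
                                           embed_col_name="imageEmbedding",
                                           output_type="super_resolution",
                                           rules=rules)

# Register the test with the session, then run every test in the session
test_session.add(edge_case_detection)

test_session.run()
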
  • DriftDetectionRules(): Initialises the rules for outlier detection.

  • rules.add(): Adds a rule for detecting anomalies:

    • type: The type of detection, "anomaly_detection" in this case.

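    • dist_metric: The distance metric used for detection, "Mahalanobis" here. Mahalanobis distance measures how far a point lies from the centre of the data while accounting for the spread and correlation of each dimension, which makes it well suited to finding outliers among multidimensional embeddings.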

    • _class: Specifies the class(es) these metrics apply to. "ALL" means all classes in the dataset.

    • threshold: The cut-off for the Mahalanobis distance score. Data points scoring above 0.6 are flagged as outliers.

  • data_drift_detection(): Configures the outlier detection test with the following parameters:

    • test_session: The session object tied to your RagaAI project.

    • test_name: A name for the test, "Outlier-detection-test" in this case.

    • dataset_name: The name of the dataset you are analysing, "super_resolution_data_v3" here.

    • embed_col_name: The name of the dataset column that holds the embeddings used for analysis, "imageEmbedding" here.

    • output_type: The type of output expected, "super_resolution" in this context.

    • rules: The previously defined rules for outlier detection.

For OCR use cases, outlier detection uses a separate rule set and test function:

  • OCRAnomalyRules(): Initialises the rules for outlier detection for OCR use cases.

  • rules.add(): Adds a rule for detecting anomalies:

    • type: The type of detection, "anomaly_detection" in this case.

    • dist_metric: The distance metric used for detection, "DistanceMetric" here.

    • threshold: The cut-off for the DistanceScore metric. Data points scoring above 0.2 are flagged as outliers.

  • ocr_anomaly_test_analysis(): Configures the outlier detection test with the following parameters:

    • test_session: The session object tied to your RagaAI project.

    • test_name: A name for the test, "Outlier-detection-test" in this case.

    • dataset_name: The name of the dataset you are analysing, "ocr_dataset" here.

    • model: Specifies the OCR model used for inferences, "ocr_model" here.

    • type: Specifies the use case, "ocr" here.

    • output_type: Set to "anomaly_detection" for OCR use cases.

  • test_session.add(): Registers the configured outlier detection test with the session.

  • test_session.run(): Starts execution of all tests in the session, including the outlier detection test.
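RagaAI computes the distance scores on the platform, and the normalisation behind a threshold such as 0.6 is not described here. To illustrate what the Mahalanobis metric measures, the sketch below computes raw Mahalanobis distances of embeddings from their mean with NumPy and flags points above a threshold. The function names are hypothetical, and because raw distances are unbounded, the threshold in the usage example is chosen for the toy data rather than taken from the rule above.

```python
import numpy as np


def mahalanobis_distances(embeddings: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each row (one embedding) from the mean embedding.

    Illustrative sketch only; not RagaAI's implementation or score scale.
    """
    mean = embeddings.mean(axis=0)
    # Covariance across embedding dimensions; each row is one sample.
    # atleast_2d keeps the single-dimension case as a 1x1 matrix.
    cov = np.atleast_2d(np.cov(embeddings, rowvar=False))
    # Pseudo-inverse tolerates a singular covariance matrix, which is common
    # when there are fewer samples than embedding dimensions.
    cov_inv = np.linalg.pinv(cov)
    centered = embeddings - mean
    # Quadratic form x^T S^+ x for every centered row x, in one call.
    squared = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    # Clip tiny negative values caused by floating-point round-off.
    return np.sqrt(np.clip(squared, 0.0, None))


def flag_outliers(embeddings: np.ndarray, threshold: float) -> np.ndarray:
    """True for embeddings whose distance is strictly above the threshold."""
    return mahalanobis_distances(embeddings) > threshold


# Toy data: 200 points from a standard normal plus one far-away point.
rng = np.random.default_rng(0)
data = np.vstack([rng.normal(size=(200, 4)), np.full((1, 4), 8.0)])
mask = flag_outliers(data, threshold=4.0)
print(mask[-1], int(mask.sum()))
```

Using the pseudo-inverse rather than a plain inverse is a design choice for robustness: embedding columns are often highly correlated, which can make the covariance matrix singular.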

By completing these steps, you have initiated an Outlier Detection Test for a Super Resolution application in RagaAI.

After the test, review the identified outliers and decide how to handle each one: remove it, adjust it, or investigate it further.

Analysing Test Results

Test Overview

  • Pie Chart: A visual summary showing the proportion of data points that passed or failed the set distance metric threshold.
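The pass/fail split behind the pie chart is a simple proportion. The following is a minimal sketch, assuming a point fails when its distance score is strictly above the threshold. How RagaAI treats a score exactly equal to the threshold is not stated here, and the function name is hypothetical:

```python
def pass_fail_proportions(scores: list[float], threshold: float) -> dict[str, float]:
    """Fraction of data points that pass or fail the distance threshold.

    Assumption: a point fails when its score is strictly above the threshold.
    """
    if not scores:
        return {"pass": 0.0, "fail": 0.0}
    failed = sum(1 for s in scores if s > threshold)
    return {"pass": 1 - failed / len(scores), "fail": failed / len(scores)}


print(pass_fail_proportions([0.1, 0.3, 0.7, 0.9], threshold=0.6))
# {'pass': 0.5, 'fail': 0.5}
```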

Distance Score Analysis

  • Bar Graph: Shows the average distance score of failed data points, along with the number of failed data points in each class.
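The bar graph aggregates only the failed points. The sketch below computes the average score and the count of failed points for each class. It assumes each record is a hypothetical (class name, distance score) pair and that a point fails when it scores above the threshold:

```python
from collections import defaultdict


def failed_stats_per_class(records, threshold):
    """Map each class to (average distance score of failed points, failed count).

    records: iterable of (class_name, distance_score) pairs (hypothetical layout).
    Classes with no failed points are omitted.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for cls, score in records:
        if score > threshold:  # assumption: above the threshold means failed
            totals[cls] += score
            counts[cls] += 1
    return {cls: (totals[cls] / counts[cls], counts[cls]) for cls in counts}


records = [("cat", 0.875), ("cat", 0.625), ("cat", 0.25), ("dog", 0.7), ("dog", 0.125)]
print(failed_stats_per_class(records, threshold=0.6))
# {'cat': (0.75, 2), 'dog': (0.7, 1)}
```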

Interactive Embedding View

  • Visualisation: Use the embedding view to see where outliers sit relative to the bulk of the data.

  • Data Selection: Employ the lasso tool to select specific data points for further examination.

Assessing and Visualising Data

  • Datagrid View: Examine images sorted by their distance score in descending order.

Interpreting Results

  • In Distribution Data: Data points whose distance score is within the threshold are deemed "in distribution", that is, consistent with the rest of the dataset.

  • Out of Distribution Data: Data points above the threshold are "out of distribution" and warrant further inspection for data consistency.
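Both interpretations come down to comparing each distance score with the threshold. The following sketch labels each sample and orders the samples with the highest distance score first, matching the datagrid ordering. ScoredSample and label_and_rank are hypothetical names, and treating a score exactly equal to the threshold as in distribution is an assumption:

```python
from dataclasses import dataclass


@dataclass
class ScoredSample:
    # Hypothetical record: an identifier plus the distance score from the test.
    sample_id: str
    distance_score: float


def label_and_rank(samples: list[ScoredSample], threshold: float) -> list[tuple[str, float, str]]:
    """Label each sample and order by distance score, highest first."""
    ranked = sorted(samples, key=lambda s: s.distance_score, reverse=True)
    return [
        (
            s.sample_id,
            s.distance_score,
            # Assumption: strictly above the threshold is out of distribution.
            "out_of_distribution" if s.distance_score > threshold else "in_distribution",
        )
        for s in ranked
    ]


samples = [ScoredSample("img_001", 0.2), ScoredSample("img_002", 0.9), ScoredSample("img_003", 0.6)]
for row in label_and_rank(samples, threshold=0.6):
    print(row)
# ('img_002', 0.9, 'out_of_distribution')
# ('img_003', 0.6, 'in_distribution')
# ('img_001', 0.2, 'in_distribution')
```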

By adhering to these steps, you can effectively utilise RagaAI to detect and analyse outliers in your super resolution datasets.
