Drift Detection

The Drift Detection Test enables you to detect drift between your training dataset and the field/test dataset, helping you identify distribution shifts. By setting a threshold on the distance metric, you can pinpoint out-of-distribution data points.

Execute Test:

The code snippet below sets up and runs a Drift Detection Test, comparing a baseline (training) dataset against a more recent field dataset to identify any drift in the data.

First define the rules, then configure and register the drift detection test:

rules = DriftDetectionRules()
rules.add(type="anomaly_detection", dist_metric="Mahalanobis", _class="ALL", threshold=21.0)

# Dataset and column names below are placeholders; substitute your own.
edge_case_detection = data_drift_detection(test_session=test_session,
                                           test_name="drift-detection-test",
                                           train_dataset_name="train-dataset",
                                           field_dataset_name="field-dataset",
                                           train_embed_col_name="ImageEmbedding",
                                           field_embed_col_name="ImageEmbedding",
                                           output_type="semantic_segmentation",  # not required for object detection use cases
                                           level="image",
                                           rules=rules)

test_session.add(edge_case_detection)
test_session.run()



The first step is to establish the criteria for detecting drift in your datasets.

  • DriftDetectionRules(): Initialises the rules for drift detection.

  • rules.add(): Adds a new rule for detecting data drift:

    • type: The type of drift detection, "anomaly_detection" in this case.

    • dist_metric: The distance metric to use for detection; "Mahalanobis" measures the distance between a point and a distribution.

    • _class: Specifies the class(es) these metrics apply to. "ALL" means all classes in the dataset.

    • threshold: The value above which the distance metric indicates drift.
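For intuition, the Mahalanobis rule flags a field embedding whose distance from the training distribution exceeds the threshold. The sketch below is illustrative numpy, not part of the RagaAI SDK (the platform computes these distances for you); the function name, data shapes, and values are assumptions:

```python
import numpy as np

def mahalanobis_distances(train_embeddings, field_embeddings):
    """Distance of each field embedding from the training distribution."""
    mean = train_embeddings.mean(axis=0)
    cov = np.cov(train_embeddings, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance
    diffs = field_embeddings - mean
    # d(x) = sqrt((x - mu)^T  Sigma^-1  (x - mu)) for each row x
    return np.sqrt(np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs))

rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 8))             # baseline embeddings
field = np.vstack([rng.normal(0.0, 1.0, size=(45, 8)),  # in-distribution points
                   rng.normal(10.0, 1.0, size=(5, 8))]) # 5 strongly shifted points

distances = mahalanobis_distances(train, field)
drifted = distances > 21.0  # same threshold as rules.add(...) above
```

Points far from the training mean (relative to its covariance) receive large distances and are flagged as drifted, which is exactly what the threshold rule encodes.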

Initialise Drift Detection Test

data_drift_detection(): Prepares the drift detection test with the following parameters:

  • test_session: The session object linked to your project.

  • test_name: A descriptive name for this test.

  • train_dataset_name: The name of the baseline or training dataset.

  • field_dataset_name: The name of the new or field dataset to compare against the baseline.

  • train_embed_col_name: The column (schema mapping) in the training dataset that contains embeddings.

  • field_embed_col_name: The column (schema mapping) in the field dataset that contains embeddings.

  • output_type: The type of output, "semantic_segmentation" in this instance. Note: Not required for object detection use cases.

  • level: The level at which to detect drift, "image" means image-level detection.

  • rules: The previously defined rules for the test.

test_session.add(): Registers the drift detection test within the session.

test_session.run(): Initiates the execution of all configured tests in the session, including your drift detection test.

By completing these steps, you've initiated a Drift Detection Test in RagaAI to analyse your datasets for any significant changes in data distribution.

Analysing Test Results

Interpreting the Results

  • In Distribution Data Points: Identified as "in distribution" if they fall within the set threshold, signifying alignment with the training data.

  • Out of Distribution Data Points: Labelled as "out of distribution" if they exceed the threshold, suggesting potential drift requiring close examination.
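In code terms, the labelling described above reduces to a simple threshold comparison on each image's distance score. The scores and image names in this sketch are hypothetical, and the threshold matches the rule defined earlier:

```python
THRESHOLD = 21.0  # same value as rules.add(..., threshold=21.0)

# Hypothetical distance scores produced by the test
distance_scores = {"img_001": 3.2, "img_002": 19.8, "img_003": 27.5, "img_004": 44.1}

labels = {
    name: "out of distribution" if score > THRESHOLD else "in distribution"
    for name, score in distance_scores.items()
}
# img_003 and img_004 exceed the threshold and warrant closer examination
```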

Interactive Embedding View

  • Visualisation: Use the interactive embedding view to visualise and comprehend the drift between datasets.

  • Data Selection: Apply the lasso tool within the embedding view to select and scrutinise data points of interest.

Visualising and Assessing Data

  • Data Grid View: Helps visualise annotations, with images sorted by mistake scores.

  • Image View: Delve into detailed analyses of mistake scores for each label, with interactive annotation rendering and original image viewing.

Image View

  • Information Card: Provides the Distance Score of the selected image.

By adhering to these guidelines, you can effectively utilise Drift Detection in RagaAI to maintain the integrity and relevance of your models over time.
