
Drift Detection

The Drift Detection Test allows users to identify shifts between training and field/test datasets.

The test compares your training dataset with the field/test dataset using a distance metric. By setting a threshold on that metric, you can pinpoint out-of-distribution data points.

Execute Test:

The following snippet sets up and runs a Drift Detection Test, comparing a baseline (training) dataset against a more recent (field) dataset to identify drift in the data.

Configure the drift detection test using the rules described in the Rules section below.

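# Assumes DriftDetectionRules and data_drift_detection are already imported
# from the RagaAI client, and that test_session has been created for your project.

# Define the rule: flag data points whose Mahalanobis distance exceeds 2.
rules = DriftDetectionRules()
rules.add(type="drift_detection", dist_metric="Mahalanobis", _class="ALL", threshold=2)

# Configure the test: compare field embeddings against training embeddings.
edge_case_detection = data_drift_detection(
    test_session=test_session,
    test_name="Drift-Detection-Test",
    train_dataset_name="bdd_train_dataset",
    field_dataset_name="bdd_field_dataset",
    train_embed_col_name="ImageVectorsM1",
    field_embed_col_name="ImageVectorsM1",
    level="image",
    rules=rules,
)

# Register the test with the session, then execute it.
test_session.add(edge_case_detection)

test_session.run()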

Rules

The first step is to establish the criteria for detecting drift in your datasets.

  • DriftDetectionRules(): Initialises the rules for drift detection.

    • rules.add(): Adds a new rule for detecting data drift:

      • type: The type of detection to run, "drift_detection" in this case.

      • dist_metric: The distance metric used for detection. "Mahalanobis" measures the distance between a point and a distribution.

      • _class: Specifies the class(es) these metrics apply to. "ALL" means all classes in the dataset.

      • threshold: The value above which the distance metric indicates drift.
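To make the dist_metric and threshold parameters more concrete, here is a minimal NumPy sketch of how a Mahalanobis distance between field embeddings and a training distribution can be computed. It is an illustration only, not RagaAI's internal implementation; the function name mahalanobis_distances and the synthetic embeddings are assumptions made for this example.

```python
import numpy as np


def mahalanobis_distances(train: np.ndarray, field: np.ndarray) -> np.ndarray:
    """Return the Mahalanobis distance of each field row from the training data.

    Hypothetical helper for illustration; not part of the RagaAI API.
    """
    mean = train.mean(axis=0)
    # Covariance of the training embeddings (rows are samples).
    cov = np.cov(train, rowvar=False)
    # The pseudo-inverse tolerates a singular covariance matrix.
    cov_inv = np.linalg.pinv(cov)
    diff = field - mean
    # Squared distance per row: diff @ cov_inv @ diff.
    squared = np.einsum("ij,jk,ik->i", diff, cov_inv, diff)
    return np.sqrt(np.maximum(squared, 0.0))


# Synthetic 4-dimensional "embeddings" standing in for a real embedding column.
rng = np.random.default_rng(0)
train = rng.normal(0.0, 1.0, size=(500, 4))
field = np.vstack([np.zeros(4), np.full(4, 6.0)])

distances = mahalanobis_distances(train, field)
print(distances)
```

With a threshold of 2, the first field point (near the training mean) stays below the threshold, while the second (far from the training data) exceeds it.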

Initialise Drift Detection Test

  • data_drift_detection(): Prepares the drift detection test with the following parameters:

    • test_session: The session object linked to your project.

    • test_name: A descriptive name for this test.

    • train_dataset_name: The name of the baseline or training dataset.

    • field_dataset_name: The name of the new or field dataset to compare against the baseline.

    • train_embed_col_name: The column (schema mapping) in the training dataset that contains embeddings.

    • field_embed_col_name: The column (schema mapping) in the field dataset that contains embeddings.

    • level: The level at which to detect drift. "image" means image-level detection.

    • rules: The previously defined rules for the drift detection test.

  • test_session.add(): Registers the drift detection test within the session.

  • test_session.run(): Initiates the execution of all configured tests in the session, including your drift detection test.

By completing these steps, you've initiated a Drift Detection Test on the RagaAI Testing Platform to analyse your datasets for any significant changes in data distribution.

Analysing Test Results

Interpreting the Results

  • In Distribution Data Points: Identified as "in distribution" if they fall within the set threshold, signifying alignment with the training data.

  • Out of Distribution Data Points: Labelled as "out of distribution" if they exceed the threshold, suggesting potential drift requiring close examination.
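The in/out split described above can be sketched as a simple comparison against the threshold. This is a hedged illustration: the function name split_by_threshold, the label strings, and the sample distances are hypothetical, and treating a distance exactly equal to the threshold as "in distribution" is an assumption based on the wording "exceed the threshold".

```python
import numpy as np


def split_by_threshold(distances, threshold):
    """Label each distance as in or out of distribution.

    Hypothetical helper for illustration; not part of the RagaAI API.
    """
    distances = np.asarray(distances, dtype=float)
    # Only distances strictly above the threshold count as drift.
    out_of_distribution = distances > threshold
    return np.where(out_of_distribution, "out_of_distribution", "in_distribution")


# Made-up distances for four field data points, checked against threshold=2.
labels = split_by_threshold([0.4, 1.9, 2.0, 3.7], threshold=2)
print(labels.tolist())
```

Raising the threshold marks fewer points as out of distribution, and lowering it marks more, so the value is worth tuning against a sample of known in-distribution data.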

Interactive Embedding View

  • Visualisation: Use the interactive embedding view to visualise and comprehend the drift between datasets.

  • Data Selection: Apply the lasso tool within the embedding view to select and scrutinise data points of interest.

Visualising and Assessing Data

  • Data Grid View: Displays images from the field dataset, grouped into out-of-distribution and in-distribution data points, alongside the training dataset.

  • Image View: Delve into detailed analyses of mistake scores for each label, with interactive annotation rendering and original image viewing.

Image View

  • Information Card: Provides the name of the data point.

By adhering to these guidelines, you can effectively utilise Drift Detection in RagaAI to maintain the integrity and relevance of your models over time.
