Drift Detection
The Drift Detection Test enables you to detect drift between your training dataset and a field/test dataset. By setting a threshold on a distance metric, you can pinpoint out-of-distribution data points.
Execute Test:
The code snippet below sets up and runs a Drift Detection Test, comparing a baseline (training) dataset against a more recent field dataset to identify any drift in the data.
# Define the rules for detecting drift (explained in the Rules section below).
rules = DriftDetectionRules()
rules.add(type="anomaly_detection", dist_metric="Mahalanobis", _class="ALL", threshold=2)
# Configure the drift detection test (assumes test_session has already been created in your workflow).
edge_case_detection = data_drift_detection(test_session=test_session,
                                           test_name="Drift-Detection-Test",
                                           train_dataset_name="bdd_train_dataset",
                                           field_dataset_name="bdd_field_dataset",
                                           train_embed_col_name="ImageVectorsM1",
                                           field_embed_col_name="ImageVectorsM1",
                                           level="image",
                                           rules=rules)

# Register the test in the session and run all configured tests.
test_session.add(edge_case_detection)
test_session.run()
Rules
The first step is to establish the criteria for detecting drift in your datasets.
DriftDetectionRules(): Initialises the rules for drift detection.
rules.add(): Adds a new rule for detecting data drift, with the following arguments:
type: The type of drift detection, "anomaly_detection" in this case.
dist_metric: The distance metric used for detection. "Mahalanobis" measures the distance between a point and a distribution (see the sketch after this list).
_class: Specifies the class(es) these metrics apply to. "ALL" means all classes in the dataset.
threshold: The value above which the distance metric indicates drift.
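To make the Mahalanobis metric and threshold concrete, the short Python/NumPy sketch below shows how the distance of each field embedding from the training distribution could be computed. It is only a conceptual illustration, not the platform's internal implementation, and the array names (train_embeds, field_embeds) are hypothetical.

import numpy as np

def mahalanobis_distances(train_embeds, field_embeds):
    # Distance of each field embedding from the training distribution.
    mean = train_embeds.mean(axis=0)
    cov = np.cov(train_embeds, rowvar=False)
    cov_inv = np.linalg.pinv(cov)  # pseudo-inverse guards against a singular covariance matrix
    diffs = field_embeds - mean
    return np.sqrt(np.einsum("ij,jk,ik->i", diffs, cov_inv, diffs))

# Hypothetical embeddings: 1,000 training vectors and 200 field vectors of dimension 64.
rng = np.random.default_rng(0)
train_embeds = rng.normal(size=(1000, 64))
field_embeds = rng.normal(loc=0.5, size=(200, 64))

distances = mahalanobis_distances(train_embeds, field_embeds)
print(f"Mean field distance from the training distribution: {distances.mean():.2f}")

Points whose distance exceeds the rule's threshold would be flagged as drifted; the platform handles this computation for you once the rule is configured.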
Initialise Drift Detection Test
data_drift_detection(): Prepares the drift detection test with the following parameters:
test_session: The session object linked to your project.
test_name: A descriptive name for this test.
train_dataset_name: The name of the baseline (training) dataset.
field_dataset_name: The name of the new or field dataset to compare against the baseline.
train_embed_col_name: The column (schema mapping) in the training dataset that contains embeddings.
field_embed_col_name: The column (schema mapping) in the field dataset that contains embeddings.
level: The level at which to detect drift; "image" means image-level detection.
rules: The previously defined rules for the data drift detection test.
test_session.add(): Registers the drift detection test within the session.
test_session.run(): Initiates the execution of all configured tests in the session, including your drift detection test.
By completing these steps, you've initiated a Drift Detection Test on the RagaAI Testing Platform to analyse your datasets for any significant changes in data distribution.
Analysing Test Results
Interpreting the Results
In Distribution Data Points: Identified as "in distribution" if they fall within the set threshold, signifying alignment with the training data.
Out of Distribution Data Points: Labelled as "out of distribution" if they exceed the threshold, suggesting potential drift requiring close examination.
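Conceptually, this split is a comparison of each datapoint's distance against the rule's threshold. The snippet below is a purely illustrative sketch; the distance values and file names are hypothetical, and this is not how the platform exposes its results.

# Hypothetical Mahalanobis distances for a few field datapoints.
distances = {"img_0001.jpg": 0.8, "img_0002.jpg": 1.4, "img_0003.jpg": 2.6, "img_0004.jpg": 3.1}
threshold = 2  # same threshold as in the rule above

for name, dist in distances.items():
    label = "out of distribution" if dist > threshold else "in distribution"
    print(f"{name}: distance={dist:.2f} -> {label}")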
Interactive Embedding View
Visualisation: Use the interactive embedding view to visualise and comprehend the drift between datasets.
Data Selection: Apply the lasso tool within the embedding view to select and scrutinise data points of interest.

Visualising and Assessing Data
Data Grid View: Visualises images from the field dataset (both out-of-distribution and in-distribution datapoints) alongside the training dataset.
Image View: Delve into detailed analyses of mistake scores for each label, with interactive annotation rendering and original image viewing.
Information Card: Provides the name of the datapoint.
By adhering to these guidelines, you can effectively utilise Drift Detection in RagaAI to maintain the integrity and relevance of your models over time.