Drift Detection
The Drift Detection Test enables you to detect drift between your training dataset and a field/test dataset. By setting a threshold on the distance metric, you can pinpoint out-of-distribution data points.
Execute Test:
The code snippet sets up and performs a Drift Detection Test, comparing a baseline dataset against a more recent dataset to identify any drift in the data. The test is configured using the rules described in the following section.
Rules
The first step is to establish the criteria for detecting drift in your datasets.
DriftDetectionRules(): Initialises the rules for drift detection.
rules.add(): Adds a new rule for detecting data drift, with the following arguments:
type: The type of drift detection, "anomaly_detection" in this case.
dist_metric: The distance metric to use for detection; "Mahalanobis" measures the distance between a point and a distribution.
_class: Specifies the class(es) these metrics apply to; "ALL" means all classes in the dataset.
threshold: The value above which the distance metric indicates drift.
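Put together, the rule setup described above might look like the following sketch. The import path and the threshold value are assumptions for illustration; use the values appropriate for your project.

```python
# Sketch of the rule setup described above; the import path and the
# threshold value are assumptions - adjust them for your project.
from raga import DriftDetectionRules

rules = DriftDetectionRules()
rules.add(
    type="anomaly_detection",   # drift detection type
    dist_metric="Mahalanobis",  # distance between a point and a distribution
    _class="ALL",               # apply to all classes in the dataset
    threshold=21.0,             # hypothetical value; distances above it indicate drift
)
```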
Initialise Drift Detection Test
data_drift_detection(): Prepares the drift detection test with the following parameters:
test_session: The session object linked to your project.
test_name: A descriptive name for this test.
train_dataset_name: The name of the baseline or training dataset.
field_dataset_name: The name of the new or field dataset to compare against the baseline.
train_embed_col_name: The column (schema mapping) in the training dataset that contains embeddings.
field_embed_col_name: The column (schema mapping) in the field dataset that contains embeddings.
output_type: The type of output, "semantic_segmentation" in this instance. Note: not required for object detection use cases.
level: The level at which to detect drift; "image" means image-level detection.
rules: The previously defined rules for the test.
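The parameters above might be assembled as in the sketch below. The dataset names, column names, and import path shown are illustrative assumptions, not required values.

```python
# Sketch of the test initialisation described above; dataset and column
# names are placeholders - replace them with your project's values.
from raga import data_drift_detection

edge_case_detection = data_drift_detection(
    test_session=test_session,              # session linked to your project
    test_name="drift-detection-test",       # descriptive name (placeholder)
    train_dataset_name="train-dataset",     # baseline dataset (placeholder)
    field_dataset_name="field-dataset",     # new dataset to compare (placeholder)
    train_embed_col_name="ImageEmbedding",  # embeddings column (placeholder)
    field_embed_col_name="ImageEmbedding",  # embeddings column (placeholder)
    output_type="semantic_segmentation",    # omit for object detection use cases
    level="image",                          # image-level drift detection
    rules=rules,                            # rules defined earlier
)
```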
test_session.add(): Registers the drift detection test within the session.
test_session.run(): Initiates execution of all configured tests in the session, including your drift detection test.
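The final two calls then register and execute the test (assuming `edge_case_detection` holds the object returned by `data_drift_detection`):

```python
test_session.add(edge_case_detection)  # register the drift detection test
test_session.run()                     # execute all tests in the session
```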
By completing these steps, you've initiated a Drift Detection Test in RagaAI to analyse your datasets for any significant changes in data distribution.
Analysing Test Results
Interpreting the Results
In Distribution Data Points: Identified as "in distribution" if they fall within the set threshold, signifying alignment with the training data.
Out of Distribution Data Points: Labelled as "out of distribution" if they exceed the threshold, suggesting potential drift requiring close examination.
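To make the threshold rule concrete, here is a minimal, self-contained sketch of the idea, not the RagaAI implementation: it computes a simplified Mahalanobis distance (assuming a diagonal covariance for brevity) from toy training "embeddings" and labels field points against a hypothetical threshold.

```python
import math
import random

def mahalanobis_diag(x, mean, std):
    # Simplified Mahalanobis distance assuming a diagonal covariance
    # (independent dimensions); the full metric uses the inverse covariance matrix.
    return math.sqrt(sum(((xi - mi) / si) ** 2 for xi, mi, si in zip(x, mean, std)))

random.seed(0)
# Toy stand-in for training embeddings: 500 points in 4 dimensions.
train = [[random.gauss(0.0, 1.0) for _ in range(4)] for _ in range(500)]
dims = list(zip(*train))
mean = [sum(d) / len(d) for d in dims]
std = [math.sqrt(sum((v - m) ** 2 for v in d) / len(d)) for d, m in zip(dims, mean)]

threshold = 5.0  # hypothetical threshold; tune per dataset
field_in = [0.2, -0.1, 0.3, 0.0]   # lies close to the training distribution
field_out = [9.0, 9.0, 9.0, 9.0]   # lies far from it and should be flagged

for point in (field_in, field_out):
    d = mahalanobis_diag(point, mean, std)
    label = "in distribution" if d <= threshold else "out of distribution"
    print(f"distance={d:.2f} -> {label}")
```

Points whose distance falls within the threshold align with the training data; points beyond it are flagged as potential drift, mirroring the in/out-of-distribution labels above.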
Interactive Embedding View
Visualisation: Use the interactive embedding view to visualise and comprehend the drift between datasets.
Data Selection: Apply the lasso tool within the embedding view to select and scrutinise data points of interest.
Visualising and Assessing Data
Data Grid View: Helps visualise annotations, with images sorted by mistake scores.
Image View: Delve into detailed analyses of mistake scores for each label, with interactive annotation rendering and original image viewing.
Image View
Information Card: Provides the Distance Score of the selected image.
By adhering to these guidelines, you can effectively utilise Drift Detection in RagaAI to maintain the integrity and relevance of your models over time.