Outlier Detection
The Outlier Detection Test in RagaAI is crucial for identifying anomalies in your dataset.
Execute Test:
The steps below set up and execute an Outlier Detection Test in RagaAI, which detects data points that deviate significantly from the majority of your dataset.
Initialize Drift Detection Rules:
Use the DriftDetectionRules() function to initialize the rules for the test.
Add Rules:
Use the rules.add() function to add specific rules with the following parameters:
type: Specifies the type of detection, which should be set to "anomaly_detection" for outlier detection.
dist_metric: The distance metric used for outlier detection (e.g., Mahalanobis).
_class: Specifies the class or label(s) to which the rule applies. Use "ALL" to apply to all classes.
threshold: The threshold for the metric, indicating when a data point is considered an outlier.
Configure Test Run:
Define the test run configuration, including the project name, test name, and session credentials.
Execute Outlier Detection Test:
Use the data_drift_detection() function to execute the test with the following parameters:
test_session: The session object managing tests.
test_name: Name of the test run.
dataset_name: Name of the dataset to be tested.
embed_col_name: Name of the column containing embeddings in the dataset.
output_type: Type of output expected from the model, set to "outlier_detection".
rules: Predefined rules for the test.
Add Test to Session:
Use the test_session.add() function to register the test with the test session.
Run Test:
Use the test_session.run() function to start the execution of all tests added to the session, including the Outlier Detection Test.
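The steps above can be sketched as a single helper. This is a minimal sketch, not the definitive RagaAI usage: it assumes that DriftDetectionRules and data_drift_detection can be imported from the top-level raga package, and that a configured test session object already exists. The helper name run_outlier_detection, the test name, and the way the threshold is passed in are illustrative choices, not part of the SDK.

```python
def run_outlier_detection(test_session, dataset_name, embed_col_name, threshold):
    """Build, register, and run an Outlier Detection Test on an existing session."""
    # Assumption: both names are importable from the top-level `raga` package.
    from raga import DriftDetectionRules, data_drift_detection

    # Initialize the rules and add one anomaly-detection rule for all classes.
    rules = DriftDetectionRules()
    rules.add(
        type="anomaly_detection",
        dist_metric="Mahalanobis",
        _class="ALL",
        threshold=threshold,  # choose a value suited to your data
    )

    # Describe the test using the parameters listed above.
    outlier_test = data_drift_detection(
        test_session=test_session,
        test_name="outlier-detection-test",  # placeholder test name
        dataset_name=dataset_name,
        embed_col_name=embed_col_name,
        output_type="outlier_detection",
        rules=rules,
    )

    # Register the test with the session, then run everything in the session.
    test_session.add(outlier_test)
    test_session.run()
    return outlier_test
```

Call the helper with your own session object, dataset name, and embedding column name. The import sits inside the function so that the module can be loaded even where the SDK is not installed.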
By following these steps, you can effectively detect outliers within your dataset using the Outlier Detection Test.
After the test, carefully review the identified outliers to decide how best to handle them — whether to remove, adjust, or further investigate these data points.
Analysing Test Results
Test Overview
Pie Chart: A visual summary showing the proportion of data points that passed or failed the set distance metric threshold.
Distance Score Analysis
Bar Graph: Visualise the average distance score for failed data points, with volume details per class.
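The numbers behind the pie chart and bar graph can be reproduced offline if you have the distance scores to hand. The sketch below is an illustration only and is not how RagaAI computes its charts: the (class label, distance score) record format, the summarize helper, and the sample values are all hypothetical.

```python
from collections import defaultdict

def summarize(records, threshold):
    """Summarize (class_label, distance_score) pairs against a threshold."""
    passed = failed = 0
    failed_by_class = defaultdict(list)
    for label, score in records:
        if score > threshold:
            # Above the threshold: the data point fails the test.
            failed += 1
            failed_by_class[label].append(score)
        else:
            passed += 1

    total = passed + failed
    # Pass/fail proportions, as shown in the pie chart.
    overview = {
        "passed": passed / total if total else 0.0,
        "failed": failed / total if total else 0.0,
    }
    # Volume and average distance of failed points per class, as in the bar graph.
    per_class = {
        label: {"count": len(scores), "avg_distance": sum(scores) / len(scores)}
        for label, scores in failed_by_class.items()
    }
    return overview, per_class

# Made-up sample data for illustration.
records = [("cat", 3.0), ("cat", 12.0), ("dog", 9.0), ("dog", 11.0), ("dog", 2.0)]
overview, per_class = summarize(records, threshold=8.0)
print(overview)
print(per_class)
```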
Interactive Embedding View
Visualisation: Use the embedding view to observe outliers.
Data Selection: Employ the lasso tool to select specific data points for further examination.
Assessing and Visualising Data
Datagrid View: Examine images sorted by their distance score in descending order.
Interpreting Results
In Distribution Data: Data points within the threshold are deemed "in distribution" and consistent.
Out of Distribution Data: Data points above the threshold are "out of distribution" and warrant further inspection for data consistency.
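To make the split between in-distribution and out-of-distribution data concrete, the following sketch scores synthetic embeddings with the Mahalanobis distance and flags those above a threshold. It uses NumPy and is independent of RagaAI: the mahalanobis_scores helper, the random data, and the threshold value are assumptions chosen for illustration, and the platform's own computation may differ.

```python
import numpy as np

def mahalanobis_scores(embeddings: np.ndarray) -> np.ndarray:
    """Mahalanobis distance of each row from the mean of all rows."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    cov = np.cov(embeddings, rowvar=False)
    # The pseudo-inverse keeps the computation usable if the covariance is singular.
    cov_inv = np.linalg.pinv(cov)
    # Squared distance per row: x_i^T * cov_inv * x_i for each centered row x_i.
    squared = np.einsum("ij,jk,ik->i", centered, cov_inv, centered)
    return np.sqrt(np.maximum(squared, 0.0))

rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 4))   # synthetic 4-dimensional embeddings
embeddings[0] = [8.0, 8.0, 8.0, 8.0]     # one planted outlier

threshold = 5.0  # illustrative only; a suitable value depends on your data
scores = mahalanobis_scores(embeddings)
is_outlier = scores > threshold          # True means "out of distribution"

# List flagged points with the highest distance first, as in the datagrid view.
flagged = np.argsort(-scores)[: int(is_outlier.sum())]
print(f"{int(is_outlier.sum())} of {len(scores)} points exceed the threshold")
print("Flagged indices:", flagged.tolist())
```

Points at or below the threshold are treated as in distribution; the planted point sits far from the rest and is flagged for inspection.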
By adhering to these steps, you can effectively utilise RagaAI to detect and analyse outliers in your datasets.