# Outlier Detection

### Execute Test

The snippet below sets up and runs an Outlier Detection Test in RagaAI. The test flags data points that deviate significantly from the majority of your dataset.

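```python
# Assumes `test_session` and `run_name` were created earlier, when configuring the test run.

# Define the outlier rule: Mahalanobis distance, all classes, threshold of 25.
rules = DriftDetectionRules()
rules.add(type="anomaly_detection", dist_metric="Mahalanobis", _class="ALL", threshold=25)

# Define the test against the embeddings column of the dataset.
edge_case_detection = data_drift_detection(
    test_session=test_session,
    test_name=run_name,
    dataset_name="storefront_dataset",
    embed_col_name="Embedding",
    output_type="outlier_detection",
    rules=rules,
)

# Register the test with the session, then run every test that was added.
test_session.add(edge_case_detection)
test_session.run()
```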

1. **Initialize Drift Detection Rules**:
   * Use the `DriftDetectionRules()` function to initialize the rules for the test.
2. **Add Rules**:
   * Use the `rules.add()` function to add specific rules with the following parameters:
     * `type`: Specifies the type of detection. Set it to `"anomaly_detection"` for outlier detection.
     * `dist_metric`: The distance metric used for outlier detection (e.g., Mahalanobis).
     * `_class`: Specifies the class or label(s) to which the rule applies. Use `"ALL"` to apply it to all classes.
     * `threshold`: The distance score above which a data point is considered an outlier.
3. **Configure Test Run**:
   * Define the test run configuration, including the project name, test name, and session credentials. The snippet above assumes this is already done and that `test_session` and `run_name` are available.
4. **Define Outlier Detection Test**:
   * Use the `data_drift_detection()` function to define the test with the following parameters:
     * `test_session`: The session object managing tests.
     * `test_name`: Name of the test run.
     * `dataset_name`: Name of the dataset to be tested.
     * `embed_col_name`: Name of the column containing embeddings in the dataset.
     * `output_type`: The type of output the test produces. Set it to `"outlier_detection"`.
     * `rules`: Predefined rules for the test.
5. **Add Test to Session**:
   * Use the `test_session.add()` function to register the test with the test session.
6. **Run Test**:
   * Use the `test_session.run()` function to start the execution of all tests added to the session, including the Outlier Detection Test.

By following these steps, you can effectively detect outliers within your dataset using the Outlier Detection Test.

After the test completes, review the identified outliers and decide how to handle each one: remove it, adjust it, or investigate it further.
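To build intuition for what a Mahalanobis threshold does, here is a minimal NumPy sketch that scores synthetic embeddings and flags those above a cutoff. It is not RagaAI's implementation: the function name `mahalanobis_scores`, the synthetic data, and the cutoff of `5.0` are all assumptions made for illustration, and the score scale RagaAI uses internally may differ.

```python
import numpy as np

def mahalanobis_scores(embeddings: np.ndarray) -> np.ndarray:
    """Return the Mahalanobis distance of each row from the dataset mean."""
    mean = embeddings.mean(axis=0)
    centered = embeddings - mean
    cov = np.cov(embeddings, rowvar=False)
    # The pseudo-inverse keeps the computation stable if the covariance is singular.
    inv_cov = np.linalg.pinv(cov)
    # Quadratic form x^T * inv_cov * x, evaluated row by row.
    squared = np.einsum("ij,jk,ik->i", centered, inv_cov, centered)
    return np.sqrt(np.maximum(squared, 0.0))

# Synthetic 4-dimensional "embeddings" with one planted outlier (illustrative data).
rng = np.random.default_rng(0)
embeddings = rng.normal(size=(200, 4))
embeddings[0] = [12.0, -12.0, 12.0, -12.0]

scores = mahalanobis_scores(embeddings)
threshold = 5.0  # hypothetical cutoff for this synthetic data only
is_outlier = scores > threshold

print("highest-scoring row:", int(scores.argmax()))
print("rows flagged as outliers:", int(is_outlier.sum()))
```

Unlike plain Euclidean distance, the Mahalanobis distance accounts for the spread and correlation of the embedding dimensions, so a point is scored relative to the shape of the data rather than to its raw coordinates.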

### Analysing Test Results

#### Test Overview

* **Pie Chart**: A visual summary showing the proportion of data points that passed or failed the set distance metric threshold.

#### Distance Score Analysis

* **Bar Graph**: Visualise the average distance score of failed data points for each class, along with the number of failed data points per class.

<figure><img src="https://1811327582-files.gitbook.io/~/files/v0/b/gitbook-x-prod.appspot.com/o/spaces%2FYbIiNdp1QbG4avl7VShw%2Fuploads%2FuusLJokw8m0ZGFqW7B5w%2FScreenshot%202024-01-12%20at%2016.54.42.png?alt=media&#x26;token=0062df79-b9b5-4317-9f18-a014ce2a33dc" alt=""><figcaption><p><strong>Note:</strong> Embeddings may not be present for OCR use cases.</p></figcaption></figure>
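The aggregation behind the bar graph can be approximated offline. The sketch below assumes you have a list of `(class_label, distance_score)` pairs; the labels, scores, and the helper name `summarise_failures` are hypothetical and do not come from the RagaAI API.

```python
from collections import defaultdict

# Hypothetical (class_label, distance_score) pairs for illustration.
records = [
    ("shelf", 31.2),
    ("shelf", 12.4),
    ("aisle", 27.9),
    ("aisle", 40.1),
    ("checkout", 8.3),
]
threshold = 25  # same value as the rule in the test above

def summarise_failures(records, threshold):
    """Group failed points (score above threshold) by class; report count and mean score."""
    grouped = defaultdict(list)
    for label, score in records:
        if score > threshold:
            grouped[label].append(score)
    return {
        label: {"count": len(scores), "mean_score": sum(scores) / len(scores)}
        for label, scores in grouped.items()
    }

summary = summarise_failures(records, threshold)
for label, stats in summary.items():
    print(f"{label}: {stats['count']} failed, mean score {stats['mean_score']:.2f}")
```

Classes with no failed data points (here, `checkout`) do not appear in the summary at all.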

#### Interactive Embedding View

* **Visualisation**: Use the embedding view to observe outliers.
* **Data Selection**: Employ the lasso tool to select specific data points for further examination.

<figure><img src="https://lh7-us.googleusercontent.com/uW-yM5X6qlPIOP07O-VflUpWK1GHS7QyG0Ft2brNE_SiWgLXzDzc_LjduFnnHaP69hdyoOvwWCCLI32jeMmSmYI-0TmBjP2ruxU3dqTYWR7GepxOQAO9ocKfDKiRzT2osRTe0klUnlNjyO06Mn79ris5MnoIgtqpFlf7t4GGT0ok1B9kG1Tz5a061Ehs9A" alt=""><figcaption></figcaption></figure>

#### Assessing and Visualising Data

* **Datagrid View**: Examine images sorted by their distance score in descending order.

#### Interpreting Results

* **In Distribution Data**: Data points within the threshold are deemed "in distribution" and consistent.
* **Out of Distribution Data**: Data points above the threshold are "out of distribution" and warrant further inspection for data consistency.
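The split described above, combined with the descending sort used in the datagrid view, can be sketched as follows. The file names, scores, and the helper name `split_by_threshold` are hypothetical. Treating a score exactly equal to the threshold as in distribution is an assumption based on the wording "above the threshold".

```python
def split_by_threshold(scores, threshold):
    """Rank items by distance score (highest first) and split them at the threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1], reverse=True)
    out_of_distribution = [(name, s) for name, s in ranked if s > threshold]
    in_distribution = [(name, s) for name, s in ranked if s <= threshold]
    return in_distribution, out_of_distribution

# Hypothetical image names and distance scores for illustration.
distance_scores = {
    "img_001.jpg": 12.4,
    "img_002.jpg": 40.1,
    "img_003.jpg": 25.0,
    "img_004.jpg": 27.9,
}

in_dist, out_dist = split_by_threshold(distance_scores, threshold=25)
print("review first:", [name for name, _ in out_dist])
print("in distribution:", [name for name, _ in in_dist])
```

Sorting before splitting means the most extreme outliers come first in the review list, which matches the order in which the datagrid presents them.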

Together, these views let you use RagaAI to detect and analyse outliers in your datasets.
