Class Imbalance

The Class Imbalance Test assesses how classes are distributed within a dataset, which matters for machine learning tasks such as object detection and instance segmentation.

Execute Test:

The code below runs the Class Imbalance Test on a training dataset using two metrics, Jensen-Shannon divergence and the chi-squared test, to evaluate the distribution of classes.

from raga import *  # assumed import path for the RagaAI client; adjust to your installation

# Define the rules: compare the class distribution against a uniform ideal.
rules = ClassImbalanceRules()
rules.add(metric="js_divergence", ideal_distribution="uniform", metric_threshold=0.10, label="ALL")
rules.add(metric="chi_squared_test", ideal_distribution="uniform", metric_threshold=0.10, label="ALL")

run_name = "Class Imbalance v1"
print(run_name)
dataset_name = "training_dataset"

# Create the session; replace the placeholders with your own credentials.
test_session = TestSession(
    project_name="Instance Segmentation",
    run_name=run_name,
    access_key="<YOUR_ACCESS_KEY>",
    secret_key="<YOUR_SECRET_KEY>",
    host="https://backend.platform.raga.ai",
)

# Configure the Class Imbalance Test.
distribution_test = class_imbalance_test(
    test_session=test_session,
    dataset_name=dataset_name,
    test_name=run_name,
    type="class_imbalance",
    output_type="instance_segmentation",
    annotation_column_name="AnnotationsV1",
    rules=rules,
)

# Register the test with the session and run it.
test_session.add(distribution_test)
test_session.run()

  1. Initialize Class Imbalance Rules:

    • Create a ClassImbalanceRules() object to hold the rules for the test.

  2. Add Rules:

    • Use the rules.add() function to add specific rules with the following parameters:

      • metric: The metric used to evaluate class distribution (e.g., js_divergence, chi_squared_test).

      • ideal_distribution: The ideal distribution assumption for the metric (e.g., "uniform").

      • metric_threshold: The threshold for the metric, indicating when the class distribution is considered imbalanced.

      • label: Specifies the label(s) to which the rule applies. Use "ALL" to apply to all labels.

  3. Configure Test Run:

    • Define the run name and dataset name, then create a TestSession with the project name, run name, access key, secret key, and host.

  4. Execute Class Imbalance Test:

    • Use the class_imbalance_test() function to execute the test with the following parameters:

      • test_session: The session object managing tests.

      • dataset_name: Name of the dataset to be tested.

      • test_name: Name of the test run.

      • type: Type of test, which should be set to "class_imbalance".

      • output_type: Type of output expected from the model (e.g., "instance_segmentation").

      • annotation_column_name: Name of the column containing annotations.

      • rules: Predefined rules for the test.

  5. Add Test to Session:

    • Use the test_session.add() function to register the test with the test session.

  6. Run Test:

    • Use the test_session.run() function to start the execution of all tests added to the session, including the Class Imbalance Test.

By following these steps, you can effectively assess the class distribution within your dataset using the Class Imbalance Test.
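
To make the two metrics named in the rules more concrete, here is a minimal sketch of how a class distribution can be compared against a uniform ideal. It is an illustration only, not the platform's implementation: the function names, the sample class counts, and the choice of base-2 logarithms are assumptions, and the platform may normalise or threshold these values differently.

```python
import math


def js_divergence_from_uniform(counts):
    """Jensen-Shannon divergence (base 2, between 0 and 1) of the observed
    class proportions from a uniform distribution over the same classes."""
    total = sum(counts.values())
    if not counts or total == 0:
        raise ValueError("counts must contain at least one annotation")
    k = len(counts)
    p = [n / total for n in counts.values()]   # observed proportions
    q = [1.0 / k] * k                          # uniform ideal
    m = [(pi + qi) / 2 for pi, qi in zip(p, q)]

    def kl(a, b):
        # Kullback-Leibler divergence; terms with a == 0 contribute nothing.
        return sum(ai * math.log2(ai / bi) for ai, bi in zip(a, b) if ai > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


def chi_squared_statistic_from_uniform(counts):
    """Pearson chi-squared statistic of the observed class counts against
    equal expected counts for every class."""
    total = sum(counts.values())
    if not counts or total == 0:
        raise ValueError("counts must contain at least one annotation")
    expected = total / len(counts)
    return sum((n - expected) ** 2 / expected for n in counts.values())


# Hypothetical annotation counts per class.
balanced = {"car": 100, "person": 100, "bike": 100}
skewed = {"car": 270, "person": 20, "bike": 10}

for name, counts in (("balanced", balanced), ("skewed", skewed)):
    print(
        name,
        round(js_divergence_from_uniform(counts), 4),
        round(chi_squared_statistic_from_uniform(counts), 2),
    )
```

Both values are zero for a perfectly balanced dataset and grow as the counts drift away from uniform. The sketch returns the raw chi-squared statistic rather than a p-value, so it should not be compared directly against the metric_threshold used in the rules.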

Interpreting the Results

The Class Imbalance Test provides insights into the distribution of classes within the datasets under evaluation. The results are presented in three segments:

Bar Chart Comparison

  • The bar chart compares class counts between the training dataset and the dataset under evaluation.

  • Use it to identify significant disparities in class distribution between the two datasets.
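
The comparison behind such a chart can be approximated offline. The following is a rough sketch, assuming you have one class label per annotation for each dataset; the function name, the sample labels, and the ranking by share gap are illustrative choices, not part of the platform.

```python
from collections import Counter


def compare_class_counts(train_labels, eval_labels):
    """Return one row per class: (class, train count, eval count, share gap),
    sorted so the classes whose share changed most come first."""
    if not train_labels or not eval_labels:
        raise ValueError("both label lists must be non-empty")
    train_counts = Counter(train_labels)
    eval_counts = Counter(eval_labels)
    rows = []
    for name in sorted(set(train_counts) | set(eval_counts)):
        train_share = train_counts[name] / len(train_labels)
        eval_share = eval_counts[name] / len(eval_labels)
        rows.append((name, train_counts[name], eval_counts[name], eval_share - train_share))
    rows.sort(key=lambda row: abs(row[3]), reverse=True)
    return rows


# Hypothetical labels, one per annotation.
train = ["car"] * 80 + ["person"] * 20
evaluation = ["car"] * 50 + ["person"] * 45 + ["bike"] * 5

for name, n_train, n_eval, gap in compare_class_counts(train, evaluation):
    print(f"{name:8s} train={n_train:3d} eval={n_eval:3d} share gap={gap:+.2f}")
```

Comparing shares rather than raw counts keeps the ranking meaningful when the two datasets differ in size.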

Data Grid View

  • The data grid view presents annotations with images, sorted by class imbalance score.

  • Identify images with notable class imbalances for further investigation.

Image View

  • Use the interactive annotation rendering and original image viewing to examine specific instances of class imbalance.

By using these features, you can evaluate and address class imbalances within your datasets, which helps keep your machine learning models reliable over time.
