Scenario Imbalance
The Scenario Imbalance Test evaluates the distribution of scenarios or contexts within a dataset, providing insights into potential imbalances that may affect model performance.
Execute Test:
The Scenario Imbalance Test is executed using two different metrics, Jensen-Shannon Divergence and the Chi-Squared Test, to evaluate the distribution of scenarios within a dataset.
Initialize Scenario Imbalance Rules:
Use the SBRules() function to initialize the rules for the test.
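A minimal sketch of this step is shown below. The import path is an assumption for illustration; adjust it to your SDK's actual package.

```python
# Assumed import path, shown for illustration; adjust to your SDK's package.
from raga import SBRules

# Create an empty rules container for the Scenario Imbalance Test.
rules = SBRules()
```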
Add Rules:
Use the rules.add() function to add specific rules with the following parameters:
metric: The metric used to evaluate scenario distribution (e.g., js_divergence, chi_squared_test).
ideal_distribution: The ideal distribution assumption for the metric (e.g., "uniform").
metric_threshold: The threshold for the metric, indicating when the scenario distribution is considered imbalanced.
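Building on the rules object created above, a hedged example of adding a rule with these parameters follows; the threshold value is illustrative, not a recommended default.

```python
# Add a rule: flag the dataset as imbalanced when the Jensen-Shannon divergence
# from a uniform scenario distribution exceeds the threshold.
rules.add(
    metric="js_divergence",        # alternatively "chi_squared_test"
    ideal_distribution="uniform",
    metric_threshold=0.1           # example value; tune for your dataset
)
```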
Configure Clustering:
Perform clustering on the dataset to group similar scenarios together using the desired method and parameters.
Use the clustering() function with parameters such as method, embedding_col, level, and args.
Execute Test:
Use the scenario_imbalance_test() function to execute the test with the following parameters:
test_session: The session object managing tests.
dataset_name: Name of the dataset to be tested.
test_name: Name of the test run.
type: Type of test, which should be set to "scenario_imbalance".
output_type: Type of output expected from the model.
annotation_column_name: Name of the column containing annotations.
rules: Predefined rules for the test.
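A hedged sketch of the test call using the parameters listed above follows. The dataset, test, output type, and column names are placeholders, and test_session is assumed to have been created during your SDK's session setup.

```python
# Assumed import path; adjust to your SDK's package.
from raga import scenario_imbalance_test

# Configure the Scenario Imbalance Test. All string values below are placeholders.
scenario_test = scenario_imbalance_test(
    test_session=test_session,              # existing session object from SDK setup
    dataset_name="my_dataset",              # dataset to evaluate
    test_name="scenario_imbalance_run_1",   # name for this test run
    type="scenario_imbalance",
    output_type="object_detection",         # model output type (example)
    annotation_column_name="annotations",   # column holding annotations
    rules=rules                             # rules defined earlier
)
```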
Add Test to Session:
Use the test_session.add() function to register the test with the test session.
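For example, registering the configured test with the session (the scenario_test variable comes from the sketch above):

```python
# Register the Scenario Imbalance Test with the session so it runs with the other tests.
test_session.add(scenario_test)
```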
Run Test:
Use the test_session.run() function to start the execution of all tests added to the session, including the Scenario Imbalance Test.
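Finally, a one-line sketch that triggers execution of every test registered with the session:

```python
# Execute all tests that have been added to the session.
test_session.run()
```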
By following these steps, you can effectively evaluate scenario distribution within your dataset using the Scenario Imbalance Test.
Interpreting the Results
The Scenario Imbalance Test provides insights into the distribution of scenarios or contexts within a dataset. The results are presented in three segments:
Bar Chart Comparison
The bar chart compares the distribution of scenarios between the training dataset and the dataset under evaluation, highlighting any discrepancies between the two.
Use it to identify significant disparities in scenario coverage that may indicate imbalance.
Data Grid View: Helps visualise annotations with images sorted by mistake scores.
Image View: Delve into a detailed analysis of each image.
By leveraging these features, you can effectively evaluate scenario distribution within your datasets using the Scenario Imbalance Test.