
A/B Test

A/B Test is a systematic method to compare two or more variations of a pipeline.

A/B Test helps users make informed decisions about pipeline improvements, feature-engineering methodologies, and any changes to the underlying data-processing pipeline. It aids in determining the most effective solutions for improving pipeline performance and their applicability to various scenarios. A/B Test is also useful for assessing the robustness of pipelines to changes in data distribution over time, ensuring adaptability to changing real-world conditions.

Execute Test

The following code snippet performs an A/B Test on a specified dataset within the RagaAI environment.

First, define the rules used to evaluate the pipelines' performance. These rules are based on metrics such as Difference Percentage (refer to the Metric Glossary).

# Assumes the RagaAI SDK names used below (EventABTestRules, Filter,
# TimestampFilter, event_ab_test) have already been imported and that a
# test_session has been created (see the sketch below).
import datetime

rules = EventABTestRules()
rules.add(metric="difference_percentage", IoU=0.5, _class="ALL", threshold=50.0, conf_threshold=0.2)

filters = Filter()
filters.add(TimestampFilter(gte="2021-01-01T00:00:00Z", lte="2025-01-15T00:00:00Z"))

model_comparison_check = event_ab_test(test_session=test_session,
                                       dataset_name="bdd_video_test_1",
                                       test_name=f"AB-test-unlabelled-{datetime.datetime.now().strftime('%Y%m%d%H%M%S')}",
                                       modelB="Red_Light_V2",
                                       modelA="Red_Light_V1",
                                       object_detection_modelB="Red_Light_V2",
                                       object_detection_modelA="Red_Light_V1",
                                       type="metadata",
                                       sub_type="unlabelled",
                                       output_type="event_detection",
                                       rules=rules,
                                       aggregation_level=["weather"],
                                       filter=filters)

test_session.add(model_comparison_check)
test_session.run()
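
The snippet above assumes an existing test_session. A minimal sketch of creating one is shown below; the import path and the exact constructor arguments are assumptions based on the parameter descriptions that follow and may differ across SDK versions.

# Hypothetical sketch: creating the test_session used above. The import
# path and constructor signature are assumptions and may differ in your
# version of the RagaAI SDK.
from raga import TestSession

test_session = TestSession(project_name="my_project",    # project on the RagaAI platform
                           run_name="ab_test_run",       # illustrative label for this run
                           access_key="YOUR_ACCESS_KEY",
                           secret_key="YOUR_SECRET_KEY",
                           host="YOUR_HOST")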

  • EventABTestRules(): Initialises the rules for the A/B Test.

    • rules.add(): Adds a new rule with specific parameters:

      • metric: The performance metric to evaluate (e.g., Difference Percentage).

      • IoU: Sets the required degree of overlap (Intersection over Union) between consecutive frames' bounding boxes, which controls the sensitivity of object tracking.

      • _class: Specifies the class these metrics apply to. "ALL" means all classes.

      • threshold: The minimum acceptable value for the metric.

      • conf_threshold: The minimum confidence value for a detection to be considered valid.

  • Filter(): Initialises the filters for the A/B Test.

    • filters.add(): Adds a new filter with specific parameters:

      • TimestampFilter: Restricts the test to datapoints whose timestamps fall within the given range (gte/lte); results are generated only for those datapoints.

  • event_ab_test(): Configures the A/B Test with the following parameters:

    • test_session: The test session created by the user with the project name, access key, secret key and host.

    • dataset_name: Specifies the dataset used for the test.

    • test_name: A unique name identifying the test run.

    • modelB: The name of Pipeline B (the second pipeline).

    • modelA: The name of Pipeline A (the first pipeline).

    • object_detection_modelB: The intermediate object-detection model of Pipeline B, used to render bounding boxes on the videos.

    • object_detection_modelA: The intermediate object-detection model of Pipeline A, used to render bounding boxes on the videos.

    • type: Specifies the level at which the test is run (embedding for cluster level, metadata for metadata level).

    • sub_type: Specifies the category of test (labelled or unlabelled).

    • output_type: The use case the test is run on, for example object_detection or event_detection.

    • aggregation_level: Specifies the scenarios (metadata attributes such as weather) across which the test results are aggregated. Only required when the test is run at the metadata level.

    • rules: The previously defined rules for A/B Test.

    • filter: The previously defined filters for the A/B Test.

  • test_session.add(): Registers the A/B Test with the test session.

  • test_session.run(): Starts the execution of all the tests added to the session, including the A/B Test (see the sketch after this list).
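
Because run() executes everything registered on the session, several checks can be queued before a single call. A minimal sketch reusing the event_ab_test configuration from above; the stricter rule values and the second test name are illustrative, and the final lines would replace the add/run pair in the snippet above.

# Illustrative: queueing a second, stricter A/B Test on the same session.
strict_rules = EventABTestRules()
strict_rules.add(metric="difference_percentage", IoU=0.5, _class="ALL", threshold=25.0, conf_threshold=0.5)

strict_check = event_ab_test(test_session=test_session,
                             dataset_name="bdd_video_test_1",
                             test_name="AB-test-unlabelled-strict",
                             modelB="Red_Light_V2",
                             modelA="Red_Light_V1",
                             object_detection_modelB="Red_Light_V2",
                             object_detection_modelA="Red_Light_V1",
                             type="metadata",
                             sub_type="unlabelled",
                             output_type="event_detection",
                             rules=strict_rules,
                             aggregation_level=["weather"],
                             filter=filters)

# run() then executes both checks in one go
test_session.add(model_comparison_check)
test_session.add(strict_check)
test_session.run()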

Following this guide, you've successfully set up and initiated an A/B Test on the RagaAI Testing Platform.

Analysing Test Results

  • Directly Look at Problematic Scenarios: Users can quickly identify where the two pipelines diverge most and pinpoint cases where a pipeline is underperforming.

  • In-Depth Analysis: Dive deeper into specific scenarios or data points to understand the root causes of underperformance.

Data Analysis

  1. Switch to Analysis Tab: To get a detailed performance report, go to the Analysis tab.

  2. View Performance Metrics: Examine metrics like Event A/B Detections and Detections over time.

  3. Data Grid View: Users can drill down into individual datapoints and analyse results at a granular level.

Practical Tips

  • Set Realistic Thresholds: Choose thresholds that reflect the expected performance of your model; per-class thresholds are often more realistic than a single global value (see the sketch after this list).

  • Leverage Visual Tools: Make full use of RagaAI’s visualisation capabilities to gain insights that might not be apparent from raw data alone.
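
For instance, thresholds can be set per class rather than once for all classes. The class names and values below are illustrative, reusing the rules.add() signature from the snippet above.

# Illustrative: per-class thresholds instead of a single "ALL" rule.
# Class names and values are assumptions; choose values that reflect the
# expected performance of each class in your dataset.
rules = EventABTestRules()
rules.add(metric="difference_percentage", IoU=0.5, _class="traffic light", threshold=30.0, conf_threshold=0.2)
rules.add(metric="difference_percentage", IoU=0.5, _class="car", threshold=50.0, conf_threshold=0.2)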

By following these steps, users can efficiently leverage the A/B Test to gain a comprehensive understanding of their pipelines' performance, identify key areas for improvement, and make data-driven decisions to enhance accuracy and reliability.
