Failure Mode Analysis

Failure Mode Analysis lets you define rules on metrics by setting thresholds. Based on these rules, you get a sorted list of the clusters that breach the threshold and the clusters that stay within it. RagaAI creates the clusters using the embeddings of the dataset. Clusters that underperform and breach the threshold are highlighted on the Issue Stats screen.
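The idea of checking each cluster against a metric threshold can be sketched in plain Python. This is only an illustration and not RagaAI's implementation: the helper names `precision_per_cluster` and `split_by_threshold`, the toy records, and the second label "spoof" are all assumptions made for the example.

```python
from collections import defaultdict

def precision_per_cluster(records, label):
    """Return {cluster_id: precision for `label`}.

    `records` holds (cluster_id, ground_truth, prediction) tuples.
    Clusters with no predictions of `label` are left out.
    """
    tp = defaultdict(int)
    fp = defaultdict(int)
    for cluster_id, gt, pred in records:
        if pred != label:
            continue
        if gt == label:
            tp[cluster_id] += 1
        else:
            fp[cluster_id] += 1
    clusters = set(tp) | set(fp)
    return {c: tp[c] / (tp[c] + fp[c]) for c in clusters}

def split_by_threshold(scores, metric_threshold):
    """Sort clusters by score (worst first) and split them at the threshold."""
    ranked = sorted(scores.items(), key=lambda item: item[1])
    breaching = [(c, s) for c, s in ranked if s < metric_threshold]
    within = [(c, s) for c, s in ranked if s >= metric_threshold]
    return breaching, within

# Toy data (hypothetical): (cluster_id, ground_truth, prediction)
records = [
    (0, "live", "live"), (0, "live", "live"), (0, "spoof", "live"),
    (1, "spoof", "live"), (1, "spoof", "live"), (1, "live", "live"),
    (2, "live", "live"), (2, "spoof", "spoof"),
]

scores = precision_per_cluster(records, label="live")
breaching, within = split_by_threshold(scores, metric_threshold=0.5)
print("breaching:", breaching)
print("within:", within)
```

In this toy data only cluster 1 falls below the 0.5 precision threshold, so it is the one that would be flagged.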

Execute Test

The following code snippet runs a Failure Mode Analysis (FMA) test on an Image Classification model within the RagaAI environment. You first define the rules used to evaluate the model's performance on metrics such as Precision, Recall and F1Score, then configure the clustering and the test itself. Each part of the code is broken down after the snippet.

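# Assumes `test_session` has already been created and that the RagaAI SDK
# names used here (FMARules, clustering, failure_mode_analysis) are imported.

# Rule: Precision for the "live" label must be at least 0.5.
rules = FMARules()
rules.add(metric="Precision", conf_threshold=0.5, metric_threshold=0.5, label="live")

# Group the image embeddings into 5 clusters using k-means.
cls_default = clustering(test_session=test_session,
                         dataset_name="phase2-live-dataset",
                         method="k-means",
                         embedding_col="ImageVectorsM1",
                         level="image",
                         args={"numOfClusters": 5})

# Configure the Failure Mode Analysis test with the clustering and rules above.
edge_case_detection = failure_mode_analysis(test_session=test_session,
                                            dataset_name="ic_fma_testdata",
                                            test_name="Failure Mode Analysis",
                                            model="modelB",
                                            gt="GT",
                                            type="embeddings",
                                            clustering=cls_default,
                                            rules=rules,
                                            output_type="multi-label")

# Register the test with the session and run it.
test_session.add(edge_case_detection)
test_session.run()
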
  • rules = FMARules(): Initialises the rules for the FMA test, specifically tailored for image classification.

    • rules.add(): Adds a rule for evaluating the model's performance.

    • metric: Specifies the performance metric to evaluate; here it is "Precision".

    • conf_threshold: The confidence threshold for predictions. Predictions with a confidence score below this threshold are likely considered invalid or uncertain.

    • metric_threshold: The minimum acceptable value for the specified metric. Here, the precision must be at least 0.5.

    • label: Specifies the label to which this rule applies. In this case, the rule is specific to the "live" label in a multi-label classification scenario.

  • clustering(): Configures the clustering parameters for grouping similar failure modes.

    • dataset_name: The name of the dataset to be used for the clustering process.

    • method: The clustering technique used; here it is "k-means".

    • embedding_col: The column containing embedding vectors of images to be used for clustering.

    • level: Specifies the level at which clustering is applied; "image" indicates that each image is treated as an individual data point.

    • args: Additional arguments for the clustering method, like the number of clusters to form.

  • failure_mode_analysis(): Sets up the failure mode analysis for the image classification model.

    • dataset_name: The name of the dataset for the FMA test.

    • test_name: A unique identifier for this test.

    • model: The identifier of the image classification model being tested, here "modelB".

    • gt: The ground truth. Enter the name of the ground truth as it is defined in the schema mapping, here "GT".

    • type: The type of analysis; here it is based on "embeddings".

    • clustering: The clustering configuration defined earlier.

    • rules: The set of evaluation rules defined at the beginning.

    • output_type: Indicates the type of classification, which is "multi-label" in this case.

  • test_session.add(): Registers the Failure Mode Analysis test with the session.

  • test_session.run(): Starts the execution of all tests in the session, including your Failure Mode Analysis test.

This setup in RagaAI helps you effectively analyse and identify failure modes in your Image Classification Model, providing valuable insights for model improvement and refinement.
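To make the interaction between conf_threshold and metric_threshold more concrete, here is a minimal sketch. It assumes that predictions below conf_threshold are simply ignored when precision is computed; the documentation above only says such predictions are likely treated as uncertain, so the exact behaviour inside RagaAI may differ. The function `precision_for_label`, the toy predictions, and the label "other" are hypothetical.

```python
def precision_for_label(predictions, label, conf_threshold):
    """Precision for `label`, counting only sufficiently confident predictions.

    `predictions` holds (ground_truth_labels, predicted_label, confidence)
    tuples. Returns None when no prediction of `label` passes the
    confidence threshold.
    """
    tp = fp = 0
    for gt_labels, pred_label, confidence in predictions:
        if pred_label != label or confidence < conf_threshold:
            continue  # skip other labels and low-confidence predictions
        if label in gt_labels:
            tp += 1
        else:
            fp += 1
    return tp / (tp + fp) if (tp + fp) else None

# Toy multi-label predictions (hypothetical values)
predictions = [
    ({"live"}, "live", 0.9),
    ({"live"}, "live", 0.6),
    ({"other"}, "live", 0.7),
    ({"other"}, "live", 0.3),  # ignored when conf_threshold=0.5
    ({"other"}, "live", 0.2),  # ignored when conf_threshold=0.5
]

precision = precision_for_label(predictions, "live", conf_threshold=0.5)
rule_passes = precision is not None and precision >= 0.5  # metric_threshold=0.5
unfiltered = precision_for_label(predictions, "live", conf_threshold=0.0)
print(precision, rule_passes, unfiltered)
```

With the confidence filter the rule passes, while the unfiltered precision would fall below 0.5. Under this assumption, the two thresholds need to be chosen together.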

Analysing Test Results

Clustering (Embedding View)

  1. Navigate to Embedding View: Find the interactive visualisation interface in RagaAI.

  2. Visualise Clusters: Utilise the tool to visualise complex clustering, uncovering hidden patterns and structures in the data.

  3. Select Data Points: Use the lasso tool to select specific data points of interest and analyse their results.

Visualising Data

  1. Grid View: Access the grid view to see data points within the selected clusters.

  2. Data Filtering: Use this feature to focus on specific subsets of your dataset that meet certain conditions, helping to extract meaningful patterns and trends.

Understanding Clustering and Threshold Breaches

  • Cluster Analysis: RagaAI creates clusters using the embeddings of the dataset. These clusters group similar data points together.

  • Identifying Underperforming Clusters: Clusters that underperform (breaching the threshold) will be highlighted on the Issue Stats screen.

  • Directly Look at Problematic Clusters: Users can quickly identify clusters responsible for underperformance and assess their impact on the overall model.

  • In-Depth Analysis: Dive deeper into specific clusters or data points to understand the root causes of underperformance.
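The clustering step described above groups similar embeddings together. The sketch below shows the general k-means idea on toy 2-D vectors using only the standard library. It is not RagaAI's implementation: the `kmeans` function, its fixed initialisation (the first k points), and the sample embeddings are assumptions for illustration.

```python
import math

def kmeans(points, k, iterations=10):
    """Tiny Lloyd's-algorithm k-means.

    Initial centroids are the first k points, which keeps the result
    deterministic for this example.
    """
    centroids = [list(p) for p in points[:k]]
    assignments = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid.
        for i, p in enumerate(points):
            assignments[i] = min(range(k), key=lambda c: math.dist(p, centroids[c]))
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids

# Toy 2-D "embeddings" (hypothetical); real image embeddings have many more dimensions.
embeddings = [(0.0, 0.1), (5.0, 5.1), (0.2, 0.0), (5.2, 4.9), (0.1, 0.2)]
assignments, centroids = kmeans(embeddings, k=2)
print(assignments)
```

Once every data point has a cluster ID, a metric can be computed per cluster and compared against the rule's threshold.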

Data Analysis

  1. Switch to Analysis Tab: To get a detailed performance report, go to the Analysis tab.

  2. View Performance Metrics: Examine metrics like label-wise performance and temporal graphs.

  3. Confusion Matrix: The class-based confusion matrix in Failure Mode Analysis provides a detailed breakdown of performance for each class. Users can view the confusion matrix in three ways:

    • Absolute: Shows the absolute counts for each ground truth and prediction pair.

    • Normalised (Ground Truth): Normalises values with respect to the ground truth.

    • Normalised (Model): Normalises values with respect to model inference.
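The three confusion matrix views can be illustrated with a small sketch. It assumes rows are ground truth classes and columns are model predictions, so that normalising by ground truth divides by row totals and normalising by model divides by column totals; the orientation used in the RagaAI interface is not stated above. The helper names, sample labels, and the label "other" are hypothetical.

```python
def confusion_matrix(gt, pred, labels):
    """Absolute counts: rows are ground truth, columns are predictions."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for g, p in zip(gt, pred):
        matrix[index[g]][index[p]] += 1
    return matrix

def normalise_rows(matrix):
    """Normalise with respect to ground truth: each row sums to 1."""
    return [[v / sum(row) if sum(row) else 0.0 for v in row] for row in matrix]

def normalise_columns(matrix):
    """Normalise with respect to model inference: each column sums to 1."""
    col_sums = [sum(col) for col in zip(*matrix)]
    return [[v / s if s else 0.0 for v, s in zip(row, col_sums)] for row in matrix]

labels = ["live", "other"]
gt = ["live", "live", "live", "other", "other"]
pred = ["live", "live", "other", "live", "other"]

absolute = confusion_matrix(gt, pred, labels)
by_gt = normalise_rows(absolute)
by_model = normalise_columns(absolute)
print(absolute)
print(by_gt)
print(by_model)
```

Under this orientation, the diagonal of the ground-truth-normalised view corresponds to per-class recall and the diagonal of the model-normalised view to per-class precision.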

Practical Tips

  • Set Realistic Thresholds: Choose thresholds that reflect the expected performance of your model.

  • Leverage Visual Tools: Make full use of RagaAI’s visualisation capabilities to gain insights that might not be apparent from raw data alone.

By following these steps, users can efficiently leverage the Failure Mode Analysis test to gain a comprehensive understanding of their model's performance, identify key areas for improvement, and make data-driven decisions to enhance model accuracy and reliability.
