
OCR

This page provides examples of how RagaAI's Testing Platform can add value to teams building OCR models and pipelines. It is a companion piece to the Product Demo available on the RagaAI Platform.

The OCR Project on the sample workspace is an example of how the RagaAI Testing Platform can help with the following tasks -

  • Data Quality Checks before training a new model

  • Model Quality Checks to identify performance gaps and perform regression analysis

  • End-to-end pipeline level tests beyond AI models

The RagaAI Testing Platform is designed to add science to the art of detecting AI issues, performing root cause analysis, and providing actionable recommendations. This is done through an automated suite of tests on the platform.

An overview of all tests for the sample project is available here -

1. Outlier Detection

Detecting outliers for OCR data on the RagaAI Testing Platform

Goal - Identify scenarios in the field data that are drastically different from the training dataset (out-of-distribution, or OOD). The AI model is prone to generating erroneous predictions on such datapoints.

Methodology - RagaAI automatically detects OOD datapoints using the embeddings from the RagaAI DNA technology.

Insight - In this case, the platform identifies data drift for images that are rotated or have different lighting conditions, given that the model has only been trained on portrait images.

Impact - This automated test helps users assess whether the data in the production setting has shifted and the model needs to be retrained.
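To make the idea of embedding-based OOD detection concrete, the sketch below scores each field datapoint by its mean distance to its nearest training embeddings and flags points that lie farther away than almost all training points lie from each other. This is only an illustration under stated assumptions: it is not the RagaAI DNA implementation, the embeddings are synthetic random vectors, and the names `knn_scores`, `flag_outliers`, `k` and `quantile` are hypothetical.

```python
import numpy as np

def knn_scores(query, reference, k=5, skip_self=False):
    """Mean Euclidean distance from each query row to its k nearest reference rows."""
    # Pairwise distances, shape (n_query, n_reference).
    dists = np.linalg.norm(query[:, None, :] - reference[None, :, :], axis=2)
    dists.sort(axis=1)
    # When query and reference are the same set, column 0 is each point's
    # zero distance to itself, so it is skipped.
    start = 1 if skip_self else 0
    return dists[:, start:start + k].mean(axis=1)

def flag_outliers(train_emb, field_emb, k=5, quantile=0.99):
    """Flag field embeddings whose kNN distance exceeds the chosen
    quantile of the training set's own kNN distances."""
    train_scores = knn_scores(train_emb, train_emb, k, skip_self=True)
    threshold = np.quantile(train_scores, quantile)
    field_scores = knn_scores(field_emb, train_emb, k)
    return field_scores > threshold, field_scores, threshold

# Synthetic stand-ins for real embeddings (assumption: 16-dimensional vectors).
rng = np.random.default_rng(0)
train_emb = rng.normal(0.0, 1.0, size=(200, 16))  # e.g. portrait images
in_dist = rng.normal(0.0, 1.0, size=(20, 16))     # field data similar to training
shifted = rng.normal(6.0, 1.0, size=(20, 16))     # e.g. rotated or badly lit images
field_emb = np.vstack([in_dist, shifted])

is_outlier, scores, threshold = flag_outliers(train_emb, field_emb)
print("flagged among similar field data:", int(is_outlier[:20].sum()))
print("flagged among shifted field data:", int(is_outlier[20:].sum()))
```

Deriving the threshold from the training set itself avoids hand-picking a distance cutoff. In practice the choice of `k` and `quantile` would depend on the dataset and on how many false alarms are acceptable.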

For more details, please refer to the detailed outlier detection documentation.

2. Missing Detections

Detecting missing values in OCR data on the RagaAI Testing Platform

Goal - Identify label drift among the model predictions.

Methodology - RagaAI automatically detects OOD datapoints using the embeddings from the RagaAI DNA technology.

Insight - In this case, the platform identifies label drift for images that are rotated or have different lighting conditions, given that the model has only been trained on portrait images.

Impact - This automated test helps users assess whether the label distribution in the production setting has shifted and the model needs to be retrained.
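One simple way to think about label drift for OCR is to compare the character distribution of reference transcriptions with that of the model's production predictions. The sketch below does this with the Jensen-Shannon divergence. It is an assumption-laden illustration rather than the platform's method: the sample strings are made up, and `char_distribution`, `js_divergence` and the `0.1` threshold are hypothetical choices.

```python
import math
from collections import Counter

def char_distribution(texts, alphabet):
    """Relative frequency of each alphabet character across all texts."""
    counts = Counter(ch for text in texts for ch in text)
    total = sum(counts[ch] for ch in alphabet)
    return [counts[ch] / total for ch in alphabet]

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits: 0 for identical distributions,
    1 for distributions with no shared support."""
    m = [(a + b) / 2 for a, b in zip(p, q)]

    def kl(x, y):
        # Zero-probability terms contribute nothing; y > 0 wherever x > 0.
        return sum(a * math.log2(a / b) for a, b in zip(x, y) if a > 0)

    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

# Made-up examples: reference labels, healthy predictions, and the kind of
# garbage output a model might produce on rotated or badly lit images.
reference = ["INVOICE 1042", "TOTAL 250", "DATE 2023"]
similar = ["INVOICE 1043", "TOTAL 260", "DATE 2024"]
drifted = ["||||| 1l1l", "~~~ ///", "lll |||"]

alphabet = sorted(set("".join(reference + similar + drifted)))
ref_dist = char_distribution(reference, alphabet)
similar_score = js_divergence(ref_dist, char_distribution(similar, alphabet))
drifted_score = js_divergence(ref_dist, char_distribution(drifted, alphabet))

DRIFT_THRESHOLD = 0.1  # hypothetical cutoff; tune on real data
print("similar batch drifted:", similar_score > DRIFT_THRESHOLD)
print("garbled batch drifted:", drifted_score > DRIFT_THRESHOLD)
```

The Jensen-Shannon divergence is used here because it stays finite when a character appears in only one of the two distributions, so no smoothing is needed.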

For more details, please refer to the detailed missing values documentation.
