Active Learning

The Active Learning Test in RagaAI optimises a dataset by selecting the most representative data points within a specified budget.

Execute Test

The code snippet below outlines the steps to configure and execute an Active Learning Test, which intelligently selects the most representative samples from your dataset.

Step 1: Define Active Learning Parameters

Start by specifying the parameters that will guide the Active Learning process.

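# Assumes test_session, dataset_name, and budget are already defined earlier in your script.
active_learning_test = active_learning(
    test_session=test_session,
    dataset_name=dataset_name,
    test_name="active_learning_5",
    type="active_learning",
    output_type="curated_dataset",
    embed_col_name="hr_embedding",  # dataset column that holds the embeddings
    budget=budget,                  # limit on the number of data points to select
)
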
test_session.add(active_learning_test)

test_session.run()

active_learning(): Initialises the Active Learning Test.
test_session: The session object associated with your RagaAI project.
dataset_name: The name of the dataset you want to optimise.
test_name: A name for this specific test iteration, such as "active_learning_5".
type: The type of test, which is "active_learning" in this case.
output_type: The expected result of the test, which is "curated_dataset" in this case.
embed_col_name: The name of the dataset column that contains the embeddings. Here it is "hr_embedding", which suggests high-resolution image embeddings.
budget: A variable that defines the limit on the number of data points to select.

test_session.add(): Registers the Active Learning Test with the session.

test_session.run(): Triggers the execution of all tests that have been added to the session, including your Active Learning Test.

By completing the steps above, you have successfully initiated an Active Learning Test in RagaAI.

Analysing Test Results

Classification: Each data point is marked as 'included' or 'excluded' based on its representativeness.
Inclusion: The algorithm aims to include as many data points as your budget allows.
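
To build intuition for budget-constrained selection, the sketch below marks points as included or excluded using greedy k-center (farthest-point) selection over embeddings. This is an illustration only: the page does not state which algorithm RagaAI uses, and the function name select_representative and the synthetic embeddings are assumptions made for this example.

```python
import numpy as np


def select_representative(embeddings: np.ndarray, budget: int) -> np.ndarray:
    """Greedy k-center selection (illustrative, not RagaAI's algorithm).

    Returns a boolean mask over the rows of `embeddings`:
    True means 'included', False means 'excluded'.
    """
    n = embeddings.shape[0]
    budget = min(budget, n)  # cannot include more points than exist
    included = np.zeros(n, dtype=bool)
    if budget <= 0:
        return included

    # Start from the point closest to the dataset mean.
    centre = embeddings.mean(axis=0)
    first = int(np.argmin(np.linalg.norm(embeddings - centre, axis=1)))
    included[first] = True

    # Distance from each point to its nearest already-included point.
    min_dist = np.linalg.norm(embeddings - embeddings[first], axis=1)
    min_dist[included] = -1.0  # never re-pick an included point

    for _ in range(budget - 1):
        # Pick the point that is currently worst covered by the selection.
        nxt = int(np.argmax(min_dist))
        included[nxt] = True
        dist_to_new = np.linalg.norm(embeddings - embeddings[nxt], axis=1)
        min_dist = np.minimum(min_dist, dist_to_new)
        min_dist[included] = -1.0

    return included


# Synthetic 2-D "embeddings": three tight clusters of 20 points each.
rng = np.random.default_rng(0)
demo_embeddings = np.vstack([
    rng.normal((0, 0), 0.1, (20, 2)),
    rng.normal((5, 5), 0.1, (20, 2)),
    rng.normal((0, 5), 0.1, (20, 2)),
])
mask = select_representative(demo_embeddings, budget=3)
labels = np.where(mask, "included", "excluded")
print(int(mask.sum()), "points included out of", len(labels))
```

With a budget of 3 on three well-separated clusters, this kind of selection tends to pick one point per cluster, which is the sense in which a small budget can still cover the dataset.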

Reviewing Results

Embedding View: Assess the overall dataset distribution and selected points.
Datagrid View: Browse through individual data points categorised by the test.

Image Analysis and Adjustment

Image View: Inspect specific images and their similar counterparts.
Manual Adjustment: Use the include/exclude option to refine your dataset manually.

Finalising Dataset

Export: Once satisfied, export the revised dataset as a CSV file.
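
After exporting, you may want to keep only the included rows in a downstream script. The sketch below uses Python's standard csv module. The column names image_id and status, the value "included", and the sample rows are hypothetical; check the header of your actual export and adjust them accordingly.

```python
import csv
import io


def load_included_rows(csv_file, status_column="status", included_value="included"):
    """Return the rows whose status column matches `included_value`.

    `status_column` and `included_value` are assumptions about the export
    layout, not documented RagaAI field names.
    """
    reader = csv.DictReader(csv_file)
    return [
        row for row in reader
        if (row.get(status_column) or "").strip().lower() == included_value
    ]


# Hypothetical export contents, used here in place of a real file.
sample_export = io.StringIO(
    "image_id,status\n"
    "img_001,included\n"
    "img_002,excluded\n"
    "img_003,included\n"
)
included_rows = load_included_rows(sample_export)
print([row["image_id"] for row in included_rows])
```

For a real export, open the file with open(path, newline="") and pass the file object to load_included_rows in place of the in-memory sample.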

This test will allow you to maximise the value of your dataset by ensuring that the most informative data points are used within the constraints of your specified budget.
