Executing Evaluations

Run a range of cutting-edge evaluation metrics out of the box in a few simple steps.

Via UI:

1. Adding an Evaluation

  • Navigate to your dataset and click the "Evaluate" button to begin configuring your evaluation.

2. Selecting a Metric

  • Choose the metric you want to run on the dataset from the available options.

3. Naming the Metric

  • Enter a unique metric name to identify this evaluation. This name appears as the metric's column name in the dataset, making the results easy to track.

4. Configure the parameters

  • Choose the model you want to use for running the evaluation. You can select from pre-configured models within the platform or use a custom gateway (described here) to perform the evaluations.

  • If you have configured your own gateway, a "custom_gateway" option appears in the model selection dropdown; select it to run the evaluation through your gateway.

  • Map the selected metric to the appropriate column names in your dataset.

    • Make sure each metric is mapped to the correct data columns so that the evaluation results are accurate.

5. Threshold

  • Configure the passing criteria for each metric to define which datapoints pass or fail. Once the metric values have been calculated, you can re-configure the threshold from the UI using the ⚙️ icon beside the metric column name (a conceptual sketch of how a threshold splits datapoints follows this step list).

  • Click on "Update Threshold" to update.

6. Applying Filters (Optional)

  • Optionally, you can apply filters to narrow down the data points for the evaluation.

    • This is useful if you want to evaluate a specific subset of your dataset.

7. Saving the Configuration

  • Once the metric, model, and filters are configured, click "Save" to save your evaluation setup.

8. Configuring Multiple Evaluations

  • Repeat the steps above to configure multiple evaluations if needed. This allows you to run several evaluations on the dataset simultaneously.

9. Running the Evaluations

  • Once all evaluations are set up, click "Evaluate" to execute the evaluations for all the configured metrics.
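
As a rough sketch of how a configured threshold splits evaluated datapoints into passed and failed (step 5 above), the snippet below applies a hypothetical threshold to a metric column with pandas. The column name, threshold value, and pass direction are illustrative only; the platform performs this split for you based on the threshold you configure.

import pandas as pd

# Hypothetical metric scores, as they might appear in an evaluated metric column
df = pd.DataFrame({"Hallucination_v1": [0.12, 0.55, 0.83, 0.07]})

# Assumed passing criterion: a datapoint passes if its score is at or below
# the configured threshold (the direction depends on the metric)
threshold = 0.5
df["Hallucination_v1_passed"] = df["Hallucination_v1"] <= threshold

print(df)
print(f"Pass rate: {df['Hallucination_v1_passed'].mean():.0%}")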

Via SDK:

You can also run metrics using the following commands:

from ragaai_catalyst import Evaluation
evaluation = Evaluation(project_name="your-project-name",
                        dataset_name="your-dataset-name")

evaluation.list_metrics()  # List available metrics

# Define the schema mapping shared by all metrics to be run
schema_mapping = {
    'Query': 'prompt',
    'Response': 'response',
    'Context': 'context',
    'ExpectedResponse': 'expected_response'
}

# List the metrics to be run
metrics = [
    {"name": "Hallucination", "config": {"model": "gemini-1.5", "provider": "gemini"}, "column_name": "Hallucination_v1", "schema_mapping": schema_mapping},
    {"name": "Response Correctness", "config": {"model": "gpt-4o-mini", "provider": "openai"}, "column_name": "Response_Correctness_v1", "schema_mapping": schema_mapping},
    {"name": "Toxicity", "config": {"model": "gpt-4o-mini", "provider": "openai"}, "column_name": "Toxicity_v1", "schema_mapping": schema_mapping}
]

# Trigger the listed metrics
evaluation.add_metrics(metrics=metrics)

The schema mapping above follows a "key": "value" format, where the key is a column name in your dataset and the value is a pre-defined Catalyst schema variable. Here is the list of all supported schema variables (case-sensitive); an example mapping that uses several of them follows the list:

  • prompt

  • context

  • response

  • expected_response

  • expected_context

  • traceId

  • timestamp

  • metadata

  • pipeline

  • cost

  • feedBack

  • latency

  • system_prompt

  • traceUri
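
For example, a dataset with richer columns can be mapped to several of these variables at once. In the sketch below, the keys ('Question', 'Trace ID', and so on) are hypothetical column names and should be replaced with the actual columns in your dataset:

# Hypothetical dataset columns mapped to pre-defined Catalyst schema variables
schema_mapping = {
    'Question': 'prompt',
    'LLM Answer': 'response',
    'Retrieved Chunks': 'context',
    'Ground Truth': 'expected_response',
    'Trace ID': 'traceId',
    'Latency (ms)': 'latency',
    'Run Metadata': 'metadata'
}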

If you have enabled a custom gateway, edit the metric configuration above to point at your model's details, for example:

{"name": "Hallucination", "config": {"model": "your-model-name", "provider": "your-model-provider"}, "column_name": "Hallucination_v1", "schema_mapping": schema_mapping}

Once evaluations have been triggered, they can be tracked and accessed as follows:

# Get status
evaluation.get_status()

# View results
df = evaluation.get_results()
df.head()
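
Since the results are returned as a DataFrame (as used with df.head() above), standard pandas operations can be used to inspect them. A small sketch, assuming the column_name values configured earlier; the exact columns in your results depend on your own configuration:

# Summarize a metric column; "Hallucination_v1" matches the column_name set above
print(df["Hallucination_v1"].describe())

# Sort by the metric score to inspect outliers (illustrative post-processing only)
print(df.sort_values("Hallucination_v1").head())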
