Executing Evaluations

Run a variety of cutting-edge evaluation metrics out of the box in a few simple steps

Via UI:

1. Adding an Evaluation

  • Navigate to your dataset and click the "Evaluate" button to begin configuring your evaluation.

2. Selecting a Metric

  • Choose the metric you want to run on the dataset from the available options.

3. Naming the Metric

  • Enter a unique metric name to identify this evaluation. This name is used as the column name for the metric's results, which makes the evaluation easier to track in your dataset.

4. Configure the parameters

  • Choose the model you want to use for running the evaluation. You can select from pre-configured models within the platform or use a custom gateway to perform the evaluations.

  • If you have configured your own gateway, a "custom_gateway" option appears in the model selection dropdown and can be selected for the evaluation.

  • Map the selected metric to the appropriate column names in your dataset.

    • Make sure each metric is correctly aligned with the corresponding data columns so that the evaluation results are accurate.

5. Threshold

  • Configure the passing criteria for each metric to define which datapoints pass and which fail. Once metric values have been calculated, you can re-configure the threshold from the UI using the ⚙️ icon beside the metric column name.

  • Click "Update Threshold" to apply the change.

6. Applying Filters (Optional)

  • You can apply filters to narrow down the data points included in the evaluation.

    • This is useful if you want to evaluate a specific subset of your dataset.

7. Saving the Configuration

  • Once the metric, model, and filters are configured, click "Save" to save your evaluation setup.

8. Configuring Multiple Evaluations

  • Repeat the steps above to configure multiple evaluations if needed. This allows you to run several evaluations on the dataset simultaneously.

9. Running the Evaluations

  • Once all evaluations are set up, click "Evaluate" to execute the evaluations for all the configured metrics.

Via SDK:

You can also run metrics using the following commands:

from ragaai_catalyst import Evaluation
evaluation = Evaluation(project_name="your-project-name",
                        dataset_name="your-dataset-name")

evaluation.list_metrics() #List available metrics

#Define schema mapping for all metrics to be run
schema_mapping={
    'Query': 'prompt',
    'Response': 'response',
    'Context': 'context',
    'ExpectedResponse': 'expected_response'
}

#List metrics to be run
metrics = [
    {"name": "Hallucination", "config": {"model": "gemini-1.5", "provider": "gemini"}, "column_name": "Hallucination_v1", "schema_mapping": schema_mapping},
    {"name": "Response Correctness", "config": {"model": "gpt-4o-mini", "provider": "openai"}, "column_name": "Response_Correctness_v1", "schema_mapping": schema_mapping},
    {"name": "Toxicity", "config": {"model": "gpt-4o-mini", "provider": "openai"}, "column_name": "Toxicity_v1", "schema_mapping": schema_mapping}
]

#Trigger listed metrics to run
evaluation.add_metrics(metrics=metrics)

The schema mapping above follows a "key": "value" format, where each key is a column name in your dataset and each value is a pre-defined Catalyst schema variable. Here is the list of all supported schema variables (case-sensitive), followed by an example mapping:

  • prompt

  • context

  • response

  • expected_response

  • expected_context

  • traceId

  • timestamp

  • metadata

  • pipeline

  • cost

  • feedBack

  • latency

  • system_prompt

  • traceUri
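
As an illustrative sketch (the dataset column names on the left, such as 'UserQuestion' and 'RetrievedChunks', are hypothetical placeholders), a mapping that uses several of the variables above could look like this:

#Hypothetical dataset columns mapped to pre-defined Catalyst schema variables
extended_schema_mapping = {
    'UserQuestion': 'prompt',            #the user query column
    'RetrievedChunks': 'context',        #retrieved context passed to the model
    'ModelAnswer': 'response',           #the model's generated answer
    'GoldAnswer': 'expected_response',   #reference answer, if available
    'TraceID': 'traceId',                #identifier used to correlate with traces
    'ResponseLatency': 'latency'         #recorded latency for the call
}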

If you have enabled a custom gateway, edit the metric configuration above to use your model's details, as follows:

{"name": "Hallucination", "config": {"model": "your-model-name", "provider": "your-model-provider"}, "column_name": "Hallucination_v1", "schema_mapping": schema_mapping}

Once evaluations have been triggered, they can be tracked and accessed as follows:

#Get status
evaluation.get_status()

#View Results
df = evaluation.get_results()
df.head()
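
As a small illustrative follow-up (assuming, as a sketch, that get_results() returns a pandas DataFrame with one column per configured metric, named after the column_name values above):

#Sketch: inspect one metric's results, assuming a "Hallucination_v1" column exists
if "Hallucination_v1" in df.columns:
    print(df["Hallucination_v1"].describe())  #summary of the metric's values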
