Executing Evaluations
Run a variety of cutting-edge evaluation metrics out of the box in a few simple steps
1. Navigating to Your Dataset
Navigate to your dataset and click the "Evaluate" button to begin configuring your evaluation.
2. Selecting a Metric
Choose the metric you want to run on the dataset from the available options.
Enter a unique metric name to identify this evaluation; it is also used as the column name for the results, which makes them easy to track.
Choose the model you want to use for running the evaluation. You can select from pre-configured models within the platform or use a custom gateway (described here) to perform the evaluations.
If you have configured your own gateway, a "custom_gateway" option appears in the model selection dropdown and can be selected for the evaluation.
Map the selected metric to the appropriate column names in your dataset.
Make sure each metric is aligned with the corresponding data columns so that the evaluation results are accurate.
You can configure the passing criteria for each metric to define which datapoints pass or fail. Once the results have been calculated, you can reconfigure the threshold from the UI using the ⚙️ icon beside the metric column name, then click "Update Threshold" to apply the change.
Optionally, you can apply filters to narrow down the data points for the evaluation.
This is useful if you want to evaluate a specific subset of your dataset.
Once the metric, model, and filters are configured, click "Save" to save your evaluation setup.
Repeat the steps above to configure multiple evaluations if needed. This allows you to run several evaluations on the dataset simultaneously.
Once all evaluations are set up, click "Evaluate" to execute the evaluations for all the configured metrics.
You can also run metrics using the following commands:
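The snippet below is a minimal sketch of triggering an evaluation from Python, assuming the Catalyst SDK exposes an Evaluation interface along these lines; the credentials, project name, dataset name, model, and column names are placeholders for your own values.

```python
from ragaai_catalyst import RagaAICatalyst, Evaluation

# Authenticate against the platform (keys are placeholders).
catalyst = RagaAICatalyst(
    access_key="YOUR_ACCESS_KEY",
    secret_key="YOUR_SECRET_KEY",
)

# Point the evaluation at an existing project and dataset.
evaluation = Evaluation(
    project_name="my_project",
    dataset_name="my_dataset",
)

# Configure one metric: "column_name" becomes the results column, and
# "schema_mapping" maps your dataset columns to Catalyst schema variables.
evaluation.add_metrics(
    metrics=[
        {
            "name": "Hallucination",
            "config": {"model": "gpt-4o-mini"},
            "column_name": "hallucination_v1",
            "schema_mapping": {
                "question": "prompt",
                "retrieved_context": "context",
                "answer": "response",
            },
        }
    ]
)
```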
The schema mapping above follows a "key": "value" format, where "key" is a column name in your dataset and "value" is a pre-defined Catalyst schema variable. The supported schema variables are listed below (case-sensitive):
prompt
context
response
expected_response
expected_context
traceId
timestamp
metadata
pipeline
cost
feedBack
latency
system_prompt
traceUri
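For example, a schema mapping for a typical RAG dataset might look like the sketch below; the dataset column names on the left are illustrative, while the values on the right must be schema variables from the list above.

```python
# Keys are your dataset's column names (illustrative);
# values must be supported Catalyst schema variables (case-sensitive).
schema_mapping = {
    "user_question": "prompt",
    "retrieved_chunks": "context",
    "model_answer": "response",
    "ground_truth": "expected_response",
    "trace_id": "traceId",
}
```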
If you have enabled a custom gateway, edit the model details in the metric evaluation configuration above as follows:
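The following is a rough sketch only: the field names are assumptions and may differ in your setup. The key change is selecting the "custom_gateway" model you configured earlier.

```python
# Sketch only: field names are assumptions; the key change is pointing the
# metric config at the "custom_gateway" model configured in the platform.
custom_gateway_metric = {
    "name": "Hallucination",
    "config": {
        # The custom gateway option from the model selection dropdown.
        "model": "custom_gateway",
        # Provider-specific details (base URL, deployment name, etc.)
        # come from the gateway configuration, not from this dict.
    },
    "column_name": "hallucination_custom",
    "schema_mapping": {
        "user_question": "prompt",
        "model_answer": "response",
    },
}
```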
Once evaluations have been triggered, they can be tracked and accessed as follows:
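As a minimal sketch, assuming status and results accessors exist on the same Evaluation object used above (method names are assumptions), progress and scores can be pulled from the SDK like this:

```python
# Check whether the evaluation job has finished.
status = evaluation.get_status()
print("Evaluation status:", status)

# Fetch the scored dataset once the evaluation is complete.
results = evaluation.get_results()
print(results)
```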