Executing Evaluations
Run a variety of cutting-edge evaluation metrics out of the box in a few simple steps.
Via UI:
1. Adding an Evaluation
Navigate to your dataset and click the "Evaluate" button to begin configuring your evaluation.
2. Selecting a Metric
Choose the metric you want to run on the dataset from the available options.
3. Naming the Metric
Enter a unique metric name to identify this evaluation. This name becomes the column name for the results, making them easy to track in your dataset.
4. Configure the parameters
Choose the model you want to use to run the evaluation. You can select from the pre-configured models within the platform or use a custom gateway (described here) to perform the evaluations.
If you have configured your own gateway, a "custom_gateway" option appears in the model selection dropdown and can be selected here.
Map the selected metric to the appropriate column names in your dataset, making sure each metric is aligned with the corresponding data columns so the evaluation results are accurate.
5. Threshold
You can configure the passing criteria for each metric to define which data points pass and which fail. Once a metric has been calculated, its threshold can be re-configured from the UI using the ⚙️ icon beside the metric column name.
Click "Update Threshold" to apply the change.
6. Applying Filters (Optional)
Optionally, you can apply filters to narrow down the data points for the evaluation.
This is useful if you want to evaluate a specific subset of your dataset.
7. Saving the Configuration
Once the metric, model, and filters are configured, click "Save" to save your evaluation setup.
8. Configuring Multiple Evaluations
Repeat the steps above to configure multiple evaluations if needed. This allows you to run several evaluations on the dataset simultaneously.
9. Running the Evaluations
Once all evaluations are set up, click "Evaluate" to execute the evaluations for all the configured metrics.
Via SDK:
You can also run metrics using the following commands:
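The exact interface depends on your SDK version; the snippet below is a minimal sketch assuming a Python SDK that exposes an "Evaluation" object with an "add_metrics" method (the class, method, metric, model, project, and column names shown are illustrative, so check the SDK reference for the exact signatures):

```python
from ragaai_catalyst import Evaluation  # assumed SDK entry point

# Point the evaluation at an existing project and dataset (illustrative names).
evaluation = Evaluation(
    project_name="my_project",
    dataset_name="my_dataset",
)

# Map dataset column names (keys) to Catalyst schema variables (values).
schema_mapping = {
    "Query": "prompt",
    "LLM_Response": "response",
}

# Configure one or more metrics; "column_name" is the unique metric name
# that will appear as a new column in the dataset.
evaluation.add_metrics(
    metrics=[
        {
            "name": "Hallucination",             # metric to run
            "config": {"model": "gpt-4o-mini"},  # pre-configured platform model
            "column_name": "hallucination_v1",
            "schema_mapping": schema_mapping,
        }
    ]
)
```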
The schema mapping above follows a "key":"value" format, where "key" is a column name in your dataset and "value" is a pre-defined Catalyst schema variable. Here is a list of all supported schema variables (case-sensitive); an example mapping follows the list:
prompt
context
response
expected_response
expected_context
traceId
timestamp
metadata
pipeline
cost
feedBack
latency
system_prompt
traceUri
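For example, a dataset with columns named "Query", "Retrieved_Context", "LLM_Response", and "Reference_Answer" (hypothetical column names) could be mapped like this:

```python
# Keys are your dataset's column names; values are Catalyst schema variables.
schema_mapping = {
    "Query": "prompt",
    "Retrieved_Context": "context",
    "LLM_Response": "response",
    "Reference_Answer": "expected_response",
}
```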
If you have enabled a custom gateway, edit the model details in the metric evaluation configuration above as follows:
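A minimal sketch, assuming the gateway was registered as "custom_gateway" (the "provider" key and the model identifier below are illustrative; use the values from your own gateway configuration):

```python
# Same add_metrics call, with the model config pointing at the custom gateway.
evaluation.add_metrics(
    metrics=[
        {
            "name": "Hallucination",
            "config": {
                "provider": "custom_gateway",  # assumed name of your configured gateway
                "model": "my-deployed-model",  # illustrative model served by the gateway
            },
            "column_name": "hallucination_custom_gateway",
            "schema_mapping": schema_mapping,
        }
    ]
)
```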
Once evaluations have been triggered, they can be tracked and accessed as follows:
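A minimal sketch, assuming the same "Evaluation" object also exposes status and result helpers (method names are assumptions; consult the SDK reference):

```python
# Check whether the configured metrics have finished running.
status = evaluation.get_status()
print(status)

# Retrieve the computed metric columns once the evaluation is complete
# (for example, as a pandas DataFrame).
results = evaluation.get_results()
print(results)
```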