Compare experiments
The Compare feature allows you to juxtapose results from different experiments. This helps you identify patterns, strengths, and weaknesses across multiple models and datasets.
To compare experiments, follow these steps:
1. In the Experiments tab, click the "Select to compare" button, then select the experiments you want to compare.
2. Click the "Compare" button to open the comparison view.
The comparison view is similar to the individual experiment view but displays side-by-side columns for each selected experiment. This layout allows for easy comparison of metrics, configurations, and results.
Results Section: In the comparison view, you will see a summary of all the metrics and configurations used in each experiment. This helps in quickly identifying differences and similarities across experiments.
Metric Over Attribute: This graph allows you to compare how different metrics perform over a selected attribute across experiments. You can select the metric and attribute to analyze, and view the results side by side.
Interpretation of the Graph:
X-axis: Attribute
Y-axis: Metric Value for each experiment
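To make the axes concrete, the sketch below reproduces the kind of aggregation such a graph implies: for each attribute value (X-axis), one metric value per experiment (Y-axis). It is only an illustration on hypothetical data and not the product's API. The field names (`experiment`, `topic`, `relevance`) are made up, and the use of the mean as the aggregate is an assumption, since the page does not state how values are combined.

```python
from collections import defaultdict
from statistics import mean


def metric_over_attribute(datapoints, attribute, metric):
    """Average `metric` for every (attribute value, experiment) pair.

    Assumption: the mean is used as the aggregate; the actual graph may
    aggregate differently.
    """
    grouped = defaultdict(lambda: defaultdict(list))
    for dp in datapoints:
        grouped[dp[attribute]][dp["experiment"]].append(dp[metric])
    return {
        attr_value: {exp: mean(scores) for exp, scores in by_exp.items()}
        for attr_value, by_exp in grouped.items()
    }


# Hypothetical datapoints from two experiments (illustrative values only).
datapoints = [
    {"experiment": "exp_a", "topic": "billing", "relevance": 0.5},
    {"experiment": "exp_a", "topic": "billing", "relevance": 1.0},
    {"experiment": "exp_b", "topic": "billing", "relevance": 0.25},
    {"experiment": "exp_a", "topic": "support", "relevance": 0.75},
    {"experiment": "exp_b", "topic": "support", "relevance": 0.5},
]

result = metric_over_attribute(datapoints, "topic", "relevance")
for attr_value, by_experiment in result.items():
    print(attr_value, by_experiment)
```

Each key of `result` corresponds to one position on the X-axis, and each inner entry to one experiment's value at that position.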
Metric Distribution: This graph shows how the values of the selected metric are distributed across datapoints for each experiment.
Interpretation of the Graph:
X-axis: Metric Value
Y-axis: Frequency (Number of Datapoints) for each experiment
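Read this way, the graph is a histogram per experiment: metric values are grouped into bins (X-axis) and the number of datapoints in each bin is counted (Y-axis). The sketch below shows that counting on hypothetical scores. The bin width of 0.25 and the experiment names are assumptions for illustration, not settings taken from the product.

```python
from collections import Counter


def metric_distribution(scores, bin_width=0.25):
    """Count datapoints per metric-value bin.

    Returns a dict mapping each bin's lower edge to the number of scores
    that fall in that bin. `bin_width` is an illustrative assumption.
    """
    counts = Counter()
    for score in scores:
        bin_index = int(score // bin_width)
        counts[bin_index * bin_width] += 1
    return dict(sorted(counts.items()))


# Hypothetical metric scores for two experiments (illustrative values only).
scores_by_experiment = {
    "exp_a": [0.1, 0.3, 0.3, 0.8, 1.0],
    "exp_b": [0.5, 0.6, 0.7, 0.9],
}

distributions = {
    exp: metric_distribution(scores)
    for exp, scores in scores_by_experiment.items()
}
for exp, dist in distributions.items():
    print(exp, dist)
```

Comparing the two dictionaries bin by bin mirrors what the side-by-side graph shows: where each experiment's scores cluster and how widely they spread.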
Datapoints Section: This section provides a comparative table view of the dataset on which the tests were executed. You can configure the columns, check metric values for each input column, and see all metadata included in the datasets for each experiment.
Detailed Datapoint View: Click a prompt row to open that datapoint in detail for each experiment. This view shows every metric that was executed, including the reasoning behind each metric score, along with the prompt, response, sanitized response (if guardrail metrics are run), and attributes of the datapoint for each experiment.
By using the comparison feature, you can effectively perform A/B testing, helping you make informed decisions on model performance and suitability for your specific use cases.