Analysing experiments

Inside an experiment, you can view the artefacts used in the run and the metrics computed on them.

  • Results Section: This section summarises all the metrics and configurations used in the experiment, giving you an at-a-glance view of the metric scores.

  • Metric Over Attribute: The first graph lets you select a metric and an attribute to analyse how the metric varies across attribute values.

    Interpretation of the Graph:

    • X-axis: Attribute

    • Y-axis: Metric Value

  • Metric Distribution: The second graph shows how the values of the selected metric are distributed across the datapoints.

    Interpretation of the Graph:

    • X-axis: Metric Value

    • Y-axis: Frequency (Number of Datapoints)

  • Datapoints Section: This section provides a table view of the dataset on which the test was executed. You can configure the columns, check metric values for each input, view metric configurations, and see all metadata included in the dataset. You can also provide human feedback on individual datapoints.

  • Detailed Datapoint View: Clicking a prompt row opens the datapoint in detail. This view shows every metric that was executed, including the reasoning behind each score, along with the prompt, response, sanitised response (if guardrail metrics were run), and the datapoint's attributes.

  • Trace View: If the datapoint has traces logged, you can open the trace view to inspect them. LLM applications use increasingly complex abstractions, such as chains, agents with tools, and advanced prompts. The nested traces in RagaAI Catalyst help you understand what is happening and identify the root cause of problems.
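Conceptually, the two graphs above correspond to simple aggregations over the experiment's datapoints: a per-attribute average and a frequency count of metric values. A minimal sketch in plain Python (illustrative only; the datapoint fields and metric name here are hypothetical, not the product's schema):

```python
from collections import Counter, defaultdict

# Hypothetical experiment results: each datapoint carries an attribute
# (e.g. language) and a metric score (e.g. faithfulness).
datapoints = [
    {"attribute": "en", "faithfulness": 0.75},
    {"attribute": "en", "faithfulness": 0.25},
    {"attribute": "fr", "faithfulness": 0.5},
]

# Metric Over Attribute: average metric value per attribute
# (x-axis = attribute, y-axis = metric value).
by_attr = defaultdict(list)
for dp in datapoints:
    by_attr[dp["attribute"]].append(dp["faithfulness"])
metric_over_attribute = {a: sum(v) / len(v) for a, v in by_attr.items()}
print(metric_over_attribute)  # {'en': 0.5, 'fr': 0.5}

# Metric Distribution: frequency of each metric value
# (x-axis = metric value, y-axis = number of datapoints).
distribution = Counter(dp["faithfulness"] for dp in datapoints)
print(distribution)
```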

Why Use Tracing for an LLM Application?

  • Capture the full context of the execution, including API calls, context, prompts, parallelism, and more.

  • Track model usage and cost.

  • Identify low-quality outputs.

  • Build fine-tuning and testing datasets.

A trace in RagaAI Catalyst consists of the following objects:

    • Trace: Typically represents a single request or operation. It contains the overall input and output of the function, as well as metadata about the request, such as the user, the session, and tags.

    • Observations: Log the individual steps of the execution and come in different types:

      • Events: The basic building blocks used to track discrete events in a trace.

      • Spans: Represent durations of units of work in a trace.

      • Generations: Spans used to log generations of AI models, containing additional attributes about the model, the prompt, and the completion. Token usage and costs are automatically calculated for generations.

    Observations can be nested, allowing for detailed tracking of complex execution flows.
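As a rough illustration of how these objects nest, the sketch below models a trace containing a span (a chain step) that wraps an event (context retrieval) and a generation (an LLM call). This uses plain dataclasses and is purely illustrative; it is not the RagaAI Catalyst SDK, and all names are hypothetical:

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Observation:
    kind: str  # one of "event", "span", or "generation"
    name: str
    children: List["Observation"] = field(default_factory=list)
    # Generations additionally carry model metadata; token usage
    # is what cost calculation would be based on.
    model: Optional[str] = None
    tokens: int = 0


@dataclass
class Trace:
    # A trace represents a single request, with request-level metadata.
    name: str
    user: str
    observations: List[Observation] = field(default_factory=list)

    def total_tokens(self) -> int:
        # Walk the nested observations and sum token usage.
        def walk(obs: Observation) -> int:
            return obs.tokens + sum(walk(c) for c in obs.children)
        return sum(walk(o) for o in self.observations)


# A span for one chain step, nesting an event and a generation.
trace = Trace(
    name="answer-question",
    user="user-123",
    observations=[
        Observation(
            kind="span",
            name="rag-chain",
            children=[
                Observation(kind="event", name="retrieved-context"),
                Observation(kind="generation", name="llm-call",
                            model="gpt-4", tokens=512),
            ],
        )
    ],
)
print(trace.total_tokens())  # 512
```

Because observations nest to arbitrary depth, the same walk works for agents that call tools which themselves invoke further LLM calls.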
