Tracing


Last updated 2 months ago


Tracing is a core capability of RagaAI Catalyst that helps you debug, evaluate, and analyse the behavior of agentic applications by capturing detailed experiment-level traces. These traces document the inputs, outputs, decisions, and intermediate steps taken by your application, so you can understand its performance in depth and identify areas for improvement.

Note: Tracing is supported on Python 3.10, 3.11, and 3.12.


What is Tracing for Agentic Applications?

In agentic applications, tracing refers to the process of monitoring and recording the sequence of operations executed by the system. This includes:

  • Interactions between agents, tools, and external systems.

  • Decision-making processes of agents across variable inputs.

  • Key performance metrics and error patterns.

Tracing provides insights into complex workflows, enabling you to:

  • Debug issues effectively.

  • Validate application logic.

  • Optimize the overall performance.
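
As a rough illustration of the idea (not RagaAI Catalyst's implementation), the sketch below records each call's name, inputs, output, error, and duration in an in-memory list. The names `record_call` and `TRACE_LOG` are hypothetical and exist only for this example.

```python
import functools
import time

# Hypothetical in-memory store; a real tracer would send records to a backend.
TRACE_LOG = []

def record_call(func):
    """Record name, inputs, output, error, and duration of each call."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        entry = {"name": func.__name__, "args": args, "kwargs": kwargs,
                 "output": None, "error": None}
        start = time.perf_counter()
        try:
            entry["output"] = func(*args, **kwargs)
            return entry["output"]
        except Exception as exc:
            entry["error"] = repr(exc)
            raise
        finally:
            # Runs on success and on failure, so every call is logged.
            entry["duration_s"] = time.perf_counter() - start
            TRACE_LOG.append(entry)
    return wrapper

@record_call
def lookup_price(ticker):
    # Stand-in for a tool call; returns a fixed value for the example.
    return {"ticker": ticker, "price": 100.0}

@record_call
def answer(question):
    # Stand-in for an agent step that calls a tool.
    quote = lookup_price("ACME")
    return f"{quote['ticker']} trades at {quote['price']}"

answer("What is ACME trading at?")
for entry in TRACE_LOG:
    print(entry["name"], entry["error"], round(entry["duration_s"], 4))
```

Entries are appended when a call finishes, so the inner tool call is logged before the agent step that invoked it.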


How to Capture Traces Using RagaAI Catalyst

Follow the steps below to start tracing your agentic application using RagaAI Catalyst.


Quickstart Guide

Use this to get started quickly.


Step-by-Step Instructions

  1. Install RagaAI Catalyst

    !pip install ragaai-catalyst
  2. Import Required Modules

    from ragaai_catalyst import RagaAICatalyst, init_tracing
    from ragaai_catalyst import Tracer
  3. Initialize RagaAI Catalyst

    catalyst = RagaAICatalyst(
        access_key="access_key",
        secret_key="secret_key",
    )
  4. Create a Project and Dataset: Define your project and dataset for structured trace management.

    project_name = "Project_Name" # create a project on the UI
    tracer_dataset_name = "dataset_name"
  5. Create a Tracer Object

    tracer = Tracer(
        project_name=project_name,
        dataset_name=tracer_dataset_name,
        tracer_type="agentic_tracing",
    )
    init_tracing(catalyst=catalyst, tracer=tracer)
  6. Wrap Functions with RagaAI Catalyst Decorators: Use decorators to trace specific functions in your application.

    from ragaai_catalyst import trace_llm, trace_tool, trace_agent

    @trace_llm(name="agent_name")
    async def my_llm_function():
        # Your LLM call here
        pass

    @trace_tool("my_tool")
    def my_tool_function():
        # Your tool logic here
        pass

    @trace_agent("my_agent")
    def my_agent_function():
        # Your agent logic here
        pass
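
Hard-coding keys as in step 3 is convenient for a quickstart but makes them easy to leak. One option, sketched here, is to read them from environment variables and pass the result to `RagaAICatalyst(...)`. The variable names `CATALYST_ACCESS_KEY` and `CATALYST_SECRET_KEY` and the helper `load_catalyst_credentials` are assumptions made for this example, not names defined by the SDK.

```python
import os

def load_catalyst_credentials(env=os.environ):
    """Return keyword arguments for RagaAICatalyst, read from the environment.

    The variable names below are hypothetical; use whatever your deployment defines.
    """
    names = {
        "access_key": "CATALYST_ACCESS_KEY",
        "secret_key": "CATALYST_SECRET_KEY",
    }
    missing = [var for var in names.values() if not env.get(var)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {arg: env[var] for arg, var in names.items()}

# Usage (requires the ragaai-catalyst package and valid keys):
# from ragaai_catalyst import RagaAICatalyst
# catalyst = RagaAICatalyst(**load_catalyst_credentials())
```

Failing early with a list of missing variables gives a clearer error than an authentication failure later in the run.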

Additional Resources

Note: If you change anything in your code before running a trace, a new code version is created.

How to add Ground Truth?

To include ground truth data in a trace, follow these steps:

  1. Define an input parameter: In the traced element, create an input with the parameter name gt and assign it a default value.

  2. Pass the ground truth: When invoking the function during the agent run, include the ground truth data as the argument for the gt parameter.

By doing this, the ground truth data will automatically be recorded in the trace.

import asyncio

from ragaai_catalyst import trace_llm, trace_tool, trace_agent

@trace_llm(name="agent_name")
async def my_llm_function(prompt, gt=None):
    # Your LLM call here; gt is recorded in the trace
    pass

@trace_tool("my_tool")
def my_tool_function(gt=None):
    # Your tool logic here
    pass

@trace_agent("my_agent")
def my_agent_function(gt=None):
    # Your agent logic here
    pass


# tracer is the Tracer object created in the quickstart above
with tracer:
    # my_llm_function is async, so run it on an event loop
    asyncio.run(my_llm_function("Hello, world!", gt='some gt value'))
    my_tool_function(gt='some gt value')
    my_agent_function(gt='some gt value')
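
To make the mechanism concrete, here is a minimal sketch of how a tracing decorator could pick up a `gt` argument and store it alongside the call's output. This is an illustration under assumptions, not the SDK's implementation: `trace_with_gt` and `RECORDS` are hypothetical names.

```python
import functools
import inspect

RECORDS = []  # hypothetical in-memory trace store

def trace_with_gt(name):
    """Decorator factory that records the output and the `gt` argument of each call."""
    def decorator(func):
        signature = inspect.signature(func)

        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            # Bind arguments so `gt` is found whether passed positionally,
            # by keyword, or left at its default value.
            bound = signature.bind(*args, **kwargs)
            bound.apply_defaults()
            output = func(*args, **kwargs)
            RECORDS.append({
                "name": name,
                "output": output,
                "gt": bound.arguments.get("gt"),
            })
            return output
        return wrapper
    return decorator

@trace_with_gt("capital_lookup")
def capital_of(country, gt=None):
    # `gt` is not used by the function body; it only travels with the trace.
    return {"France": "Paris"}.get(country, "unknown")

capital_of("France", gt="Paris")
capital_of("Spain")  # no ground truth supplied, so the default (None) is recorded

for record in RECORDS:
    print(record)
```

Giving `gt` a default value, as step 1 describes, keeps the function callable when no ground truth is available.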

RagaAI Tracing Supported Attributes

  1. User Input

  2. Agent Output

  3. Network Calls

  4. File Read/Write

  5. Tool Calls

  6. LLM Calls

  7. Custom Calls

Defining Trace and Span

Understanding the core concepts of trace and span is essential for effectively using RagaAI Catalyst’s tracing features.


What is a Trace?

A trace represents the complete lifecycle of an operation or experiment in your agentic application. It includes:

  • The series of actions and decisions made by agents.

  • Interactions between components like LLMs, tools, and agents.

  • Metadata capturing contextual information such as timestamps, inputs, outputs, and errors.

Think of a trace as the "big picture" view of how your application executes an operation from start to finish.

Example

If you are testing an application that handles financial queries, the trace might include:

  1. The user's input.

  2. The decision-making process of the financial agent.

  3. API calls to external services.

  4. The generated response and any intermediate computations.


What is a Span?

A span is a single unit of work within a trace. It represents a specific operation, such as:

  • A call to an external API.

  • A task performed by a tool or an agent.

  • An execution of an LLM prompt.

Each span contains:

  • Operation Name: The task being performed (e.g., "API Call", "LLM Response").

  • Start and End Time: When the operation began and finished, from which its duration is derived.

  • Attributes: Additional metadata specific to the operation, such as parameters or error codes.

Spans are the building blocks of a trace, providing granular details about each step within the larger operation.


Relation Between Trace and Span

  • A trace is composed of multiple spans.

  • Each span contributes to the trace by capturing detailed information about a specific task or operation.
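
The relationship can be sketched with two small data classes. This is a simplified model for illustration, assuming nothing about how RagaAI Catalyst stores traces internally: the `Span` and `Trace` classes, their fields, and the timings are all made up for the example, which mirrors the financial-query scenario described earlier.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List

@dataclass
class Span:
    """One unit of work: an operation name, start/end times, and attributes."""
    name: str
    start: float
    end: float
    attributes: Dict[str, Any] = field(default_factory=dict)

    @property
    def duration(self) -> float:
        return self.end - self.start

@dataclass
class Trace:
    """The full lifecycle of one operation, made up of spans."""
    name: str
    spans: List[Span] = field(default_factory=list)

    def add_span(self, span: Span) -> None:
        self.spans.append(span)

    @property
    def duration(self) -> float:
        # Wall-clock time from the earliest start to the latest end.
        if not self.spans:
            return 0.0
        return max(s.end for s in self.spans) - min(s.start for s in self.spans)

# The financial-query example, with made-up timings in seconds.
trace = Trace(name="financial_query")
trace.add_span(Span("user_input", 0.0, 0.1, {"text": "How did ACME perform?"}))
trace.add_span(Span("agent_decision", 0.1, 0.4))
trace.add_span(Span("api_call", 0.4, 1.2, {"status_code": 200}))
trace.add_span(Span("llm_response", 1.2, 2.0))

print(len(trace.spans), trace.duration)
```

Each span carries its own timing and attributes, while the trace only aggregates them, which is why a slow or failing step can be located without re-running the whole operation.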


For detailed explanations of each decorator and its use cases, refer to this guide.

Sample Colab Notebook