Concepts

Concepts in RagaAI Catalyst

This section introduces the key concepts and functionalities in RagaAI Catalyst, including tracing, decorators, dataset structure, metrics, and code versioning.


1. Tracing

Tracing is the process of capturing and analyzing interactions within an Agentic application. It records essential details such as input, output, and metadata during execution. Tracing is crucial for debugging, optimization, and evaluation.


2. Decorators

Decorators in RagaAI Catalyst enhance functions by adding tracing capabilities. They enable logging, instrumentation, or tracing without modifying the original code.
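The idea of adding tracing without touching the function body can be sketched with a plain Python decorator. This is an illustrative sketch only, not the RagaAI Catalyst implementation: the `trace` factory, the `TRACE_LOG` list, and the record fields are all hypothetical names chosen for this example.

```python
import functools
import time

# Hypothetical in-memory sink; a real tracer would export records elsewhere.
TRACE_LOG = []


def trace(span_type):
    """Hypothetical decorator factory that records input, output, and errors."""
    def decorator(func):
        @functools.wraps(func)  # keep the wrapped function's name and docstring
        def wrapper(*args, **kwargs):
            record = {
                "name": func.__name__,
                "type": span_type,
                "input": {"args": args, "kwargs": kwargs},
                "output": None,
                "error": None,
            }
            start = time.perf_counter()
            try:
                record["output"] = func(*args, **kwargs)
                return record["output"]
            except Exception as exc:
                # Capture the exception, then let it propagate unchanged.
                record["error"] = repr(exc)
                raise
            finally:
                record["duration_s"] = time.perf_counter() - start
                TRACE_LOG.append(record)
        return wrapper
    return decorator


@trace("tool")
def add(a, b):
    return a + b


result = add(2, 3)
```

The body of `add` is unchanged; the decorator alone decides what gets recorded, which is the property the paragraph above describes.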

Types of Decorators and Their Usage

  1. trace_llm:

    • Purpose: Traces interactions with Large Language Models (LLMs).

    • Traces:

      • Input prompts.

      • Generated responses.

      • Metadata.

      • Exceptions during processing.

  2. trace_tool:

    • Purpose: Traces the usage of tools in the framework.

    • Traces:

      • Input data.

      • Tool-generated results.

      • Exceptions during processing.

  3. trace_agent:

    • Purpose: Traces actions of agents (e.g., virtual assistants).

    • Traces:

      • Actions performed by the agent.

      • Outcomes of actions.

      • State changes.

      • Exceptions during processing.

    • Note: Only agents can have children, which may include LLMs, tools, or other agents.
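The parent-child rule in the note can be modelled as a small span tree. The sketch below is a hypothetical data structure, not the SDK's own; the `Span` class and its `add_child` method are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List

SPAN_TYPES = {"llm", "tool", "agent"}


@dataclass
class Span:
    """Hypothetical span node; only spans of type 'agent' may hold children."""
    name: str
    type: str
    children: List["Span"] = field(default_factory=list)

    def __post_init__(self):
        if self.type not in SPAN_TYPES:
            raise ValueError(f"unknown span type: {self.type}")

    def add_child(self, child: "Span") -> "Span":
        if self.type != "agent":
            raise TypeError(f"{self.type} span '{self.name}' cannot have children")
        self.children.append(child)
        return child


# An agent may contain LLM spans, tool spans, and other agents.
planner = Span("planner", "agent")
planner.add_child(Span("summarize", "llm"))
researcher = planner.add_child(Span("researcher", "agent"))
search = researcher.add_child(Span("web_search", "tool"))
```

Calling `search.add_child(...)` raises `TypeError`, because a tool span is a leaf in this model.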


3. Trace Components

LLM

  • Captures LLM inputs, outputs, and token usage.

  • Enables evaluation on spans such as LLM, tool, agent, or the entire trace.

Tracing Metadata and Components

  • Metadata includes contextual details such as token usage, error status, model information, and execution context.

  • Each trace component can capture unique attributes to aid in detailed evaluations.
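As a rough illustration of how per-component metadata might be rolled up for a whole trace, the sketch below sums token usage and collects model names and error status. The span dictionaries and their key names (`info`, `token_usage`, `prompt`, `completion`, `error`) are assumptions for this example and may differ from the actual trace format.

```python
# Hypothetical spans; key names are illustrative, not the real trace schema.
spans = [
    {"type": "llm", "error": None,
     "info": {"model": "model-a", "token_usage": {"prompt": 12, "completion": 30}}},
    {"type": "tool", "error": "timeout", "info": {}},
    {"type": "llm", "error": None,
     "info": {"model": "model-a", "token_usage": {"prompt": 8, "completion": 5}}},
]


def summarize_metadata(spans):
    """Aggregate token usage, models used, and error status across spans."""
    prompt_tokens = 0
    completion_tokens = 0
    models = set()
    has_error = False
    for span in spans:
        info = span.get("info", {})
        usage = info.get("token_usage", {})
        prompt_tokens += usage.get("prompt", 0)
        completion_tokens += usage.get("completion", 0)
        if "model" in info:
            models.add(info["model"])
        if span.get("error"):
            has_error = True
    return {
        "prompt_tokens": prompt_tokens,
        "completion_tokens": completion_tokens,
        "models": sorted(models),
        "has_error": has_error,
    }


summary = summarize_metadata(spans)
```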


4. Dataset

Dataset Column Names

  • TraceID: Unique identifier for the trace.

  • Timestamp: Execution timestamp.

  • Trace URL: URL to access trace details.

  • Feedback: User-provided feedback on trace.

  • Response: Captured LLM response.

  • Metadata: Key contextual details logged.

  • Metric Score: Evaluation score.

  • Completion Token: Tokens generated by the model.

  • Prompt Token: Tokens used in the prompt.

  • Cost: Resource usage cost.
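One way to picture a dataset row is as a record with one field per column listed above. The `TraceRow` class below is a hypothetical sketch; the field types, the example URL, and the sample values are assumptions, and the platform's actual storage format is not described here.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TraceRow:
    """Hypothetical in-memory representation of one dataset row."""
    trace_id: str
    timestamp: str
    trace_url: str
    response: str
    prompt_tokens: int
    completion_tokens: int
    cost: float
    feedback: Optional[str] = None
    metadata: dict = field(default_factory=dict)
    metric_score: Optional[float] = None

    def to_columns(self) -> dict:
        # Map fields to the column names used in the dataset view.
        return {
            "TraceID": self.trace_id,
            "Timestamp": self.timestamp,
            "Trace URL": self.trace_url,
            "Feedback": self.feedback,
            "Response": self.response,
            "Metadata": self.metadata,
            "Metric Score": self.metric_score,
            "Completion Token": self.completion_tokens,
            "Prompt Token": self.prompt_tokens,
            "Cost": self.cost,
        }


row = TraceRow(
    trace_id="t-001",
    timestamp="2025-01-01T12:00:00Z",
    trace_url="https://example.com/traces/t-001",  # placeholder URL
    response="Paris is the capital of France.",
    prompt_tokens=14,
    completion_tokens=8,
    cost=0.0004,
    metadata={"environment": "dev"},
)
columns = row.to_columns()
```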


5. Metrics

Metric Evaluation Levels

  1. LLM: Evaluate spans with type llm. Runs on the last occurrence of the selected span.

  2. Tool: Evaluate spans with type tool. Runs on the last occurrence of the selected span.

  3. Agent: Evaluate spans with type agent. Runs on the last occurrence of the selected span.

  4. Trace: Evaluate the entire trace.json.

  5. Conversation: Evaluate all input-output interactions in an Agentic application.
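The "last occurrence" rule for span-level metrics can be sketched as a simple selection over spans in execution order. The functions and span layout below are hypothetical; in particular, treating every span as a conversation turn is an assumption made for illustration.

```python
def last_span(spans, span_type, name=None):
    """Return the last span of the given type (and name, if given), or None."""
    matches = [
        s for s in spans
        if s["type"] == span_type and (name is None or s["name"] == name)
    ]
    return matches[-1] if matches else None


def conversation_pairs(spans):
    """Collect every input-output pair in execution order."""
    return [(s["data"]["input"], s["data"]["output"]) for s in spans]


# Hypothetical spans listed in execution order.
spans = [
    {"name": "plan", "type": "agent",
     "data": {"input": "Book a flight", "output": "Searching"}},
    {"name": "search", "type": "tool",
     "data": {"input": "flights to Paris", "output": "3 results"}},
    {"name": "draft", "type": "llm",
     "data": {"input": "Summarize 3 results", "output": "Draft summary"}},
    {"name": "draft", "type": "llm",
     "data": {"input": "Refine summary", "output": "Final summary"}},
]

llm_target = last_span(spans, "llm", name="draft")
tool_target = last_span(spans, "tool")
pairs = conversation_pairs(spans)
```

Here the `draft` LLM span runs twice, and only the second call is selected for an LLM-level metric.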


Metric Parameter Mappings

  1. LLM:

    • data.input: Input prompt or query.

    • data.output: LLM response.

    • data.function_call: Called function name.

    • data.function_parameters: Function parameters.

    • info.model: Model information.

    • info.token_usage: Token usage statistics.

    • data.gt: Ground truth.

  2. Tool:

    • data.input: Input to the tool.

    • data.output: Tool output.

    • data.error: Tool error details (if any).

    • networkcalls.status_code: HTTP status code.

    • networkcalls.url: Request URL.

    • networkcalls.method: HTTP method.

    • data.gt: Ground truth.

  3. Agent:

    • data.input: Input prompt or query.

    • data.output: Agent response.

    • data.function_call: Called function name.

    • data.function_parameters: Function parameters.

    • info.model: Model information.

    • info.token_usage: Token usage statistics.

    • data.gt: Ground truth.
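The dotted names above read naturally as paths into a nested span record. The following sketch shows one way such a mapping could be resolved; the `resolve` helper, the span layout, and the metric parameter names (`prompt`, `response`, `expected_response`) are hypothetical and chosen only for illustration.

```python
def resolve(span, path):
    """Follow a dotted path such as 'data.input' through nested dicts."""
    current = span
    for key in path.split("."):
        if not isinstance(current, dict) or key not in current:
            return None  # path is missing in this span
        current = current[key]
    return current


# Hypothetical LLM span record.
span = {
    "data": {"input": "What is 2 + 2?", "output": "4", "gt": "4"},
    "info": {
        "model": "example-model",
        "token_usage": {"prompt_tokens": 7, "completion_tokens": 1},
    },
}

# Hypothetical mapping from metric parameters to span fields.
mapping = {
    "prompt": "data.input",
    "response": "data.output",
    "expected_response": "data.gt",
}

params = {param: resolve(span, path) for param, path in mapping.items()}
```

Returning `None` for a missing path lets a caller distinguish an unmapped field from an empty value before running a metric.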


6. Custom Metrics

  • RagaAI Catalyst supports custom metrics for advanced evaluations.

  • Refer to the Custom Metric page of this documentation for setup and configuration details.


7. Code Versioning

  • RagaAI Catalyst automatically versions the code whenever changes are made to trace instrumentation.

  • Users can view the code diffs to track modifications and compare versions.

  • This feature ensures traceability and consistency in trace evaluations.
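How Catalyst detects and stores code versions is not described here. As a generic illustration of the idea, the sketch below derives a version identifier from a content hash of the source and produces a unified diff between two versions using the standard library. The function names and sample source strings are hypothetical.

```python
import difflib
import hashlib


def version_id(source: str) -> str:
    """Short content hash: identical source yields the same identifier."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()[:12]


def code_diff(old: str, new: str) -> str:
    """Unified diff between two versions of the instrumented code."""
    return "".join(
        difflib.unified_diff(
            old.splitlines(keepends=True),
            new.splitlines(keepends=True),
            fromfile="previous",
            tofile="current",
        )
    )


v1 = "def answer(q):\n    return llm(q)\n"
v2 = "def answer(q):\n    return llm(q.strip())\n"

changed = version_id(v1) != version_id(v2)
diff = code_diff(v1, v2)
```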

By understanding these concepts, users can effectively leverage RagaAI Catalyst's features to analyze, evaluate, and optimize their Agentic applications.


Last updated 4 months ago
