Metrics
Explore metrics that evaluate AI agents. Track hallucinations, honesty, similarity, and more in workflows.
Agentic Testing in RagaAI Catalyst is built for evaluating advanced, autonomous AI systems. Because agentic applications plan, reason, and execute multi-step tasks, they require specialized metrics that can handle dynamic behavior, trace-level evaluation, and safety validation.
Why These Metrics Matter
Deep behavior insights: Uncover how agents make decisions through trace-level metric data.
Safety & robustness: Detect hallucinations, toxicity, misalignment, and other failure modes early.
Comparative analysis: Benchmark agent performance across runs to optimize planning logic and tool integration.
Customizability: Supplement default metrics with custom ones tailored to your domain or workflow.
Core Default Metrics
Hallucination
Toxicity
Honesty
Cosine Similarity
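To make the similarity metric concrete, here is a minimal sketch of the standard cosine similarity formula between two vectors. It is not RagaAI Catalyst's implementation. The function name `cosine_similarity` and the idea of comparing embedding vectors of an agent response and a reference answer are assumptions made for illustration only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between vectors a and b.

    Hypothetical helper for illustration; not part of any RagaAI Catalyst API.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))

    # Convention chosen for this sketch: a zero vector has no direction,
    # so its similarity to anything is reported as 0.0.
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0

    return dot / (norm_a * norm_b)


# Made-up embedding vectors standing in for a response and a reference.
response_vec = [0.2, 0.7, 0.1]
reference_vec = [0.25, 0.65, 0.05]
score = cosine_similarity(response_vec, reference_vec)
```

Scores near 1 mean the two vectors point in almost the same direction, scores near 0 mean they are unrelated, and negative scores mean they point in opposing directions.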