Run Evaluations
RagaAI Catalyst's evaluation feature lets you measure the quality of your prompts and of the responses they produce, so you can refine your LLM interactions based on scores rather than guesswork.
Refer to the complete directory of RagaAI Metrics.
Evaluation Process:
Initiate Evaluation:
Navigate to your prompt in the RagaAI Catalyst Playground.
Click on the "Evaluate" button to begin the evaluation process.
Select Evaluation Metrics:
RagaAI Catalyst groups its metrics into the following categories:
Prompt: Assess the quality and effectiveness of your input prompt.
Response: Evaluate the generated output from the LLM.
Context: Analyze the relevance and quality of provided context.
Expected Response: Compare the generated response against the expected (reference) response.
Expected Context: Compare the provided context against the expected context.
Configure Metrics:
Choose relevant metrics from each category based on your evaluation needs.
For each selected metric:
a. Give the metric a descriptive name so you can easily identify it in your results.
b. Perform schema mapping to link the metric's inputs to specific variables or response elements.
c. Set the threshold value that defines the passing criterion.
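To make the three settings above concrete, here is a minimal sketch of what a configured metric could look like as data. It is not RagaAI Catalyst's API: the class MetricConfig, its fields, and the higher_is_better flag are hypothetical, and the assumption that a score at or above the threshold passes may not hold for every metric.

```python
from dataclasses import dataclass


@dataclass
class MetricConfig:
    """Hypothetical metric configuration mirroring the name / mapping / threshold steps.

    Field names are illustrative only; they are not RagaAI Catalyst's API.
    """

    name: str                       # a. descriptive name shown in the results
    schema_mapping: dict[str, str]  # b. metric input -> column in your data row
    threshold: float                # c. passing criterion for this metric
    higher_is_better: bool = True   # assumption: some metrics may pass on low scores

    def resolve_inputs(self, row: dict[str, str]) -> dict[str, str]:
        """Pull each metric input out of a data row using the schema mapping."""
        return {metric_input: row[column] for metric_input, column in self.schema_mapping.items()}

    def passed(self, score: float) -> bool:
        """Apply the threshold to a metric score."""
        if self.higher_is_better:
            return score >= self.threshold
        return score <= self.threshold


# Example: a metric that reads the model output and the retrieved documents.
grounding = MetricConfig(
    name="Answer stays within retrieved docs",
    schema_mapping={"response": "llm_output", "context": "retrieved_docs"},
    threshold=0.7,
)

row = {
    "question": "What is the refund window?",
    "retrieved_docs": "Refunds are accepted within 30 days.",
    "llm_output": "You can request a refund within 30 days.",
}
print(grounding.resolve_inputs(row))
print(grounding.passed(0.82))
```

The schema mapping is what lets one metric definition be reused across prompts whose variables are named differently: only the mapping changes, not the metric.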
Execute Evaluation:
After configuring your desired metrics, click "Run Evals" to initiate the evaluation process.
RagaAI Catalyst will apply the selected metrics to your prompt and response data.
Review Results:
Once the evaluation is complete, review the detailed results for each metric.
Analyze scores, identify strengths, and pinpoint areas for improvement in your prompt design.
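If you export per-row scores for offline analysis, a summary like the one below can help pinpoint weak spots. This is a hypothetical sketch: the functions summarize and weakest, the metric names, and the scores are illustrative, not real Catalyst output, and it assumes a score at or above the threshold counts as a pass.

```python
from statistics import mean


def summarize(scores: dict[str, list[float]], thresholds: dict[str, float]) -> dict[str, dict[str, float]]:
    """Per-metric mean score and pass rate (score >= threshold counts as a pass)."""
    summary = {}
    for metric, values in scores.items():
        passes = sum(1 for v in values if v >= thresholds[metric])
        summary[metric] = {"mean": mean(values), "pass_rate": passes / len(values)}
    return summary


def weakest(summary: dict[str, dict[str, float]]) -> str:
    """Name of the metric with the lowest pass rate: a first place to look for prompt fixes."""
    return min(summary, key=lambda m: summary[m]["pass_rate"])


# Illustrative scores for three test rows; not real Catalyst output.
scores = {
    "Tone check": [0.75, 1.0, 0.5],
    "Uses context": [1.0, 0.875, 0.75],
}
thresholds = {"Tone check": 0.7, "Uses context": 0.7}

summary = summarize(scores, thresholds)
print(summary)
print(weakest(summary))
```

Pass rate is used for ranking rather than the mean because a metric can have a high average while still failing on a few important rows.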
Combined Prompt Execution and Evaluation
RagaAI Catalyst also lets you run a prompt and evaluate its output in a single step, which speeds up iterative prompt development:
Modify Variables:
Experiment with different variable values in your prompt template.
One-Click Execution and Evaluation:
Instead of running prompts and evaluations separately, use the "Run Prompts + Evals" button.
This action will:
a. Generate a new response using the current prompt and variables.
b. Automatically run all selected evaluations on the fresh output.
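The two-step behavior of "Run Prompts + Evals" can be sketched as a single function, assuming a generic LLM call and metrics that score a prompt/response pair. Everything here is hypothetical: run_prompt_and_evals, fake_llm, and mentions_city are stand-ins, and the {country} placeholder uses Python's str.format syntax, which may not match the Playground's variable syntax.

```python
from typing import Callable

Metric = Callable[[str, str], float]  # (prompt, response) -> score


def run_prompt_and_evals(
    template: str,
    variables: dict[str, str],
    generate: Callable[[str], str],
    metrics: dict[str, Metric],
) -> dict:
    """Render the prompt, generate a fresh response, then score it with every metric."""
    prompt = template.format(**variables)      # a. current prompt + variables
    response = generate(prompt)                # a. new response
    scores = {name: metric(prompt, response)  # b. evaluations on the fresh output
              for name, metric in metrics.items()}
    return {"prompt": prompt, "response": response, "scores": scores}


# Stand-ins for the LLM call and a metric; replace with real implementations.
def fake_llm(prompt: str) -> str:
    return "Paris" if "France" in prompt else "I am not sure."


def mentions_city(prompt: str, response: str) -> float:
    return 1.0 if response.strip() in {"Paris", "Berlin"} else 0.0


template = "What is the capital of {country}? Answer with one word."
for country in ["France", "Atlantis"]:
    result = run_prompt_and_evals(template, {"country": country}, fake_llm,
                                  {"Names a city": mentions_city})
    print(country, result["response"], result["scores"])
```

Coupling generation and scoring in one call guarantees the scores always describe the response produced by the current variable values, which is the point of the combined button.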