Executing tests

Once you have entered and saved your API keys under Settings/API Keys, you can execute tests. Use the following code in the Python client:

from RagaAICatalyst import Experiment

# Point the experiment at an existing project and dataset
experiment_manager = Experiment(
    project_name="project_name",
    experiment_name="experiment_name",
    dataset_name="dataset_name"
)

# Attach the metrics to evaluate and run the test
response = experiment_manager.add_metrics(
    metrics=[
        {"name": "hallucination", "config": {"model": "gpt-4", "reason": True, "batch_size": 30, "provider": "OpenAI"}},
        {"name": "prompt_readability", "config": {"model": "gpt-4", "reason": True, "batch_size": 30, "provider": "OpenAI"}},
        {"name": "correctness", "config": {"model": "gpt-3.5-turbo", "reason": True, "batch_size": 30, "provider": "OpenAI"}}
    ]
)

print("Metric Response:", response)

You can add as many metrics as needed for your use case or application. RagaAI Catalyst offers various metrics specific to prompts, responses, context, evaluation, and guardrails. Refer to the Metric Glossary for more details.

Note: Enter the API key of your provider by navigating to Settings/API Keys. Refer to Running RagaAI Evals to learn more.

Metric Configuration Parameters

  • name (string): The name of the metric. This identifies the specific evaluation criterion (e.g., "Hallucination", "Faithfulness").

  • config (dictionary): A dictionary containing the configuration options for the selected metric.

    • Required Parameters:

      • provider (string): Identifies the service provider of the model being used (e.g., "OpenAI"). The following provider options are currently available on the platform: OpenAI, Azure, Gemini, and Groq.

      • model (string): Specifies the model variant to be used for running the evals. The model name format varies by provider:

        • OpenAI: model name (e.g., "model": "gpt-4o-mini")

        • Azure: azure/deployment name (e.g., "model": "azure/azure-deployment")

        • Gemini: gemini/model name (e.g., "model": "gemini/gemini-1.5-flash")

        • Groq: groq/model name (e.g., "model": "groq/llama3-70b-8192")

Note: Model and provider are not required to run Guardrails.
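
Putting the formats together, a metrics list that targets each provider might look like the following sketch. The deployment and model names are illustrative placeholders; any provider works as long as its API key is saved under Settings/API Keys:

metrics = [
    # OpenAI: plain model name
    {"name": "hallucination", "config": {"provider": "OpenAI", "model": "gpt-4o-mini", "reason": True, "batch_size": 30}},
    # Azure: "azure/" followed by your deployment name
    {"name": "hallucination", "config": {"provider": "Azure", "model": "azure/azure-deployment", "reason": True, "batch_size": 30}},
    # Gemini: "gemini/" followed by the model name
    {"name": "hallucination", "config": {"provider": "Gemini", "model": "gemini/gemini-1.5-flash", "reason": True, "batch_size": 30}},
    # Groq: "groq/" followed by the model name
    {"name": "hallucination", "config": {"provider": "Groq", "model": "groq/llama3-70b-8192", "reason": True, "batch_size": 30}}
]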

    • Optional Parameters:

      • reason (boolean, optional): Indicates whether the metric should return an explanation for its evaluation. When set to True, the metric will provide detailed reasons or explanations for its findings.

      • batch_size (integer, optional): Defines the number of outputs or responses that will be evaluated together in each batch. This allows you to control how many items are processed at a time, balancing evaluation speed and resource usage (see the sketch below).
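
As a sketch of how the optional parameters change a run, the configuration below disables explanations and uses a smaller batch. The metric and model names are illustrative; "faithfulness" stands in for any metric from the Metric Glossary:

response = experiment_manager.add_metrics(
    metrics=[
        {
            "name": "faithfulness",  # any metric from the Metric Glossary
            "config": {
                "provider": "OpenAI",
                "model": "gpt-4o-mini",
                "reason": False,   # skip per-item explanations for a faster run
                "batch_size": 10   # evaluate 10 responses per batch
            }
        }
    ]
)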

Once the test is executed, you can view it inside the project where the experiment was created. Click on "View Experiment" to explore the insights of the experiment run.
