Toxicity

Objective: This guardrail uses pre-trained multi-label models to check whether the generated text is toxic. If the model predicts any of the labels toxicity, severe_toxicity, obscene, threat, insult, identity_attack, or sexual_explicit with confidence higher than the specified threshold, the guardrail fails.

Required Parameters: Response

Interpretation: A higher score indicates that the model response was toxic.
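
To make the pass/fail logic concrete, below is a minimal sketch of the thresholded multi-label check, assuming a Detoxify-style classifier (which emits exactly these seven labels) and an illustrative threshold of 0.5; the actual model and default threshold used by the guardrail may differ.

# Sketch only: illustrates the thresholded multi-label toxicity check,
# not the guardrail's internal implementation.
from detoxify import Detoxify

TOXICITY_LABELS = [
    "toxicity", "severe_toxicity", "obscene", "threat",
    "insult", "identity_attack", "sexual_explicit",
]

def toxicity_guardrail(response: str, threshold: float = 0.5) -> dict:
    # threshold=0.5 is an assumed example value, not a documented default.
    # Predict a confidence score for each toxicity label.
    scores = Detoxify("unbiased").predict(response)
    # The guardrail fails if any label's confidence exceeds the threshold.
    flagged = {label: scores[label] for label in TOXICITY_LABELS if scores[label] > threshold}
    return {
        "passed": not flagged,
        "score": max(scores[label] for label in TOXICITY_LABELS),
        "flagged_labels": flagged,
    }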

Code Execution:

# Set up the experiment the guardrail will run against
# (Experiment is provided by the SDK; import it as per your installation)
experiment_manager = Experiment(project_name="project_name",
                                experiment_name="experiment_name",
                                dataset_name="dataset_name")

# Toxicity Test
response = experiment_manager.add_metrics(
    metrics=[
        {"name": "Toxicity", "config": {"reason": True, "batch_size": 5}}
    ]
)

Refer to the Executing tests page to learn about Metric Configurations.

The Model and Provider parameters are not required to run Guardrails.
