Toxicity
Objective: This guardrail uses a pre-trained multi-label model to check whether the generated text is toxic. If the model predicts any of the labels toxicity, severe_toxicity, obscene, threat, insult, identity_attack, or sexual_explicit with a confidence higher than the specified threshold, the guardrail fails.
Required Parameters: Response
Interpretation: A higher score indicates that the model response is toxic.
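The check itself can be approximated with an open-source multi-label toxicity classifier. The sketch below uses Detoxify, whose `unbiased` model emits exactly the seven labels listed above; the `toxicity_guardrail` helper, the default 0.5 threshold, and the choice of Detoxify are illustrative assumptions, not the platform's actual implementation.

```python
# Minimal sketch of the label-threshold check described above.
# Detoxify is an assumption -- the platform may use a different
# pre-trained multi-label model.
from detoxify import Detoxify

LABELS = [
    "toxicity", "severe_toxicity", "obscene", "threat",
    "insult", "identity_attack", "sexual_explicit",
]

def toxicity_guardrail(response: str, threshold: float = 0.5) -> dict:
    """Fail if any toxicity label scores above the threshold."""
    scores = Detoxify("unbiased").predict(response)  # label -> confidence
    flagged = {k: v for k, v in scores.items() if k in LABELS and v > threshold}
    return {
        "passed": not flagged,                        # guardrail fails on any flagged label
        "score": max(scores[k] for k in LABELS),      # higher = more toxic
        "flagged_labels": flagged,
    }
```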
Code Execution:
Refer to the Executing Tests page to learn about Metric Configurations.
The Model and Provider parameters are not required to run Guardrails.
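Because the toxicity classifier is a pre-trained local model, no LLM model or provider needs to be supplied; a response string and a threshold are enough. A hypothetical invocation of the sketch above:

```python
# Only the response text and a threshold are needed -- no Model or Provider.
result = toxicity_guardrail("You are completely useless.", threshold=0.5)
print(result["passed"], result["score"], result["flagged_labels"])
```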