Toxicity
Objective: This guardrail uses pre-trained multi-label models to check whether the generated text is toxic. If the model predicts any of the labels toxicity, severe_toxicity, obscene, threat, insult, identity_attack, or sexual_explicit with a confidence higher than the specified threshold, the guardrail fails (a minimal sketch of this check is shown under Code Execution below).
Required Parameters: Response
Interpretation: A higher score indicates that the model response is toxic.
Metric Execution via UI:
Code Execution:
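The exact SDK call for running this guardrail is not reproduced here, so the following is a minimal sketch of the underlying threshold check only, not the product API. It uses the open-source Detoxify "unbiased" model, which exposes the same seven labels listed above; the function name `toxicity_guardrail` and the default threshold of 0.5 are illustrative assumptions.

```python
# Sketch of the toxicity check, assuming the Detoxify "unbiased" model.
# NOT the product SDK: `toxicity_guardrail` and the 0.5 threshold are
# illustrative assumptions.
from detoxify import Detoxify

LABELS = [
    "toxicity",
    "severe_toxicity",
    "obscene",
    "threat",
    "insult",
    "identity_attack",
    "sexual_explicit",
]

# Load the multi-label model once and reuse it across checks.
_model = Detoxify("unbiased")


def toxicity_guardrail(response: str, threshold: float = 0.5) -> dict:
    """Fail the guardrail if any label's confidence exceeds the threshold."""
    scores = _model.predict(response)  # dict: label -> confidence in [0, 1]
    flagged = {label: scores[label] for label in LABELS if scores[label] > threshold}
    return {
        "passed": not flagged,  # fails when any label is above the threshold
        "score": max(scores[label] for label in LABELS),  # higher => more toxic
        "flagged_labels": flagged,
    }


if __name__ == "__main__":
    result = toxicity_guardrail("You are a wonderful collaborator.")
    print(result["passed"], result["score"])
```

In the product, the Response parameter supplies the text being checked and the threshold is configured on the guardrail itself; actual execution goes through the evaluation SDK, subject to the notes below.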
The Model and Provider parameters are not mandatory for running Guardrails.
The "schema_mapping" variable needs to be defined first and is a prerequisite for evaluation runs. Learn how to set this variable here.