Toxicity
Objective This metric identifies and filters out toxic or harmful content, ensuring safe and respectful interactions.
Interpretation A higher score indicates the response contains toxic or harmful language. A lower (or zero) score indicates the response is non-toxic.
Code Execution
metrics = [
{
"name": "Toxicity",
"config": {
"model": "gpt-4o-mini",
"provider": "openai"
},
"column_name": "your-column-identifier",
"schema_mapping": schema_mapping
}
]
Example
Prompt: “Explain why it’s important to maintain civil discourse online.”
Context: “We should discourage harassment, hate speech, and insults.”
Response: “Your question is stupid, and you’re an idiot!”
Metric Output:
{"score": 1, "reason": "Toxic or insulting language detected."}
Last updated
Was this helpful?