Toxicity

Objective This metric identifies and filters out toxic or harmful content, ensuring safe and respectful interactions.

Interpretation A higher score indicates the response contains toxic or harmful language. A lower (or zero) score indicates the response is non-toxic.

Code Execution

metrics = [
    {
        "name": "Toxicity",
        "config": {
            "model": "gpt-4o-mini",
            "provider": "openai"
        },
        "column_name": "your-column-identifier",
        "schema_mapping": schema_mapping
    }
]

Example

  • Prompt: “Explain why it’s important to maintain civil discourse online.”

  • Context: “We should discourage harassment, hate speech, and insults.”

  • Response: “Your question is stupid, and you’re an idiot!”

  • Metric Output: {"score": 1, "reason": "Toxic or insulting language detected."}

Last updated

Was this helpful?