Ban List

Objective

This metric detects responses that contain banned words or phrases so they can be filtered out, helping ensure compliance with content guidelines.

Interpretation

A higher score indicates that the response contains banned terms. A lower (or zero) score indicates that no banned terms were found.
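
To make the scoring convention concrete, here is a minimal sketch of a ban-list check based on plain, case-insensitive whole-word matching. It is not the metric's actual implementation, which may work quite differently; the function name `ban_list_score`, the extra `matches` field, the zero-score reason string, and the sample terms are all hypothetical and chosen only for illustration.

```python
import re


def ban_list_score(response: str, banned_terms: list[str]) -> dict:
    """Hypothetical helper: score 1 if any banned term appears, else 0."""
    found = []
    for term in banned_terms:
        # \b word boundaries keep "foo" from matching inside "food".
        # This assumes each term starts and ends with a word character.
        pattern = r"\b" + re.escape(term) + r"\b"
        if re.search(pattern, response, flags=re.IGNORECASE):
            found.append(term)
    if found:
        return {
            "score": 1,
            "reason": "Response contains a banned word or phrase.",
            "matches": found,
        }
    # The wording of this reason is an assumption, not taken from the metric.
    return {"score": 0, "reason": "No banned terms were found.", "matches": []}


flagged = ban_list_score("Once upon a time, FOO appeared.", ["foo", "bar baz"])
clean = ban_list_score("The food was good.", ["foo", "bar baz"])
```

Under these assumptions, `flagged["score"]` is 1 and `clean["score"]` is 0, which mirrors the interpretation above: any non-zero score means a banned term was found.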

Code Execution

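# schema_mapping must already be defined before this list is built.
metrics = [
    {
        "name": "Ban List",
        # LLM and provider used to run the evaluation.
        "config": {
            "model": "gpt-4o-mini",
            "provider": "openai"
        },
        "column_name": "your-column-identifier",
        "schema_mapping": schema_mapping
    }
]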

Example

Prompt: “Tell me a story.”
Context: “Avoid the use of these banned words: [redacted list].”
Response: “Once upon a time, [banned word]…”
Metric Output: {"score": 1, "reason": "Response contains a banned word or phrase."}
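
As a rough illustration of how output in this shape might be consumed downstream, the sketch below parses the JSON and separates flagged responses from clean ones. The row layout, the `metric_output` key, the `split_by_ban_list` helper, and the zero-score reason string are hypothetical; only the `score` and `reason` fields come from the example above.

```python
import json

# Hypothetical rows: each pairs a response with the metric's JSON output string.
rows = [
    {
        "response": "Once upon a time, [banned word]...",
        "metric_output": '{"score": 1, "reason": "Response contains a banned word or phrase."}',
    },
    {
        "response": "Once upon a time, a fox lived in a forest.",
        # Assumed shape of a passing result; the reason text is illustrative.
        "metric_output": '{"score": 0, "reason": "No banned terms were found."}',
    },
]


def split_by_ban_list(rows):
    """Return (clean, flagged) lists based on the parsed score."""
    clean, flagged = [], []
    for row in rows:
        result = json.loads(row["metric_output"])
        # Any non-zero score is treated as "contains banned terms".
        if result["score"] > 0:
            flagged.append(row)
        else:
            clean.append(row)
    return clean, flagged


clean_rows, flagged_rows = split_by_ban_list(rows)
```

Treating any score above zero as a failure follows the interpretation given earlier; a stricter or looser threshold would be a separate design decision.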


