Ban Topics

The Ban Topics Guardrail detects banned topics in a prompt. It uses a zero-shot classifier to keep specific topics, such as religion or violence, from being introduced in the prompt.

It performs zero-shot classification with one of the following models:

MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33: Trained on a mixture of 33 datasets and 389 classes reformatted into the universal NLI format. The model is English-only, but it can be used for multilingual zero-shot classification by first machine-translating texts into English.

MoritzLaurer/deberta-v3-base-zeroshot-v1.1-all-33: Essentially the same as the large model, but smaller. Use it if you need more speed. The model is English-only.

MoritzLaurer/deberta-v3-xsmall-zeroshot-v1.1-all-33: Smaller and faster than the base model.

MoritzLaurer/xtremedistil-l6-h256-zeroshot-v1.1-all-33: Faster still. The model has only 22 million backbone parameters and is 25 MB in size (13 MB with ONNX quantization).
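As a rough illustration of how NLI-based zero-shot classification turns a prompt and a list of topics into scores, the sketch below pairs the prompt (the premise) with one hypothesis per topic and converts the model's two output logits (entailment, not entailment) into a probability with a softmax. The template `HYPOTHESIS_TEMPLATE` and the helper names are hypothetical, and the logits in the usage lines are made-up numbers standing in for real model outputs. This page does not document the guardrail's exact template or scoring, so treat this as a sketch of the general technique only.

```python
import math

# Hypothetical hypothesis template; the guardrail's actual template is not documented.
HYPOTHESIS_TEMPLATE = "This text is about {}."


def build_nli_pairs(prompt: str, topics: list[str]) -> list[tuple[str, str]]:
    """Pair the prompt (premise) with one hypothesis per candidate topic."""
    return [(prompt, HYPOTHESIS_TEMPLATE.format(topic)) for topic in topics]


def entailment_probability(entail_logit: float, not_entail_logit: float) -> float:
    """Softmax over the two NLI logits, returning P(entailment)."""
    # Subtract the larger logit before exponentiating for numerical stability.
    m = max(entail_logit, not_entail_logit)
    e = math.exp(entail_logit - m)
    n = math.exp(not_entail_logit - m)
    return e / (e + n)


def topic_scores(topics: list[str], logits: list[tuple[float, float]]) -> dict[str, float]:
    """Score each topic independently from its (entailment, not_entailment) logits."""
    return {topic: entailment_probability(e, n) for topic, (e, n) in zip(topics, logits)}


# Usage with made-up logits (not real model outputs):
pairs = build_nli_pairs("A poem about coral reefs.", ["oceans", "airplanes"])
scores = topic_scores(["oceans", "airplanes"], [(3.0, -1.0), (-2.0, 2.0)])
print(pairs)
print(scores)
```

Scoring each topic independently, rather than normalizing across all topics, means several banned topics can score high at the same time, and adding an unrelated topic to the list does not dilute the score of a relevant one.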

Parameters:

data:

prompt (str): Prompt to check for banned topics.

arguments:

topics (Sequence[str]): List of topics to ban.
threshold (float, optional): Score above which a topic is considered present in the prompt. Default is 0.6.
use_onnx (bool, optional): Whether to run inference with ONNX. Default is False.
model (str, optional): Zero-shot classification model to use, for example one of the models listed above.
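The documented arguments map onto the `arguments` dict passed to `evaluator.add_test`. The sketch below is a hypothetical helper (`BanTopicsArguments` is not part of the library) that fills in the documented defaults and guards against a common mistake: passing a single string such as "oceans" instead of a list. A bare string still satisfies `Sequence[str]`, so it would otherwise slip through type hints and could be split into characters. The `[0, 1]` range check on `threshold` and the non-empty check on `topics` are assumptions, not documented behavior; `model` is omitted when unset because this page does not state its default.

```python
from collections.abc import Sequence
from dataclasses import dataclass
from typing import Optional


@dataclass
class BanTopicsArguments:
    """Hypothetical container mirroring the documented `arguments` dict."""

    topics: Sequence[str]
    threshold: float = 0.6       # documented default
    use_onnx: bool = False       # documented default
    model: Optional[str] = None  # default model is not documented; None leaves it to the guardrail

    def __post_init__(self) -> None:
        # A bare string is a Sequence[str], so reject it explicitly.
        if isinstance(self.topics, str):
            raise TypeError("topics must be a list of strings, not a single string")
        # Assumption: at least one topic is required.
        if not self.topics:
            raise ValueError("topics must contain at least one topic")
        # Assumption: scores are probabilities, so the threshold should lie in [0, 1].
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")

    def to_dict(self) -> dict:
        """Build the `arguments` dict, omitting `model` when it is unset."""
        args = {"topics": list(self.topics), "threshold": self.threshold, "use_onnx": self.use_onnx}
        if self.model is not None:
            args["model"] = self.model
        return args


# Usage: equivalent to the arguments in the failed scenario further down this page.
print(BanTopicsArguments(topics=["oceans"], threshold=0.5,
                         model="MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33").to_dict())
```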

Interpretation:

If any banned topic scores above the threshold, that topic is considered present in the prompt and the result is Failed. If no banned topic scores above the threshold, the result is Passed.
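The interpretation rule can be expressed as a small function. This is a sketch assuming the per-topic scores are already available as a dict; `evaluate_ban_topics` is a hypothetical name, and the scores in the usage lines are illustrative, not real model outputs. The page does not say whether the comparison is `>` or `>=`, so the strict comparison below follows the wording "above the threshold".

```python
from collections.abc import Mapping


def evaluate_ban_topics(scores: Mapping[str, float], threshold: float = 0.6) -> tuple[str, list[str]]:
    """Return ("Failed", flagged_topics) if any score exceeds the threshold, else ("Passed", [])."""
    # Strictly "above" the threshold; a score equal to the threshold does not fail.
    flagged = [topic for topic, score in scores.items() if score > threshold]
    return ("Failed", flagged) if flagged else ("Passed", [])


# Usage with illustrative scores:
print(evaluate_ban_topics({"airplanes": 0.02}, threshold=0.5))
print(evaluate_ban_topics({"oceans": 0.97, "airplanes": 0.02}, threshold=0.5))
```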

Example:

Passed Scenario -

```python
# Assumes `evaluator` has already been initialized (its setup is not shown on this page).
evaluator.add_test(
    test_names=["ban_topics_guardrail"],
    data={
        "prompt": """Craft a poem that captures the breathtaking beauty and serenity of this underwater world.
        Describe the gentle swaying of the coral, the playful dance of the fish, and the mesmerising underwater soundscape.
        Use vivid imagery and metaphors to evoke a sense of awe and wonder for this natural treasure.""",
    },
    # "airplanes" is unrelated to an underwater poem, so its score should stay below 0.5.
    arguments={"topics": ["airplanes"], "threshold": 0.5, "model": "MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33"},
).run()
evaluator.print_results()
```
evaluator.print_results()

Result = No banned topics found in the prompt.

Failed Scenario -

```python
# Same prompt as above; only the banned topic changes.
evaluator.add_test(
    test_names=["ban_topics_guardrail"],
    data={
        "prompt": """Craft a poem that captures the breathtaking beauty and serenity of this underwater world.
        Describe the gentle swaying of the coral, the playful dance of the fish, and the mesmerising underwater soundscape.
        Use vivid imagery and metaphors to evoke a sense of awe and wonder for this natural treasure.""",
    },
    # The prompt describes an underwater world, so "oceans" is expected to score above 0.5.
    arguments={"topics": ["oceans"], "threshold": 0.5, "model": "MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33"},
).run()
evaluator.print_results()
```
evaluator.print_results()

Result = Banned topics found in the prompt.
