Ban Topics

The Ban Topics Guardrail detects banned topics in the prompt. It uses a zero-shot classifier to keep specific topics, such as religion or violence, from being introduced in the prompt.
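Zero-shot classifiers of the kind used in the examples on this page (NLI models such as `MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33`) commonly score each topic by asking whether the prompt entails a hypothesis like "This example is about <topic>." The sketch below shows only that scoring step, assuming per-topic (contradiction, entailment) logits are already available. The function name and the logit values are hypothetical, and this is not presented as the guardrail's actual implementation.

```python
import math
from typing import Dict, Tuple


def topic_scores_from_nli(nli_logits: Dict[str, Tuple[float, float]]) -> Dict[str, float]:
    """Convert per-topic (contradiction, entailment) logits into scores in [0, 1].

    Each topic is scored independently: a two-way softmax over its
    contradiction and entailment logits gives the probability that the
    prompt entails the hypothesis for that topic.
    """
    scores: Dict[str, float] = {}
    for topic, (contradiction, entailment) in nli_logits.items():
        # Subtract the larger logit before exponentiating for numerical stability.
        m = max(contradiction, entailment)
        e_contra = math.exp(contradiction - m)
        e_entail = math.exp(entailment - m)
        scores[topic] = e_entail / (e_contra + e_entail)
    return scores


# Hypothetical logits for the underwater-poem prompt used in the examples.
example = topic_scores_from_nli({"oceans": (-2.0, 3.0), "airplanes": (2.5, -1.5)})
print({topic: round(score, 3) for topic, score in example.items()})
# prints {'oceans': 0.993, 'airplanes': 0.018}
```

Scoring each topic independently (rather than normalizing across all topics) means that banning several topics at once does not dilute the score of any single one.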

The guardrail relies on the following models to perform zero-shot classification:

Parameters:

data:

  • prompt (str): Prompt to check for banned topics.

arguments:

  • topics (Sequence[str]): List of topics to ban.

  • threshold (float, optional): Score above which a topic is considered present in the prompt. Default is 0.6.

  • use_onnx (bool, optional): Whether to use ONNX for inference. Default is False.
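The documented arguments can be mirrored as a small typed structure before they are passed to the evaluator. This is a hypothetical helper, not part of the evaluator's API: the field names and defaults come from the list above, while the validation rules (a non-empty list of topics, a threshold between 0 and 1) are assumptions.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass(frozen=True)
class BanTopicsArguments:
    """Hypothetical mirror of the documented Ban Topics arguments."""

    topics: Sequence[str]
    threshold: float = 0.6   # documented default
    use_onnx: bool = False   # documented default

    def __post_init__(self) -> None:
        # Assumed rule: a bare string is rejected so "religion" is not split into letters.
        if isinstance(self.topics, str) or not self.topics:
            raise ValueError("topics must be a non-empty sequence of strings")
        # Assumed rule: classifier scores are probabilities, so the threshold lies in [0, 1].
        if not 0.0 <= self.threshold <= 1.0:
            raise ValueError("threshold must be between 0 and 1")

    def as_dict(self) -> dict:
        """Return the plain mapping shape used for `arguments` in the examples."""
        return {"topics": list(self.topics), "threshold": self.threshold, "use_onnx": self.use_onnx}


args = BanTopicsArguments(topics=["religion", "violence"])
print(args.as_dict())
```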

Interpretation:

If any banned topic scores above the threshold, that topic is considered present in the prompt and the result is Failed. If no banned topic scores above the threshold, the result is Passed.
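The pass/fail rule can be written in a few lines. This is a minimal sketch assuming the guardrail produces one score per banned topic; `ban_topics_verdict` is a hypothetical name, and a score exactly equal to the threshold is treated as Passed because the rule only fails on scores strictly above it.

```python
from typing import Dict, List, Tuple


def ban_topics_verdict(scores: Dict[str, float], threshold: float = 0.6) -> Tuple[str, List[str]]:
    """Return ("Failed", flagged_topics) if any score is above threshold, else ("Passed", [])."""
    # A topic counts as present only when its score is strictly above the threshold.
    flagged = sorted(topic for topic, score in scores.items() if score > threshold)
    return ("Failed" if flagged else "Passed"), flagged
```

For instance, hypothetical scores of `{"oceans": 0.97}` with a threshold of 0.5 yield `("Failed", ["oceans"])`, while `{"airplanes": 0.02}` yields `("Passed", [])`.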

Example:

Passed Scenario -

evaluator.add_test(
    test_names=["ban_topics_guardrail"],
    data={
        "prompt": """Craft a poem that captures the breathtaking beauty and serenity of this underwater world.
        Describe the gentle swaying of the coral, the playful dance of the fish, and the mesmerising underwater soundscape.
        Use vivid imagery and metaphors to evoke a sense of awe and wonder for this natural treasure.""",
    },
    arguments={"topics": ["airplanes"], "threshold": 0.5, "model": "MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33"},
).run()
evaluator.print_results()

Result = No banned topics found in the prompt.

Failed Scenario -

evaluator.add_test(
    test_names=["ban_topics_guardrail"],
    data={
        "prompt""""Craft a poem that captures the breathtaking beauty and serenity of this underwater world.
        Describe the gentle swaying of the coral, the playful dance of the fish, and the mesmerising underwater soundscape.
        Use vivid imagery and metaphors to evoke a sense of awe and wonder for this natural treasure.""",
    },
    arguments={"topics": ["oceans"], "threshold"0.5"model""MoritzLaurer/deberta-v3-large-zeroshot-v1.1-all-33"},
).run()
evaluator.print_results()

Result = Banned topics found in the prompt.
