Invisible Text

The Invisible Text Guardrail detects invisible characters in the output of the language model.

The guardrail targets invisible Unicode characters, particularly those in the Unicode Private Use Areas (PUA):

  • Private Use Area (Basic Multilingual Plane): U+E000 to U+F8FF

  • Supplementary Private Use Area-A: U+F0000 to U+FFFFD

  • Supplementary Private Use Area-B: U+100000 to U+10FFFD
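The three ranges above can be checked directly by code point. The following is a minimal sketch, not the guardrail's actual implementation; the names PRIVATE_USE_RANGES and in_private_use_area are hypothetical and chosen only for illustration.

```python
# Illustrative sketch only; these names are assumptions, not part of the library.
PRIVATE_USE_RANGES = [
    (0xE000, 0xF8FF),      # Private Use Area in the Basic Multilingual Plane
    (0xF0000, 0xFFFFD),    # Supplementary Private Use Area-A
    (0x100000, 0x10FFFD),  # Supplementary Private Use Area-B
]


def in_private_use_area(ch: str) -> bool:
    """Return True if the single character lies in one of the PUA ranges."""
    code_point = ord(ch)
    return any(low <= code_point <= high for low, high in PRIVATE_USE_RANGES)


print(in_private_use_area("\ue000"))  # first code point of the BMP range
print(in_private_use_area("A"))       # ordinary visible letter
```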

It detects and removes characters in categories 'Cf' (Format characters), 'Cc' (Control characters), 'Co' (Private use characters), and 'Cn' (Unassigned characters), which are typically non-printable.
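Category-based detection of this kind can be sketched with Python's standard unicodedata module. This is an assumption about how such a filter might look, not the guardrail's own code; INVISIBLE_CATEGORIES, is_invisible, and strip_invisible are hypothetical names.

```python
import unicodedata

# The four general categories named in the text above.
INVISIBLE_CATEGORIES = {"Cf", "Cc", "Co", "Cn"}


def is_invisible(ch: str) -> bool:
    """Return True if the character's general category is one of the targeted ones."""
    return unicodedata.category(ch) in INVISIBLE_CATEGORIES


def strip_invisible(text: str) -> str:
    """Return the text with every targeted character removed."""
    return "".join(ch for ch in text if not is_invisible(ch))


# Zero width space (Cf), a private use character (Co), and a tag character (Cf).
sample = "zero\u200bwidth \ue000private\U000e0047tag"
print(strip_invisible(sample))  # zerowidth privatetag
```

Note that category 'Cc' includes ordinary control characters such as newline and tab, so a filter written exactly like this sketch also removes line breaks. Whether the guardrail treats those specially is not stated here.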

Parameters:

data:

  • prompt (str): The prompt to scan for invisible characters. The scenario examples below pass a list of prompts, while the code example passes a single string.

Interpretation:

The sanitized_prompt should contain no invisible characters. The test passes if no invisible characters are found, and fails if invisible characters are found and sanitized. The result shows the invisible characters that were present in the prompt.
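The pass/fail rule above can be illustrated with a small standalone sketch. It does not call the evaluator and is not the guardrail's implementation; ScanResult and scan_prompt are hypothetical names, and reporting unnamed characters as U+XXXX is an assumption made for this example.

```python
import unicodedata
from typing import List, NamedTuple

INVISIBLE_CATEGORIES = {"Cf", "Cc", "Co", "Cn"}


class ScanResult(NamedTuple):
    sanitized_prompt: str  # prompt with invisible characters removed
    passed: bool           # True when nothing invisible was found
    found: List[str]       # names (or code points) of the removed characters


def scan_prompt(prompt: str) -> ScanResult:
    """Split a prompt into kept characters and a report of removed ones."""
    kept, found = [], []
    for ch in prompt:
        if unicodedata.category(ch) in INVISIBLE_CATEGORIES:
            # Private use and control characters have no Unicode name,
            # so fall back to the code point in U+XXXX form.
            found.append(unicodedata.name(ch, f"U+{ord(ch):04X}"))
        else:
            kept.append(ch)
    return ScanResult("".join(kept), not found, found)


print(scan_prompt("Testing visible text in a prompt."))
print(scan_prompt("zero\u200bwidth"))
```

In this sketch a prompt passes only when the found list is empty, which mirrors the rule that the test fails whenever anything had to be sanitized.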

Example:

Passed Scenario -

evaluator.add_test(
    test_names=["invisible_text_guardrail"],
    data={
        "prompt": ["Testing visible text in a prompt."],
    },
).run()

Result = ["Testing visible text in a prompt."]

Failed Scenario -

evaluator.add_test(
    test_names=["invisible_text_guardrail"],
    data={
        "prompt": ["This is a test string with a ︇︇󠁇z︇ero width space","This is a test string without 󠁀invisible text and a ︇︇󠁇z︇ero."],
    },
).run()

Result = ["This is a test string with a TAG LATIN CAPITAL LETTER G zero width space", "This is a test string without TAG COMMERCIAL AT invisible text and a TAG LATIN CAPITAL LETTER G zero."]

Code Example:

evaluator.add_test(
    test_names=["invisible_text_guardrail"],
    data={
        "prompt": "This is a test string with a \u200b zero width space",
    },
).run()
evaluator.print_results()
