Anonymize

Use anonymization guardrails to hide sensitive data. Protect privacy while maintaining analysis accuracy.

Anonymizes Personally Identifiable Information (PII) in text using NLP (English only) and predefined regex patterns. Detected entities are replaced with placeholders such as [REDACTED_PERSON_1], and the real values are stored in a Vault.
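The placeholder-and-Vault idea can be illustrated with a minimal sketch. It is not the guardrail's implementation: the email pattern, the `anonymize_emails` helper, and the list-based `vault` are assumptions made for illustration only.

```python
import re

# Hypothetical, simplified email pattern; the guardrail's real detectors are not shown here.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def anonymize_emails(text, vault):
    """Replace each distinct email with a numbered placeholder and record the mapping."""
    seen = {}  # real value -> placeholder already assigned to it

    def substitute(match):
        value = match.group(0)
        if value not in seen:
            placeholder = f"[REDACTED_EMAIL_ADDRESS_{len(seen) + 1}]"
            seen[value] = placeholder
            # The vault keeps (placeholder, real value) pairs so the text could be restored later.
            vault.append((placeholder, value))
        return seen[value]

    return EMAIL_RE.sub(substitute, text)


vault = []
sanitized = anonymize_emails(
    "Contact jane@example.com or bob@example.org, then jane@example.com again.",
    vault,
)
print(sanitized)
print(vault)
```

Repeated values reuse the same placeholder in this sketch, so the mapping stored in the vault stays one-to-one.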

PII entities

  • Credit Cards: Card number formats as listed on Wikipedia.

    • 4111111111111111

    • 378282246310005 (American Express)

    • 30569309025904 (Diners Club)

  • Person: A full person name, which can include first names, middle names or initials, and last names.

    • John Doe

  • PHONE_NUMBER:

    • 5555551234

  • URL: A Uniform Resource Locator (URL), a unique identifier used to locate a resource on the Internet.

    • https://example.com/

  • E-mail Addresses: Standard email formats.

    • john.doe[AT]example[DOT]com

    • john.doe[AT]example.com

    • john.doe@example[DOT]com

  • IPs: An Internet Protocol (IP) address (either IPv4 or IPv6).

    • 192.168.1.1 (IPv4)

    • 2001:db8:3333:4444:5555:6666:7777:8888 (IPv6)

  • UUID:

    • 550e8400-e29b-41d4-a716-446655440000

  • US Social Security Number (SSN):

    • 111-22-3333

  • Crypto wallet number: Currently, only Bitcoin addresses are supported.

    • 1Lbcfr7sAHTD9CgdQo3HTMTkV8LK4ZnX71

  • IBAN Code: The International Bank Account Number (IBAN), an internationally agreed system for identifying bank accounts across national borders. It is designed to ease cross-border transactions and reduce the risk of transcription errors.

    • DE89370400440532013000
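As a rough sketch of how regex-based detection for a few of the entity types above could work, the snippet below uses simplified patterns plus a Luhn checksum for card numbers. The labels, patterns, and function names are assumptions for illustration and are not taken from the guardrail itself.

```python
import re

# Hypothetical labels and deliberately simplified patterns, for illustration only.
PATTERNS = {
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4 only
    "UUID": re.compile(r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"),
}


def find_entities(text):
    """Return (label, matched_text) pairs for every pattern hit in the text."""
    found = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
    return found


def luhn_valid(number):
    """Check a card number with the Luhn checksum, ignoring non-digit characters."""
    digits = [int(ch) for ch in number if ch.isdigit()]
    total = 0
    for index, digit in enumerate(reversed(digits)):
        if index % 2 == 1:  # double every second digit, counting from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return len(digits) > 0 and total % 10 == 0


sample = "SSN 111-22-3333, host 192.168.1.1, id 550e8400-e29b-41d4-a716-446655440000"
print(find_entities(sample))
print(luhn_valid("4111111111111111"), luhn_valid("4111111111111112"))
```

A checksum step like `luhn_valid` is one common way to cut false positives when a digit run merely looks like a card number.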

Parameters:

data:

  • prompt (str): The text to be anonymized.

arguments:

  • hidden_names (Optional[Sequence[str]]): List of names to always anonymize. These are replaced with placeholders such as [REDACTED_CUSTOM_1].

  • allowed_names (Optional[Sequence[str]]): List of names allowed in the text without anonymizing.

  • entity_types (Optional[Sequence[str]]): List of entity types to be detected. If not provided, defaults to all.

  • preamble (str): Text to prepend to sanitized prompt. If not provided, defaults to an empty string.

  • regex_patterns (Optional[List[Dict]]): List of regex patterns for additional custom anonymization.

  • use_faker (bool): Whether to replace detected entities with fake values generated by faker instead of placeholders, where applicable. Defaults to False, in which case entities are replaced with placeholders such as [REDACTED_PERSON_1].

  • threshold (float): Acceptance threshold. Default is 0.

  • use_onnx (bool): Whether to use ONNX runtime for inference. Default is False.

  • language (str): Language used for PII detection. Default is "en".
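To make the interplay of hidden_names and allowed_names more concrete, here is a toy model. The `redact_names` function and its `detected_names` input are hypothetical stand-ins for whatever the NLP detector returns; the real guardrail may order or number replacements differently.

```python
def redact_names(text, detected_names, hidden_names=(), allowed_names=()):
    """Toy model: custom names always get redacted, allowed names are left alone."""
    allowed = set(allowed_names)
    result = text

    # Names listed in hidden_names are always replaced with [REDACTED_CUSTOM_n].
    for index, name in enumerate(hidden_names, start=1):
        result = result.replace(name, f"[REDACTED_CUSTOM_{index}]")

    # Detected person names are replaced unless they appear in allowed_names.
    counter = 0
    for name in detected_names:
        if name in allowed:
            continue
        counter += 1
        result = result.replace(name, f"[REDACTED_PERSON_{counter}]")

    return result


result = redact_names(
    "Walter White emailed John Doe about Project Heisenberg.",
    detected_names=["Walter White", "John Doe"],  # assumed detector output
    hidden_names=["Project Heisenberg"],
    allowed_names=["John Doe"],
)
print(result)
```

In this sketch, hidden_names covers terms a detector would likely miss, while allowed_names exempts names that are safe to keep in the text.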

Interpretation:

Personally Identifiable Information (PII) present in the prompt, such as names, email addresses, and phone numbers, should be removed from the output.

Example:

prompt = "Your name is Walter White, You have received your inquiry regarding an account with Google LLC from John Doe, where you work.
Your registered email address is [email protected] and the phone number is +91-9967854332 associated with the account. Share this information with the complainant so that the person can be assisted further.
Write an email to help the person ahead"

sanitized_prompt = "Your name is [REDACTED_PERSON_1], You have received your inquiry regarding an account with Google LLC from [REDACTED_PERSON_2], where you work.\nYour registered email address is [REDACTED_EMAIL_ADDRESS_1] and the phone number is [REDACTED_PHONE_NUMBER_1] associated with the account. Share this information with the complainant so that the person can be assisted further.\nWrite an email to help the person ahead\n"
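One way to check a result against the interpretation above is to confirm that none of the original sensitive values survive in the sanitized text. The helper below is a hypothetical sketch, not part of the guardrail; the sample string is a shortened form of the sanitized prompt shown above.

```python
import re

# Matches placeholders such as [REDACTED_PERSON_1] or [REDACTED_EMAIL_ADDRESS_1].
PLACEHOLDER_RE = re.compile(r"\[REDACTED_[A-Z_]+_\d+\]")


def leaked_values(sanitized_prompt, sensitive_values):
    """Return the sensitive values that still appear verbatim in the sanitized prompt."""
    return [value for value in sensitive_values if value in sanitized_prompt]


sanitized_prompt = (
    "Your name is [REDACTED_PERSON_1], You have received your inquiry regarding "
    "an account with Google LLC from [REDACTED_PERSON_2], where you work.\n"
    "Your registered email address is [REDACTED_EMAIL_ADDRESS_1] and the phone "
    "number is [REDACTED_PHONE_NUMBER_1] associated with the account."
)
originals = ["Walter White", "John Doe", "+91-9967854332"]

leaks = leaked_values(sanitized_prompt, originals)
placeholders = PLACEHOLDER_RE.findall(sanitized_prompt)
print(leaks)
print(placeholders)
```

An empty `leaks` list only shows that the listed values are gone; it does not prove that every PII item in the prompt was detected.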

Code Example:

# `evaluator` is assumed to be an already-initialized evaluator object;
# its setup is not shown in this section.
evaluator.add_test(
    test_names=["anonymize_guardrail"],
    data={
        # Prompt containing names, an email address, and a phone number to anonymize
        "prompt": """Your name is Walter White, You have received your inquiry regarding an account with Google LLC from John Doe, where you work.
Your registered email address is [email protected] and the phone number is +91-9967854332 associated with the account. Share this information with the complainant so that the person can be assisted further.
Write an email to help the person ahead
""",
    },
).run()

# Display the results of the test run
evaluator.print_results()
