Anonymize

Use anonymization guardrails to hide sensitive data. Protect privacy while maintaining analysis accuracy.

Anonymizes Personally Identifiable Information (PII) in the text using NLP (English only) and predefined regex patterns. Detected entities are replaced with placeholders such as [REDACTED_PERSON_1], and the real values are stored in a Vault.
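
The placeholder-and-Vault idea described above can be sketched in a few lines of Python. This is only an illustration, not the guardrail's implementation: the two regex patterns, the entity labels, the function names `anonymize` and `deanonymize`, and the use of a plain dict as the Vault are all assumptions made for the example. Reusing one placeholder for a repeated value is likewise an assumption.

```python
import re

# Hypothetical, simplified patterns. The real guardrail combines NLP with
# its own predefined regexes, which are not shown in this document.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def anonymize(text):
    """Replace matches with numbered placeholders and record them in a vault."""
    vault = {}         # placeholder -> real value (a plain dict stands in for the Vault)
    placeholders = {}  # (entity type, real value) -> placeholder
    counters = {}      # entity type -> number of distinct values seen so far

    def placeholder_for(entity_type, value):
        key = (entity_type, value)
        if key not in placeholders:
            counters[entity_type] = counters.get(entity_type, 0) + 1
            placeholder = f"[REDACTED_{entity_type}_{counters[entity_type]}]"
            placeholders[key] = placeholder
            vault[placeholder] = value
        return placeholders[key]

    for entity_type, pattern in PATTERNS.items():
        text = pattern.sub(
            lambda match, t=entity_type: placeholder_for(t, match.group(0)),
            text,
        )
    return text, vault


def deanonymize(text, vault):
    """Restore the real values from the vault."""
    for placeholder, value in vault.items():
        text = text.replace(placeholder, value)
    return text
```

Keeping the mapping from placeholder to real value is what allows a later step to restore the original values in a response, if that is needed.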

PII entities

  • Credit Cards: Formats mentioned in Wikipedia.

    • 4111111111111111

    • 378282246310005 (American Express)

    • 30569309025904 (Diners Club)

  • Person: A full person name, which can include first names, middle names or initials, and last names.

    • John Doe

  • PHONE_NUMBER:

    • 5555551234

  • URL: A Uniform Resource Locator (URL), a unique identifier used to locate a resource on the Internet.

    • https://example.com/

  • E-mail Addresses: Standard email formats.

    • john.doe[AT]example[DOT]com

    • john.doe[AT]example.com

    • john.doe@example[DOT]com

  • IPs: An Internet Protocol (IP) address (either IPv4 or IPv6).

    • 192.168.1.1 (IPv4)

    • 2001:db8:3333:4444:5555:6666:7777:8888 (IPv6)

  • UUID:

    • 550e8400-e29b-41d4-a716-446655440000

  • US Social Security Number (SSN):

    • 111-22-3333

  • Crypto wallet number: Currently only Bitcoin addresses are supported.

    • 1Lbcfr7sAHTD9CgdQo3HTMTkV8LK4ZnX71

  • IBAN Code: The International Bank Account Number (IBAN) is an internationally agreed system for identifying bank accounts across national borders. It facilitates cross-border transactions and reduces the risk of transcription errors.

    • DE89370400440532013000
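
To make the regex-based part of detection more concrete, the sketch below classifies a few of the sample values from the list above. The patterns and the entity labels (other than PHONE_NUMBER, none are spelled out in this document) are hypothetical simplifications, not the guardrail's actual rules. Person names need the NLP model, and phone numbers and Bitcoin addresses are left out to keep the example short.

```python
import ipaddress
import re

# Hypothetical, deliberately loose patterns for illustration only.
PATTERNS = {
    "UUID": re.compile(
        r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
    ),
    "US_SSN": re.compile(r"\d{3}-\d{2}-\d{4}"),
    "IBAN_CODE": re.compile(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}"),
    "CREDIT_CARD": re.compile(r"\d{13,19}"),
    "URL": re.compile(r"https?://\S+"),
}


def classify(value):
    """Return a guessed entity label for a single token, or None."""
    # The standard library already validates IPv4 and IPv6 addresses.
    try:
        ipaddress.ip_address(value)
        return "IP_ADDRESS"
    except ValueError:
        pass
    for label, pattern in PATTERNS.items():
        if pattern.fullmatch(value):
            return label
    return None


samples = [
    "4111111111111111",
    "2001:db8:3333:4444:5555:6666:7777:8888",
    "550e8400-e29b-41d4-a716-446655440000",
    "DE89370400440532013000",
]
for sample in samples:
    print(sample, "->", classify(sample))
```

Patterns this loose produce false positives on real text, which is one reason a detector typically pairs them with a confidence score and a threshold.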

Parameters:

data:

  • prompt (str): The text to be anonymized.

arguments:

  • hidden_names (Optional[Sequence[str]]): List of names to always anonymize. They are replaced with placeholders such as [REDACTED_CUSTOM_1].

  • allowed_names (Optional[Sequence[str]]): List of names allowed in the text without anonymizing.

  • entity_types (Optional[Sequence[str]]): List of entity types to be detected. If not provided, defaults to all.

  • preamble (str): Text to prepend to the sanitized prompt. If not provided, defaults to an empty string.

  • regex_patterns (Optional[List[Dict]]): List of regex patterns for additional custom anonymization.

  • use_faker (bool): Whether to replace detected entities with fake data generated by faker instead of placeholders, where applicable. Defaults to False, in which case placeholders such as [REDACTED_PERSON_1] are used.

  • threshold (float): Acceptance threshold. Default is 0.

  • use_onnx (bool): Whether to use ONNX runtime for inference. Default is False.

  • language (str): Language used for PII detection. Default is "en".
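
As a rough illustration of how the parameters above fit together, the sketch below assembles a request body with a data part and an arguments part. The nesting of the payload, the helper name `build_anonymize_request`, and the "PERSON" entity label are assumptions made for this example. Only the parameter names come from the list above, and the schema of regex_patterns entries is not documented here, so it is left out of the usage.

```python
import json

# Argument names as documented in the list above.
DOCUMENTED_ARGUMENTS = {
    "hidden_names",
    "allowed_names",
    "entity_types",
    "preamble",
    "regex_patterns",
    "use_faker",
    "threshold",
    "use_onnx",
    "language",
}


def build_anonymize_request(prompt, **arguments):
    """Assemble a hypothetical request body and reject undocumented arguments."""
    unknown = set(arguments) - DOCUMENTED_ARGUMENTS
    if unknown:
        raise ValueError(f"Unknown arguments: {sorted(unknown)}")
    return {"data": {"prompt": prompt}, "arguments": arguments}


request = build_anonymize_request(
    "Contact John Doe at 5555551234.",
    entity_types=["PERSON", "PHONE_NUMBER"],  # "PERSON" is an assumed label
    hidden_names=["Project Falcon"],          # illustrative custom name
    threshold=0.5,
)
print(json.dumps(request, indent=2))
```

Arguments that are omitted fall back to the defaults listed above, so a minimal request would carry only the prompt.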

Interpretation:

Personally Identifiable Information (PII) such as names, email addresses, and phone numbers present in the prompt should be removed from the output.

Example:

Code Example:
