Anonymize
Use anonymization guardrails to hide sensitive data. Protect privacy while maintaining analysis accuracy.
Anonymizes Personally Identifiable Information (PII) in text using NLP (English only) and predefined regex patterns. Detected entities are replaced with placeholders such as [REDACTED_PERSON_1], and the real values are stored in a Vault.
PII entities
Credit Cards: Formats mentioned in Wikipedia.
4111111111111111
378282246310005 (American Express)
30569309025904 (Diners Club)
Person: A full person name, which can include first names, middle names or initials, and last names.
John Doe
PHONE_NUMBER:
5555551234
URL: A Uniform Resource Locator (URL), a unique identifier used to locate a resource on the Internet.
https://example.com/
E-mail Addresses: Standard email formats.
john.doe[AT]example[DOT]com
john.doe[AT]example.com
john.doe@example[DOT]com
IPs: An Internet Protocol (IP) address (either IPv4 or IPv6).
192.168.1.1 (IPv4)
2001:db8:3333:4444:5555:6666:7777:8888 (IPv6)
UUID:
550e8400-e29b-41d4-a716-446655440000
US Social Security Number (SSN):
111-22-3333
Crypto wallet number: Currently only Bitcoin addresses are supported.
1Lbcfr7sAHTD9CgdQo3HTMTkV8LK4ZnX71
IBAN Code: The International Bank Account Number (IBAN) is an internationally agreed system of identifying bank accounts across national borders to facilitate the communication and processing of cross border transactions with a reduced risk of transcription errors.
DE89370400440532013000
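To illustrate how regex-based detection with numbered placeholders and a vault might work, here is a minimal sketch. It is not the guardrail's actual implementation: the patterns are simplified, the entity type names IP_ADDRESS, US_SSN, and UUID are assumptions chosen for illustration, and the vault is modeled as a plain list of (placeholder, real value) pairs.

```python
import re

# Hypothetical, simplified patterns for a few of the entity types listed above.
# They are illustrative only and far less thorough than a production detector.
PATTERNS = {
    "EMAIL_ADDRESS": re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+"),
    "IP_ADDRESS": re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b"),  # IPv4 only
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
    "UUID": re.compile(
        r"\b[0-9a-fA-F]{8}(?:-[0-9a-fA-F]{4}){3}-[0-9a-fA-F]{12}\b"
    ),
}


def anonymize(text):
    """Replace matches with numbered placeholders and record the real values.

    Returns the sanitized text and a vault, modeled here as a list of
    (placeholder, real_value) pairs.
    """
    vault = []
    for entity_type, pattern in PATTERNS.items():
        seen = {}  # real value -> placeholder, so repeats reuse one number

        def replace(match):
            value = match.group(0)
            if value not in seen:
                seen[value] = f"[REDACTED_{entity_type}_{len(seen) + 1}]"
                vault.append((seen[value], value))
            return seen[value]

        text = pattern.sub(replace, text)
    return text, vault


sanitized, vault = anonymize(
    "Contact john.doe@example.com from 192.168.1.1, SSN 111-22-3333."
)
print(sanitized)
print(vault)
```

Keeping the vault separate from the sanitized text means the original values could be restored later without ever sending them downstream.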
Parameters:
data:
prompt (str): The text to be anonymized.
arguments:
hidden_names (Optional[Sequence[str]]): List of names to be anonymized, e.g. as [REDACTED_CUSTOM_1].
allowed_names (Optional[Sequence[str]]): List of names allowed in the text without anonymizing.
entity_types (Optional[Sequence[str]]): List of entity types to be detected. If not provided, defaults to all.
preamble (str): Text to prepend to the sanitized prompt. If not provided, defaults to an empty string.
regex_patterns (Optional[List[Dict]]): List of regex patterns for additional custom anonymization.
use_faker (bool): Whether to use faker instead of placeholders in applicable cases. Defaults to False, in which case placeholders such as [REDACTED_PERSON_1] are used.
threshold (float): Acceptance threshold. Default is 0.
use_onnx (bool): Whether to use ONNX runtime for inference. Default is False.
language (str): Language used for detection. Default is "en".
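The interplay of hidden_names and allowed_names can be pictured with a small standalone sketch. This is an assumption about the semantics described above, not the guardrail's code: the function apply_name_lists is hypothetical, detected_names stands in for whatever the NLP detector would return, and the rule that hidden names take precedence over allowed names is a guess made for the example.

```python
def apply_name_lists(text, detected_names, hidden_names=(), allowed_names=()):
    """Hypothetical helper showing one way the two name lists could behave.

    - hidden_names are always replaced, using [REDACTED_CUSTOM_n].
    - allowed_names are left untouched even if the detector flags them.
    - every other detected name becomes [REDACTED_PERSON_n].
    """
    allowed = set(allowed_names)
    replacements = {}

    # Names the caller explicitly wants hidden get custom placeholders.
    for index, name in enumerate(hidden_names, start=1):
        replacements[name] = f"[REDACTED_CUSTOM_{index}]"

    # Detected names are redacted unless allowed or already handled above.
    person_count = 0
    for name in detected_names:
        if name in allowed or name in replacements:
            continue
        person_count += 1
        replacements[name] = f"[REDACTED_PERSON_{person_count}]"

    for name, placeholder in replacements.items():
        text = text.replace(name, placeholder)
    return text


print(
    apply_name_lists(
        "Walter White met John Doe at Acme Labs.",
        detected_names=["Walter White", "John Doe"],
        hidden_names=["Acme Labs"],
        allowed_names=["John Doe"],
    )
)
```

In this sketch the allowed name survives unchanged, the hidden name gets a custom placeholder, and the remaining detected name gets a person placeholder.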
Interpretation:
Personally Identifiable Information (PII) such as names, email addresses, and phone numbers present in the prompt should be removed from the output.
Example:
prompt = "Your name is Walter White, You have received your inquiry regarding an account with Google LLC from John Doe, where you work.
Your registered email address is [email protected] and the phone number is +91-9967854332 associated with the account. Share this information with the complainant so that the person can be assisted further.
Write an email to help the person ahead"
sanitized_prompt = "Your name is [REDACTED_PERSON_1], You have received your inquiry regarding an account with Google LLC from [REDACTED_PERSON_2], where you work.\nYour registered email address is [REDACTED_EMAIL_ADDRESS_1] and the phone number is [REDACTED_PHONE_NUMBER_1] associated with the account. Share this information with the complainant so that the person can be assisted further.\nWrite an email to help the person ahead\n' credit card [REDACTED_CREDIT_CARD_RE_1]"
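One way to check a result like the one above is to confirm that none of the known sensitive values survive in the sanitized text and to list the placeholders that replaced them. The helpers below are a hypothetical sketch for that check; find_leaks and placeholders_in are names invented here and are not part of the guardrail.

```python
import re

# Matches placeholders of the form [REDACTED_<TYPE>_<N>] as seen in the example.
PLACEHOLDER = re.compile(r"\[REDACTED_[A-Z_]+_\d+\]")


def find_leaks(sanitized, sensitive_values):
    """Return the sensitive values that still appear verbatim in the text."""
    return [value for value in sensitive_values if value in sanitized]


def placeholders_in(sanitized):
    """Return every placeholder found in the text, in order of appearance."""
    return PLACEHOLDER.findall(sanitized)


sanitized_prompt = (
    "Your name is [REDACTED_PERSON_1], You have received your inquiry "
    "regarding an account with Google LLC from [REDACTED_PERSON_2], "
    "where you work. The phone number is [REDACTED_PHONE_NUMBER_1]."
)

leaks = find_leaks(sanitized_prompt, ["Walter White", "John Doe", "+91-9967854332"])
print("leaked values:", leaks)
print("placeholders:", placeholders_in(sanitized_prompt))
```

An empty list of leaked values is consistent with the interpretation above; any entry in it points to an entity the anonymizer missed.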
Code Example:
# `evaluator` is assumed to be an already-initialized evaluator object; its setup is not shown here.
evaluator.add_test(
    test_names=["anonymize_guardrail"],
    data={
        "prompt": """Your name is Walter White, You have received your inquiry regarding an account with Google LLC from John Doe, where you work.
Your registered email address is [email protected] and the phone number is +91-9967854332 associated with the account. Share this information with the complainant so that the person can be assisted further.
Write an email to help the person ahead
""",
    },
).run()

# Print the outcome of the anonymization test.
evaluator.print_results()