Anonymize
Anonymizes Personally Identifiable Information (PII) in the text using NLP (English only) and predefined regex patterns. Detected entities are replaced with placeholders such as [REDACTED_PERSON_1], and the real values are stored in a Vault.
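The placeholder-and-Vault idea can be illustrated with a minimal sketch. This is not the scanner's implementation: it handles only plain email addresses with a simplified regex, the function names are hypothetical, and the [REDACTED_EMAIL_ADDRESS_n] label is an assumption modeled on [REDACTED_PERSON_1]. The vault here is just a list of (placeholder, real value) pairs.

```python
import re
from typing import Dict, List, Tuple

# Simplified pattern for illustration only; the real scanner also relies on NLP.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def anonymize_emails(text: str) -> Tuple[str, List[Tuple[str, str]]]:
    """Replace each distinct email with a numbered placeholder.

    Returns the sanitized text and a vault of (placeholder, real_value) pairs.
    The placeholder label is hypothetical.
    """
    vault: List[Tuple[str, str]] = []
    seen: Dict[str, str] = {}

    def replace(match: "re.Match[str]") -> str:
        value = match.group(0)
        if value not in seen:
            placeholder = f"[REDACTED_EMAIL_ADDRESS_{len(seen) + 1}]"
            seen[value] = placeholder
            vault.append((placeholder, value))
        # The same real value always maps to the same placeholder.
        return seen[value]

    return EMAIL_RE.sub(replace, text), vault


def restore(text: str, vault: List[Tuple[str, str]]) -> str:
    """Put the real values back using the vault."""
    for placeholder, value in vault:
        text = text.replace(placeholder, value)
    return text
```

Keeping the mapping outside the sanitized text is what allows the original values to be restored later without ever sending them downstream.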
PII entities
Credit Cards: Formats mentioned in Wikipedia.
4111111111111111
378282246310005 (American Express)
30569309025904 (Diners Club)
Person: A full person name, which can include first names, middle names or initials, and last names.
John Doe
PHONE_NUMBER:
5555551234
URL: A URL (Uniform Resource Locator), a unique identifier used to locate a resource on the Internet.
https://example.com/
E-mail Addresses: Standard email formats.
john.doe@example.com
john.doe[AT]example[DOT]com
john.doe[AT]example.com
john.doe@example[DOT]com
IPs: An Internet Protocol (IP) address (either IPv4 or IPv6).
192.168.1.1 (IPv4)
2001:db8:3333:4444:5555:6666:7777:8888 (IPv6)
UUID:
550e8400-e29b-41d4-a716-446655440000
US Social Security Number (SSN):
111-22-3333
Crypto wallet number: Currently, only Bitcoin addresses are supported.
1Lbcfr7sAHTD9CgdQo3HTMTkV8LK4ZnX71
IBAN Code: The International Bank Account Number (IBAN) is an internationally agreed system of identifying bank accounts across national borders to facilitate the communication and processing of cross-border transactions with a reduced risk of transcription errors.
DE89370400440532013000
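Several of the entity types above have a fixed shape that a regular expression can capture. The sketch below shows what such patterns might look like for the email variants (including the [AT] and [DOT] obfuscations), UUIDs, and US SSNs. These are illustrative patterns, not the scanner's predefined ones, and the labels EMAIL, UUID, and US_SSN are hypothetical names chosen for this example.

```python
import re
from typing import List, Tuple

# Illustrative patterns only; the scanner's actual regexes are not shown in this document.
PATTERNS = {
    # Accepts "@" or "[AT]" as the separator and "." or "[DOT]" between domain labels.
    "EMAIL": re.compile(r"[\w.+-]+(?:@|\[AT\])[\w-]+(?:(?:\.|\[DOT\])[\w-]+)+"),
    # Canonical 8-4-4-4-12 hexadecimal layout.
    "UUID": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    # Three digits, two digits, four digits, separated by hyphens.
    "US_SSN": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),
}


def find_entities(text: str) -> List[Tuple[str, str]]:
    """Return (label, matched_text) pairs for every pattern hit in the text."""
    found: List[Tuple[str, str]] = []
    for label, pattern in PATTERNS.items():
        for match in pattern.finditer(text):
            found.append((label, match.group(0)))
    return found
```

Shape-only patterns like these can produce false positives, which is one reason a detector may combine them with NLP and an acceptance threshold.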
Parameters:
data:
prompt (str): The text to be anonymized.
arguments:
hidden_names (Optional[Sequence[str]]): List of names to be anonymized, e.g. as [REDACTED_CUSTOM_1].
allowed_names (Optional[Sequence[str]]): List of names allowed in the text without anonymizing.
entity_types (Optional[Sequence[str]]): List of entity types to be detected. If not provided, defaults to all.
preamble (str): Text to prepend to the sanitized prompt. If not provided, defaults to an empty string.
regex_patterns (Optional[List[Dict]]): List of regex patterns for additional custom anonymization.
use_faker (bool): Whether to use faker instead of placeholders in applicable cases. Defaults to False, in which case entities are replaced with placeholders such as [REDACTED_PERSON_1].
threshold (float): Acceptance threshold. Default is 0.
use_onnx (bool): Whether to use ONNX runtime for inference. Default is False.
language (str): Language used for detection. Default is "en".
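As a rough illustration of how these parameters fit together, the sketch below assembles a payload with the same two groups, data and arguments. The helper name build_anonymize_request is hypothetical, the envelope shape is inferred from the grouping of the parameter list, and how the payload is submitted (endpoint, client, authentication) is not covered here. regex_patterns is left out because its dictionary keys are not documented in this section.

```python
import json
from typing import Any, Dict, Optional, Sequence


def build_anonymize_request(
    prompt: str,
    *,
    hidden_names: Optional[Sequence[str]] = None,
    allowed_names: Optional[Sequence[str]] = None,
    entity_types: Optional[Sequence[str]] = None,
    preamble: str = "",
    use_faker: bool = False,
    threshold: float = 0.0,
    use_onnx: bool = False,
    language: str = "en",
) -> Dict[str, Any]:
    """Assemble a payload with the documented defaults (hypothetical helper)."""
    arguments: Dict[str, Any] = {
        "preamble": preamble,
        "use_faker": use_faker,
        "threshold": threshold,
        "use_onnx": use_onnx,
        "language": language,
    }
    # Optional sequences are included only when the caller supplies them,
    # so an omitted entity_types falls back to "all" as documented.
    optional = {
        "hidden_names": hidden_names,
        "allowed_names": allowed_names,
        "entity_types": entity_types,
    }
    for name, value in optional.items():
        if value is not None:
            arguments[name] = list(value)
    return {"data": {"prompt": prompt}, "arguments": arguments}


payload = build_anonymize_request(
    "Call John Doe at 5555551234",
    allowed_names=["John Doe"],      # keep this name as-is
    entity_types=["PHONE_NUMBER"],   # assumed identifier, taken from the heading above
)
print(json.dumps(payload, indent=2))
```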
Interpretation:
Personally Identifiable Information (PII) such as names, email addresses, and phone numbers present in the prompt should be removed in the output.
Example:
Code Example: