Anonymize
Anonymizes Personally Identifiable Information (PII) in the text using NLP (English only) and predefined regex patterns. Detected entities are replaced with placeholders such as [REDACTED_PERSON_1], and the real values are stored in a Vault.
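The placeholder-and-Vault behaviour described above can be illustrated with a minimal sketch. It is not the scanner's implementation: the function name `anonymize_emails`, the email-only regex, the `EMAIL_ADDRESS` placeholder tag, and the list-of-tuples vault are all assumptions made for illustration.

```python
import re

# Illustrative pattern only (assumption); the real scanner combines NLP
# with its own predefined regex patterns.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def anonymize_emails(text, vault):
    """Replace each email with a numbered placeholder and record it in vault.

    `vault` is a plain list of (placeholder, real_value) tuples, a
    hypothetical stand-in for the Vault mentioned in the text.
    """
    seen = {}  # real value -> placeholder, so repeats reuse one placeholder

    def replace(match):
        value = match.group(0)
        if value not in seen:
            placeholder = f"[REDACTED_EMAIL_ADDRESS_{len(seen) + 1}]"
            seen[value] = placeholder
            vault.append((placeholder, value))
        return seen[value]

    return EMAIL_RE.sub(replace, text)


vault = []
sanitized = anonymize_emails(
    "Write to john.doe@example.com or jane@example.org, "
    "then john.doe@example.com again.",
    vault,
)
print(sanitized)
print(vault)
```

Keeping the mapping outside the sanitized text is what would let a later step restore the real values in a model's response, while the prompt itself carries only placeholders.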
PII entities
Credit Cards: Formats mentioned in .
4111111111111111
378282246310005 (American Express)
30569309025904 (Diners Club)
Person: A full person name, which can include first names, middle names or initials, and last names.
John Doe
PHONE_NUMBER:
5555551234
URL: A URL (Uniform Resource Locator), a unique identifier used to locate a resource on the Internet.
https://example.com/
E-mail Addresses: Standard email formats.
john.doe@example.com
john.doe[AT]example[DOT]com
john.doe[AT]example.com
john.doe@example[DOT]com
IPs: An Internet Protocol (IP) address (either IPv4 or IPv6).
192.168.1.1 (IPv4)
2001:db8:3333:4444:5555:6666:7777:8888 (IPv6)
UUID:
550e8400-e29b-41d4-a716-446655440000
US Social Security Number (SSN):
111-22-3333
Crypto wallet number: Currently only Bitcoin addresses are supported.
1Lbcfr7sAHTD9CgdQo3HTMTkV8LK4ZnX71
IBAN Code: The International Bank Account Number (IBAN) is an internationally agreed system for identifying bank accounts across national borders. It facilitates the communication and processing of cross-border transactions with a reduced risk of transcription errors.
DE89370400440532013000
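To make the regex-based part of detection more concrete, here is a small sketch of a pattern that accepts the four email variants listed above, including the obfuscated `[AT]` and `[DOT]` forms. The pattern and the `find_emails` helper are assumptions written for illustration, not the scanner's actual predefined patterns.

```python
import re

# Either the literal symbol or its bracketed, obfuscated spelling.
AT = r"(?:@|\[AT\])"
DOT = r"(?:\.|\[DOT\])"

# Hypothetical pattern: local part, an "at" separator, then one or more
# domain labels joined by "dot" separators.
EMAIL_VARIANTS_RE = re.compile(
    rf"[A-Za-z0-9._%+-]+{AT}[A-Za-z0-9-]+(?:{DOT}[A-Za-z0-9-]+)+"
)


def find_emails(text):
    """Return every email-like substring, plain or obfuscated."""
    return EMAIL_VARIANTS_RE.findall(text)


samples = [
    "john.doe@example.com",
    "john.doe[AT]example[DOT]com",
    "john.doe[AT]example.com",
    "john.doe@example[DOT]com",
]

for sample in samples:
    # Each documented variant should match in full.
    print(sample, bool(EMAIL_VARIANTS_RE.fullmatch(sample)))
```

A sentence-final period is not swallowed by the match, because a "dot" separator only counts when another domain label follows it.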
Parameters:
data:
prompt (str): The text to be anonymized.
arguments:
hidden_names (Optional[Sequence[str]]): List of names that are always anonymized, replaced with placeholders such as [REDACTED_CUSTOM_1].
allowed_names (Optional[Sequence[str]]): List of names allowed in the text without being anonymized.
entity_types (Optional[Sequence[str]]): List of entity types to detect. If not provided, all entity types are detected.
preamble (str): Text to prepend to the sanitized prompt. If not provided, defaults to an empty string.
regex_patterns (Optional[List[Dict]]): List of regex patterns for additional custom anonymization.
use_faker (bool): Whether to use faker instead of placeholders where applicable. Defaults to False, in which case entities are replaced with placeholders such as [REDACTED_PERSON_1].
threshold (float): Acceptance threshold. Default is 0.
use_onnx (bool): Whether to use the ONNX runtime for inference. Default is False.
language (str): Language of the text to analyze. Default is "en".
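The parameter list can be summarized as a typed structure with the documented defaults. The sketch below is an assumption about how a caller might assemble the `data` and `arguments` sections. The `AnonymizeArguments` class and `build_request` helper are hypothetical names, and the way the resulting dictionary is submitted is not specified here.

```python
from dataclasses import asdict, dataclass
from typing import Dict, List, Optional, Sequence


@dataclass
class AnonymizeArguments:
    """Hypothetical container mirroring the documented arguments and defaults."""

    hidden_names: Optional[Sequence[str]] = None
    allowed_names: Optional[Sequence[str]] = None
    entity_types: Optional[Sequence[str]] = None  # None means all entity types
    preamble: str = ""
    regex_patterns: Optional[List[Dict]] = None
    use_faker: bool = False
    threshold: float = 0.0
    use_onnx: bool = False
    language: str = "en"


def build_request(prompt, arguments=None):
    """Assemble the `data` and `arguments` sections as one dictionary."""
    args = arguments if arguments is not None else AnonymizeArguments()
    return {"data": {"prompt": prompt}, "arguments": asdict(args)}


request = build_request(
    "Call John Doe on 5555551234.",
    AnonymizeArguments(
        allowed_names=["John Doe"],      # leave this name untouched
        entity_types=["PHONE_NUMBER"],   # restrict detection to phone numbers
    ),
)
print(request)
```

Leaving every argument at its default corresponds to detecting all entity types, using placeholders rather than faker, and analyzing the text as English.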
Interpretation:
Personally Identifiable Information (PII) such as names, email addresses, and phone numbers present in the prompt should be removed from the output.
Example: