Trace Masking Functions
Users can mask keywords and regex patterns using custom Python functions
While RagaAI Catalyst sits on-prem for complete data security, enterprises might want to redact certain information, keywords, or patterns from their traces logged on the system. This can be done using custom Python functions as follows:
Prerequisites
Users need to define the relevant platform keys and Tracer object as usual:
from ragaai_catalyst.tracers import Tracer
from ragaai_catalyst import (
RagaAICatalyst,
init_tracing
)
# Initialize RagaAI Catalyst
catalyst = RagaAICatalyst(
access_key="<your-access-key>",
secret_key="<your-secret-key>",
base_url="https://catalyst.raga.ai/api"
)
# Setup tracing
tracer = Tracer(
project_name="<your-project-name>",
dataset_name="<your-dataset-name>",
tracer_type="langchain"
)
Defining Custom Masking Functions
Users can define their custom Python logic to replace certain keywords with a redaction phrase as shown below:
def masking_function(value):
# Mask specific medical symptoms
symptoms = ['Fever', 'Cough', 'Headache']
for symptom in symptoms:
# Case insensitive replacement using regex
value = re.sub(rf'\b{symptom}\b', '<REDACTED SYMPTOM>', value, flags=re.IGNORECASE)
return value
# Another sample masking function
def another_masking_function(value):
"""
Returns masked strings with dates and emails redacted
"""
# Mask dates in various formats (YYYY-MM-DD, MM/DD/YYYY, etc.)
value = re.sub(r'\b\d{4}-\d{2}-\d{2}\b', '<REDACTED DATE>', value)
value = re.sub(r'\b\d{1,2}/\d{1,2}/\d{4}\b', '<REDACTED DATE>', value)
value = re.sub(r'\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b', '< REDACTED EMAIL ADDRESS>', value)
return value
Enabling Masking Function
Lastly, users can pass the desired masking function (limited to one) to the initialized Tracer object using the
register_masking_function
method as follows:
tracer.register_masking_function(masking_function)
init_tracing(catalyst=catalyst, tracer=tracer)
Running Your Application
Your RAG application can be defined further as usual. Any LLM calls should be traced with the above masking logic applied.
Last updated
Was this helpful?