Trace Masking Functions

Users can mask keywords and regex patterns using custom Python functions.

Although RagaAI Catalyst can sit on-prem for complete data security, enterprises might still want to redact certain information, keywords, or patterns from the traces logged on the system. This can be done using custom Python functions as follows:

  • Prerequisites

    Users need to define the relevant platform keys and Tracer object as usual:

from ragaai_catalyst.tracers import Tracer
from ragaai_catalyst import (
    RagaAICatalyst,
    init_tracing
)

# Initialize RagaAI Catalyst
catalyst = RagaAICatalyst(
    access_key="<your-access-key>",
    secret_key="<your-secret-key>",
    base_url="https://catalyst.raga.ai/api"
)

# Setup tracing
tracer = Tracer(
    project_name="<your-project-name>",
    dataset_name="<your-dataset-name>",
    tracer_type="langchain"
)
  • Defining Custom Masking Functions

    Users can define their custom Python logic to replace certain keywords with a redaction phrase as shown below:

import re

def masking_function(value):
    # Mask specific medical symptoms
    symptoms = ['Fever', 'Cough', 'Headache']
    for symptom in symptoms:
        # Case-insensitive, whole-word replacement using regex;
        # re.escape keeps keywords with special characters literal
        value = re.sub(rf'\b{re.escape(symptom)}\b', '<REDACTED SYMPTOM>', value, flags=re.IGNORECASE)

    return value

# Another sample masking function
def another_masking_function(value):
    """
    Returns masked strings with dates and emails redacted
    """
    # Mask dates in YYYY-MM-DD and MM/DD/YYYY formats
    value = re.sub(r'\b\d{4}-\d{2}-\d{2}\b', '<REDACTED DATE>', value)
    value = re.sub(r'\b\d{1,2}/\d{1,2}/\d{4}\b', '<REDACTED DATE>', value)
    # Mask email addresses
    value = re.sub(r'\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b', '<REDACTED EMAIL ADDRESS>', value)
    return value
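
    Before registering a masking function, it can help to check it locally against a sample string. The sketch below uses only the standard `re` module; the function name `mask_dates_and_emails` and the sample text are made up for illustration, and it assumes the function receives and returns a plain string, as in the examples above.

```python
import re

def mask_dates_and_emails(value):
    # Same patterns as the date/email example above
    value = re.sub(r'\b\d{4}-\d{2}-\d{2}\b', '<REDACTED DATE>', value)
    value = re.sub(r'\b\d{1,2}/\d{1,2}/\d{4}\b', '<REDACTED DATE>', value)
    value = re.sub(
        r'\b[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}\b',
        '<REDACTED EMAIL ADDRESS>',
        value,
    )
    return value

# Hypothetical sample input, not real data
sample = "Seen on 2024-03-15, follow-up 4/2/2024, contact jane.doe@example.com."
masked = mask_dates_and_emails(sample)
print(masked)
# Seen on <REDACTED DATE>, follow-up <REDACTED DATE>, contact <REDACTED EMAIL ADDRESS>.
```

    A quick check like this catches patterns that miss a format (for example, dates written as DD.MM.YYYY are not covered by the two date patterns here).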
  • Enabling Masking Function

    Lastly, users can pass the desired masking function (limited to one) to the initialized Tracer object using the register_masking_function method as follows:

tracer.register_masking_function(masking_function)
init_tracing(catalyst=catalyst, tracer=tracer)
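
    Since only one masking function can be registered, one option for applying several rules is to wrap them in a single function. The following is a minimal sketch: `compose_masks`, `mask_symptoms`, and `mask_dates` are hypothetical helpers written for this example, not part of RagaAI Catalyst, and each is assumed to take a string and return a string.

```python
import re

def mask_symptoms(value):
    # Redact a fixed list of keywords, case-insensitively
    for symptom in ['Fever', 'Cough', 'Headache']:
        value = re.sub(rf'\b{re.escape(symptom)}\b', '<REDACTED SYMPTOM>',
                       value, flags=re.IGNORECASE)
    return value

def mask_dates(value):
    # Redact dates in YYYY-MM-DD format
    return re.sub(r'\b\d{4}-\d{2}-\d{2}\b', '<REDACTED DATE>', value)

def compose_masks(*functions):
    """Return one function that applies each masking function in order."""
    def combined(value):
        for fn in functions:
            value = fn(value)
        return value
    return combined

combined_masking_function = compose_masks(mask_symptoms, mask_dates)
result = combined_masking_function("Patient reported fever on 2024-03-15.")
print(result)
# Patient reported <REDACTED SYMPTOM> on <REDACTED DATE>.

# The combined function would then be registered in place of a single one:
# tracer.register_masking_function(combined_masking_function)
```

    The functions run in the order given, so a rule whose output could be matched by a later rule should be placed with that in mind.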
  • Running Your Application

    Define the rest of your RAG application as usual. Any LLM calls should then be traced with the registered masking logic applied.
