Anonymize
The Anonymize operation allows you to protect sensitive data by replacing original values with anonymized ones. This operation is applied during the transformation phase of the pipeline.
Overview
The Anonymize operation enables you to:
- Protect sensitive personally identifiable information (PII)
- Replace original data with realistic fake data
- Apply different anonymization techniques to different fields
- Maintain data utility while ensuring privacy compliance
- Work with various data types (strings, numbers, dates)
Configuration
Field-Level Anonymization
The Anonymize operation is configured at the field level, allowing you to specify different anonymization methods for each sensitive field:
No action: Keep the original values unchanged (useful for testing or when certain fields don't need anonymization)
Fake data: Replace values with realistic fake data based on field labels. For example, values in a field labeled "name" are replaced with fake names, while values in a field labeled "email" are replaced with fake email addresses.
Masking: Replace parts of values with mask characters while preserving format. For example, a credit card number "1234-5678-9012-3456" might become "****-****-****-3456".
Shuffling: Randomly reorder values within the dataset while maintaining the same value distribution.
List: Replace values by picking randomly from a predefined list of values.
Custom function: Write your own anonymization function using JavaScript code.
Saved function: Use a previously created and saved custom function.
Delete field: Completely remove the field from the output dataset.
Blank field: Replace all values with null/empty values.
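The masking, shuffling, and list methods can be illustrated with a short JavaScript sketch. This is not the product's implementation: the function names `maskValue`, `shuffleValues`, and `pickFromList`, their parameters, and the choice to keep the last four characters visible are all assumptions made for illustration.

```javascript
// Hypothetical helpers; names, parameters, and defaults are illustrative only.

// Masking: hide every letter or digit except the last `visible` ones,
// leaving separators such as "-" in place so the format is preserved.
function maskValue(value, visible = 4, maskChar = "*") {
  const chars = String(value).split("");
  const isAlnum = (c) => /[A-Za-z0-9]/.test(c);
  let remaining = chars.filter(isAlnum).length - visible;
  return chars
    .map((c) => {
      if (!isAlnum(c)) return c; // keep separators
      if (remaining > 0) {
        remaining--;
        return maskChar;
      }
      return c; // last `visible` characters stay readable
    })
    .join("");
}

// Shuffling: Fisher-Yates shuffle on a copy, so the set of values
// (and therefore the value distribution) is unchanged.
function shuffleValues(values, random = Math.random) {
  const out = values.slice();
  for (let i = out.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [out[i], out[j]] = [out[j], out[i]];
  }
  return out;
}

// List: pick a replacement at random from a predefined list.
function pickFromList(list, random = Math.random) {
  return list[Math.floor(random() * list.length)];
}

const masked = maskValue("1234-5678-9012-3456"); // "****-****-****-3456"
const shuffled = shuffleValues(["Berlin", "Paris", "Rome", "Paris"]);
const picked = pickFromList(["Gold", "Silver", "Bronze"]);
```

Note that shuffling keeps real values in the dataset and only breaks their link to the original rows, so it may be a weaker protection than fake data or masking for fields with few distinct values.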
Dictionary Modes
When anonymizing data, you can control how replacement values are mapped:
Inherit from rule: Use the default dictionary behavior defined at the rule level.
Skip dictionary: Don't maintain consistent mapping between original and replacement values, so the same original value may receive a different replacement each time it occurs.
Label scope: Maintain consistent mapping within fields that have the same label.
Fieldname scope: Maintain consistent mapping within fields that have the same name.
Entity/Field scope: Maintain consistent mapping within the same entity and field combination.
Global scope: Maintain consistent mapping across all entities and fields.
User-defined scope: Define your own scope for consistent mapping using a custom scope string.
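One way to think about these modes is that each one determines which dictionary a field's mappings are stored in. The sketch below derives a dictionary key from the mode; the mode identifiers, the `field` object shape (`entity`, `name`, `label`), and the `dictionaryKey` function are hypothetical and not taken from the product.

```javascript
// Hypothetical sketch: map a dictionary mode to the key of the dictionary
// that holds its original-to-replacement mappings. A null key means no
// dictionary is used, so no consistency is kept.
function dictionaryKey(mode, field, { ruleMode = "skip", customScope = "" } = {}) {
  if (mode === "inherit") {
    // "Inherit from rule": fall back to the mode defined at the rule level.
    if (ruleMode === "inherit") {
      throw new Error("Rule-level mode cannot itself be 'inherit'");
    }
    mode = ruleMode;
  }
  switch (mode) {
    case "skip":
      return null;
    case "label":
      return `label:${field.label}`;
    case "fieldname":
      return `field:${field.name}`;
    case "entity-field":
      return `entity:${field.entity}.${field.name}`;
    case "global":
      return "global";
    case "user-defined":
      return `custom:${customScope}`;
    default:
      throw new Error(`Unknown dictionary mode: ${mode}`);
  }
}

const customerName = { entity: "customer", name: "full_name", label: "name" };
const supplierName = { entity: "supplier", name: "contact", label: "name" };

// Same label, so both fields share one dictionary under label scope...
const sharedByLabel =
  dictionaryKey("label", customerName) === dictionaryKey("label", supplierName);
// ...but they get separate dictionaries under entity/field scope.
const sharedByEntityField =
  dictionaryKey("entity-field", customerName) ===
  dictionaryKey("entity-field", supplierName);
```

Under these assumptions, a wider scope means more fields share one dictionary, and therefore more places where the same original value receives the same replacement.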
Examples
Basic Anonymization
To anonymize customer data:
- Run a discover operation first to identify sensitive fields
- Select the customer entity
- For the "name" field, choose "Fake data" with the "name" label
- For the "email" field, choose "Fake data" with the "email" label
- For the "phone" field, choose "Masking" to preserve format while hiding real numbers
Consistent Anonymization
To ensure the same customer name always gets replaced with the same fake name:
- Select "Label scope" dictionary mode for name fields
- Whenever "John Smith" then appears in any field labeled as "name", it is always replaced with the same fake name, such as "Jane Doe"
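The consistent replacement described here can be sketched as a lookup table that remembers the first replacement generated for each original value within a scope. The `createDictionary` function, the `"label:name"` scope string, and the list of fake names are assumptions for illustration, not the product's internals.

```javascript
// Hypothetical sketch of a consistent-mapping dictionary.
// `generate` produces a replacement the first time a value is seen;
// later lookups for the same scope and value reuse it.
function createDictionary(generate) {
  const mappings = new Map();
  return function replace(scopeKey, original) {
    const key = JSON.stringify([scopeKey, original]);
    if (!mappings.has(key)) {
      mappings.set(key, generate(original));
    }
    return mappings.get(key);
  };
}

// Stand-in generator that hands out fake names in order.
const fakeNames = ["Jane Doe", "Alex Roe", "Sam Poe"];
let next = 0;
const replaceName = createDictionary(() => fakeNames[next++ % fakeNames.length]);

const first = replaceName("label:name", "John Smith"); // "Jane Doe"
const again = replaceName("label:name", "John Smith"); // "Jane Doe" again
const other = replaceName("label:name", "Mary Jones"); // "Alex Roe"
```

Because the scope is part of the lookup key, the same original value under a different scope would get its own, independent replacement.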
Custom Anonymization Function
To apply a custom anonymization algorithm:
- Select a field and choose "Custom function"
- Write JavaScript code that takes the original value and returns an anonymized version
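As a rough illustration, a custom function might look like the sketch below. The exact signature the product expects is not specified here, so the function name `anonymizeEmail` and its single-argument form are assumptions; the only behavior taken from the steps above is that it receives the original value and returns an anonymized one.

```javascript
// Hypothetical custom function: keeps the domain of an email address and
// hides the local part except for its first character.
function anonymizeEmail(value) {
  // Pass empty values through unchanged.
  if (value === null || value === undefined) return value;

  const text = String(value);
  const at = text.lastIndexOf("@");

  // Not an email address: mask the whole value, keeping its length.
  if (at <= 0) return "*".repeat(text.length);

  const local = text.slice(0, at);
  const domain = text.slice(at + 1);
  return `${local[0]}${"*".repeat(local.length - 1)}@${domain}`;
}

const result = anonymizeEmail("john.smith@example.com"); // "j*********@example.com"
```

This variant keeps the first character and the domain visible, which preserves some data utility at the cost of weaker privacy; a stricter function could replace the whole local part.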
This operation helps ensure your data complies with privacy regulations while maintaining realistic data characteristics for testing and development purposes.