Operations

Anonymize

The Anonymize operation allows you to protect sensitive data by replacing original values with anonymized ones. This operation is applied during the transformation phase of the pipeline.

Overview

The Anonymize operation enables you to:

  • Protect sensitive personally identifiable information (PII)
  • Replace original data with realistic fake data
  • Apply different anonymization techniques to different fields
  • Maintain data utility while ensuring privacy compliance
  • Work with various data types (strings, numbers, dates)

Configuration

Field-Level Anonymization

The Anonymize operation is configured at the field level, allowing you to specify different anonymization methods for each sensitive field. When configuring field-level anonymization, you'll see a pen icon (✎) in the actions column that opens the Edit Function panel for more detailed configuration.

Each field can use one of the following anonymization methods, which are listed in the dropdown menu:

Use Fake Data

Replace values with realistic fake data based on field labels. When you select this option, a label dropdown appears in which you choose the label whose fake data generator will be used. The labels are grouped by type (Language, Date, Global, Custom) and determine what kind of fake data is generated. For example, a field labeled "person/name" is replaced with fake names, while a field labeled "contact/email" is replaced with fake email addresses.

Fake data options

  • Language-based labels (e.g., "person/name", "contact/email")
  • Date format labels (e.g., "date/yyyy-mm-dd")
  • Global labels (e.g., "global/url")
  • Custom labels (e.g., "custom/IBAN")

Functions

Built-in anonymization functions that can be applied to fields.

Masking

Replace parts of values with mask characters while preserving format. For example, a credit card number "1234-5678-9012-3456" might become "****-****-****-3456".
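The masking behavior described above can be sketched in a few lines of JavaScript. This is only an illustration under assumptions: the helper name maskValue, its parameters, and the rule of keeping the last four characters visible are hypothetical and are not taken from the product.

```javascript
// Hypothetical helper: mask every letter or digit except the last `visible` ones.
// Separators such as "-" or spaces are kept, so the original format survives.
function maskValue(value, visible = 4, maskChar = "*") {
  const chars = String(value).split("");
  const isMaskable = (c) => /[A-Za-z0-9]/.test(c);
  // Count the characters that are eligible for masking.
  const total = chars.filter(isMaskable).length;
  let seen = 0;
  return chars
    .map((c) => {
      if (!isMaskable(c)) return c; // keep separators untouched
      seen += 1;
      // Mask everything except the trailing `visible` characters.
      return seen <= total - visible ? maskChar : c;
    })
    .join("");
}

console.log(maskValue("1234-5678-9012-3456")); // ****-****-****-3456
```

Values shorter than the visible window are returned unchanged in this sketch, which may or may not match what the built-in Masking method does.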

Shuffling

Randomly reorder values within the dataset while maintaining the same value distribution.
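As a rough illustration of shuffling, the sketch below reorders the values of one field across a set of records with a Fisher-Yates shuffle. The helper name shuffleField and the record shape are assumptions made for this example; the product's own implementation is not documented here.

```javascript
// Hypothetical helper: shuffle the values of one field across all records.
// `random` is injectable so the behavior can be made reproducible in tests.
function shuffleField(records, field, random = Math.random) {
  const values = records.map((r) => r[field]);
  // Fisher-Yates shuffle: walk backwards and swap each slot
  // with a randomly chosen slot at or before it.
  for (let i = values.length - 1; i > 0; i--) {
    const j = Math.floor(random() * (i + 1));
    [values[i], values[j]] = [values[j], values[i]];
  }
  // Return new records; the input array is left unchanged.
  return records.map((r, idx) => ({ ...r, [field]: values[idx] }));
}

const customers = [
  { id: 1, city: "Paris" },
  { id: 2, city: "Berlin" },
  { id: 3, city: "Madrid" },
];
console.log(shuffleField(customers, "city"));
```

Because the values are only reordered, every original value still appears exactly as often as before, which is why the value distribution is preserved.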

List

Replace values by picking randomly from a predefined list of values.
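A minimal sketch of list-based replacement follows. The helper name pickFromList and the sample list are hypothetical; the sketch assumes each value is drawn independently and uniformly from the list.

```javascript
// Hypothetical helper: return one random entry from a predefined list.
// `random` is injectable so the choice can be controlled in tests.
function pickFromList(list, random = Math.random) {
  if (list.length === 0) {
    throw new Error("replacement list must not be empty");
  }
  return list[Math.floor(random() * list.length)];
}

const departments = ["Sales", "Support", "Engineering"];
// Every original value is replaced by an independent random pick.
const replaced = ["Legal", "Finance"].map(() => pickFromList(departments));
console.log(replaced);
```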

Delete field

Completely remove the field from the output dataset.

Blank field

Replace all values with null/empty values.
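The difference between "Delete field" and "Blank field" can be seen in the following sketch. The helper names are hypothetical, and the sketch assumes a blanked value is written as null; the product may write an empty value instead.

```javascript
// Hypothetical helper: remove the field entirely, so the key no longer exists.
function deleteField(record, field) {
  const { [field]: _removed, ...rest } = record;
  return rest;
}

// Hypothetical helper: keep the field but replace its value with null.
function blankField(record, field) {
  return { ...record, [field]: null };
}

const customer = { id: 7, ssn: "123-45-6789" };
console.log(deleteField(customer, "ssn")); // { id: 7 }
console.log(blankField(customer, "ssn")); // { id: 7, ssn: null }
```

Deleting changes the shape of the output, while blanking keeps the schema intact, which can matter for downstream consumers that expect the column to exist.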

Saved Functions

Use a previously created and saved custom function. Saved functions come from your Project Functions, which can be reused across different models within the same project.

Custom Function

Write your own anonymization function using JavaScript code. For more information on creating custom functions, see Custom Functions.

No Action

Keep the original values unchanged (useful for testing or when certain fields don't need anonymization).

Edit Function Options

When you click the pen icon (✎) for a field with the "Use Fake Data" anonymization method, you'll see several configuration options:

Locale: Specify the locale to use for generating fake data. This affects the cultural characteristics of the generated data such as names, addresses, and phone numbers. For example, using locale "es-ES" will generate Spanish names and addresses, while "en-US" will generate American ones. The locale is automatically set based on the selected label's locale but can be overridden.

Text Format: Control the format of the generated fake data. Options include:

  • None: Keep the original formatting from the generator
  • UPPERCASE: Convert all text to uppercase
  • lowercase: Convert all text to lowercase
  • Title Case: Capitalize the first letter of each word
  • Snake_case: Convert spaces to underscores
  • Kebab-case: Convert spaces to hyphens

Prefix: Add a custom prefix to all generated fake data values. Enable the prefix option with the checkbox, then enter your desired prefix in the text field. For example, with prefix "TEST_" a generated name "John Doe" would become "TEST_John Doe".

Suffix: Add a custom suffix to all generated fake data values. Enable the suffix option with the checkbox, then enter your desired suffix in the text field. For example, with suffix "_USER" a generated name "John Doe" would become "John Doe_USER".
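The prefix and suffix options amount to string concatenation around the generated value, as the following sketch shows. The function name applyAffixes and its options object are hypothetical; an option left undefined stands for a disabled checkbox.

```javascript
// Hypothetical helper: wrap a generated value with an optional prefix and suffix.
function applyAffixes(value, { prefix, suffix } = {}) {
  // An undefined option (checkbox disabled) contributes nothing.
  return `${prefix ?? ""}${value}${suffix ?? ""}`;
}

console.log(applyAffixes("John Doe", { prefix: "TEST_" })); // TEST_John Doe
console.log(applyAffixes("John Doe", { suffix: "_USER" })); // John Doe_USER
```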

Dictionary: Control how replacement values are mapped and reused. This option determines the scope in which generated values are stored and reused for consistency. For detailed information about dictionary modes, see Dictionary Functions.

Dictionary Modes

When anonymizing data, you can control how replacement values are mapped using different dictionary modes:

Inherit from rule

Use the default dictionary behavior defined at the rule level.

Skip dictionary

Don't maintain consistent mapping between original and replacement values.

Label scope

Maintain consistent mapping within fields that have the same label.

Fieldname scope

Maintain consistent mapping within fields that have the same name.

Entity/Field scope

Maintain consistent mapping within the same entity and field combination.

Global scope

Maintain consistent mapping across all entities and fields.

User-defined scope

Define your own scope for consistent mapping using a custom scope string. When selected, you can specify a custom scope name in the provided text field.
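One way to picture the dictionary modes is as different keys under which original-to-replacement mappings are stored. The sketch below is a conceptual model only: the mode identifiers, the scopeKey function, and the ScopedDictionary class are hypothetical and do not describe the product's internals. "Inherit from rule" is left out because it resolves to one of the other modes at the rule level.

```javascript
// Hypothetical: derive the dictionary key under which mappings are shared.
function scopeKey(mode, ctx) {
  switch (mode) {
    case "label": return `label:${ctx.label}`;
    case "fieldname": return `field:${ctx.field}`;
    case "entityField": return `entity:${ctx.entity}/${ctx.field}`;
    case "global": return "global";
    case "userDefined": return `user:${ctx.customScope}`;
    case "skip": return null; // no mapping is kept
    default: throw new Error(`unknown dictionary mode: ${mode}`);
  }
}

class ScopedDictionary {
  constructor() {
    this.scopes = new Map(); // scope key -> Map(original -> replacement)
  }

  replace(mode, ctx, original, generate) {
    const key = scopeKey(mode, ctx);
    if (key === null) return generate(original); // skip: always a fresh value
    if (!this.scopes.has(key)) this.scopes.set(key, new Map());
    const mapping = this.scopes.get(key);
    // Generate once per original value, then reuse within the scope.
    if (!mapping.has(original)) mapping.set(original, generate(original));
    return mapping.get(original);
  }
}

// Usage: a counter-based generator stands in for a real fake data generator.
let counter = 0;
const fakeName = () => `Person ${++counter}`;
const dictionary = new ScopedDictionary();
const first = dictionary.replace("label", { label: "person/name" }, "John Smith", fakeName);
const second = dictionary.replace("label", { label: "person/name" }, "John Smith", fakeName);
console.log(first === second); // true
```

With "fieldname" or "entityField" the same original value can map to different replacements in different scopes, whereas "global" shares one mapping everywhere.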

Examples

Basic Anonymization

To anonymize customer data:

  1. Run a discover operation first to identify sensitive fields
  2. Select the customer entity
  3. For the "name" field, choose "Use Fake Data" with the "person/name" label
  4. For the "email" field, choose "Use Fake Data" with the "contact/email" label
  5. For the "phone" field, choose "Masking" to preserve format while hiding real numbers

Consistent Anonymization

To ensure the same customer name always gets replaced with the same fake name:

  1. Select "Label scope" dictionary mode for name fields
  2. With this mode, whenever "John Smith" appears in any field labeled "person/name", it is always replaced with the same fake name, such as "Jane Doe"

Custom Anonymization Function

To apply a custom anonymization algorithm:

  1. Select a field and choose "Custom Function"
  2. Write JavaScript code that takes the original value and returns an anonymized version
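As an illustration of step 2, the sketch below takes an original email address and returns a pseudonymized one. It is written as a standalone function; the exact signature and entry point the product expects are described in Custom Functions and are not assumed here. The names fnv1a and anonymizeEmail are hypothetical, and the simple FNV-1a style hash is used only to keep the example self-contained, not as a recommendation for strong protection.

```javascript
// FNV-1a style 32-bit hash over UTF-16 code units (non-cryptographic).
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0; // keep as unsigned 32-bit
  }
  return hash >>> 0;
}

// Hypothetical custom function: replace the local part of an email
// with a deterministic token and keep the domain.
function anonymizeEmail(value) {
  // Leave non-email values untouched.
  if (typeof value !== "string" || !value.includes("@")) return value;
  const at = value.lastIndexOf("@");
  const local = value.slice(0, at);
  const domain = value.slice(at + 1);
  const token = fnv1a(local.toLowerCase()).toString(16).padStart(8, "0");
  return `user_${token}@${domain}`;
}

console.log(anonymizeEmail("john.smith@example.com"));
```

Because the token is derived from the input, the same address always yields the same result, which keeps joins across tables working without a dictionary.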

This operation helps ensure your data complies with privacy regulations while maintaining realistic data characteristics for testing and development purposes.