Dictionaries

Dictionaries let us reuse values from previous transformations, both across different jobs and within the same execution. This keeps the resulting dataset consistent: a given input always produces the same output value.

Dictionaries are common to all taps within the same project. Therefore, we will be able to transform data consistently across multiple databases.

When running or editing a rule, the dictionary usage options appear in the configuration panel on the right.

How to use

The results we obtain depend on the dictionary configuration we set. These are the available options:

Dictionary Modes

There are several modes that control how dictionary values are reused:

Dictionary Modes Summary

The following table summarizes all available dictionary modes, when to use them, and links to their detailed explanations:

| Mode | Description | When to Use | Link |
|------|-------------|-------------|------|
| Field | Same entity and field will be transformed in the same way | When you need different transformations for the same value in different fields | Field Mode |
| Label | Fields labelled with the same label will be transformed in the same way | When you want consistent anonymization across related data types (default mode) | Label Mode |
| Field Name | Values with the same field name across different entities will be transformed consistently | When fields with identical names should have consistent transformations regardless of labels | Field Name Mode |
| Global | All values stored in the dictionary will be reused regardless of entity, field, or label | When you want all identical values in your project to transform consistently | Global Mode |
| Row | Uses row-based scoping for transformations | When values within the same row should be consistent but different across rows | Row Mode |
| Related | Uses relationship-based scoping for transformations | When you want consistency across related tables according to defined relationships | Related Mode |
| User Scope | Allows you to define custom scoping for dictionary transformations | When you need precise control over which specific fields share transformation consistency | User Scope Mode |
| Disabled | Dictionary is disabled - transformations will not be stored or reused | When you want maximum randomness and no consistency in transformations | Disabled Mode |

Field Mode

Matches in the same entity and field will be transformed in the same way. Even if there are matches in other entities or fields, they will be ignored.

In Field mode, the dictionary scope is limited to the specific combination of entity and field. This means that if the same value appears in different fields or entities, it is transformed independently in each of them and can produce a different output in each place.

Example: If you have a "Customers" table with a "Name" field and an "Orders" table with a "Name" field, the value "John" in Customers/Name is transformed independently of "John" in Orders/Name, so the two outputs can differ.

Visual representation:

Customers / Name            Anonymized output
┌─────────────┐             ┌─────────────┐
│ Name        │             │ Name        │
├─────────────┤             ├─────────────┤
│ John        │ ──────────▶ │ Mark        │
└─────────────┘             └─────────────┘

Orders / Name               Anonymized output
┌─────────────┐             ┌─────────────┐
│ Name        │             │ Name        │
├─────────────┤             ├─────────────┤
│ John        │ ──────────▶ │ Sarah       │  (Different transformation)
└─────────────┘             └─────────────┘

In Field Mode, each entity-field combination maintains its own dictionary scope, so the same input value can produce different outputs depending on where it appears.

To use this mode, select "Field" from the dictionary mode dropdown in the UI.

Label Mode (Default)

Matches on fields labelled with the same label will be transformed in the same way. This is the most common mode as it ensures consistent anonymization across related data types.

In Label mode, the dictionary scope is based on field labels. This means that if two fields have the same label (e.g., "person/name"), they will consistently transform identical values to the same output, regardless of which entity they belong to.

Example: If you have a "Customers" table with a "Name" field labelled as "person/name" and an "Employees" table with a "Full Name" field also labelled as "person/name", the value "John Smith" will be transformed to the same anonymized value in both fields.

Visual representation:

Customers Table             Employees Table
┌────────────────┐          ┌─────────────────┐
│ Name           │          │ Full Name       │
│ [person/name]  │          │ [person/name]   │
├────────────────┤          ├─────────────────┤
│ John Smith     │ ───────▶ │ John Smith      │
└────────────────┘          └─────────────────┘
       │                           │
       ▼                           ▼
┌────────────────┐          ┌─────────────────┐
│ Mark Johnson   │ ◀─────── │ Mark Johnson    │  (Same transformation)
└────────────────┘          └─────────────────┘
                            ┌─────────────────┐
                            │ (Dictionary     │
                            │  scope shared)  │
                            └─────────────────┘

In Label Mode, fields with the same label share a dictionary scope, which ensures consistent transformation across different entities.

To use this mode, select "Label" from the dictionary mode dropdown in the UI.

Field Name Mode

Values with the same field name across different entities will be transformed consistently.

In Field Name mode, the dictionary scope is based on the field name itself, regardless of which entity it belongs to. This means that fields with identical names (e.g., "email") across different tables will transform the same values consistently.

Example: If you have an "email" field in a "Users" table and an "email" field in a "Subscribers" table, the value "john@example.com" will be transformed to the same anonymized value in both fields, even if they don't share the same labels.

Visual representation:

Users Table               Subscribers Table
┌────────────────────┐    ┌────────────────────┐
│ email              │    │ email              │
├────────────────────┤    ├────────────────────┤
│ john@example.com   │ ─▶ │ john@example.com   │
└────────────────────┘    └────────────────────┘
        │                         │
        ▼                         ▼
┌────────────────────┐    ┌────────────────────┐
│ user123@email.com  │ ◀─ │ user123@email.com  │ (Same transformation)
└────────────────────┘    └────────────────────┘
        (Dictionary scope shared based on field name)

In Field Name Mode, all fields with the same name share a dictionary scope regardless of their entity, so identical values in identically named fields are transformed consistently across different tables.

To use this mode, select "Field Name" from the dictionary mode dropdown in the UI.

Global Mode

All values that have already been stored in the dictionary will be reused regardless of the entity and field in which they have been found or the label assigned to it.

In Global mode, there is a single shared dictionary scope for all transformations in your project. This means that once a value is transformed anywhere in your project, all subsequent occurrences of that same value will be transformed identically, regardless of entity, field, or label.

Example: If "John Smith" is first transformed to "Mark Johnson" in a "Customers" table "Name" field, then all future occurrences of "John Smith" anywhere in your project (whether in "Employees" table "Full Name" field or "Orders" table "Customer Name" field) will also be transformed to "Mark Johnson".

Visual representation:

Project Dictionary Scope (Global)
┌──────────────────────────────┐
│ John Smith ──▶ Mark Johnson  │
└──────────────────────────────┘



Customers Table        Employees Table        Orders Table
┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│ Name         │       │ Full Name    │       │ Customer Name│
├──────────────┤       ├──────────────┤       ├──────────────┤
│ John Smith   │ ──▶   │ John Smith   │ ──▶   │ John Smith   │
└──────────────┘       └──────────────┘       └──────────────┘
       │                       │                       │
       ▼                       ▼                       ▼
┌──────────────┐       ┌──────────────┐       ┌──────────────┐
│ Mark Johnson │ ◀──   │ Mark Johnson │ ◀──   │ Mark Johnson │
└──────────────┘       └──────────────┘       └──────────────┘

In Global Mode, there is a single dictionary scope for the entire project. Once a value is transformed, all identical values throughout your project will be transformed to the same output, regardless of entity, field, or label.

To use this mode, select "Global" from the dictionary mode dropdown in the UI.
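The four modes described so far differ only in what the dictionary scope is derived from. The following is a minimal sketch of that idea, not Gigantic's actual implementation: the `FieldRef` type, the `scope_id` function, the mode strings, and the "order/name" label are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FieldRef:
    """Hypothetical description of where a value was found."""
    entity: str  # table or collection name
    field: str   # field (column) name
    label: str   # label assigned to the field, e.g. "person/name"


def scope_id(mode: str, ref: FieldRef) -> str:
    """Return an illustrative scope identifier for the given mode."""
    if mode == "field":
        # Scope is the entity/field pair.
        return f"field:{ref.entity}/{ref.field}"
    if mode == "label":
        # Scope is the label, whatever the entity or field.
        return f"label:{ref.label}"
    if mode == "field_name":
        # Scope is the field name, whatever the entity.
        return f"field_name:{ref.field}"
    if mode == "global":
        # One scope shared by the whole project.
        return "global"
    raise ValueError(f"unsupported mode: {mode}")


customers = FieldRef("Customers", "Name", "person/name")
employees = FieldRef("Employees", "Full Name", "person/name")
orders = FieldRef("Orders", "Name", "order/name")  # hypothetical label
```

Under these assumptions, `customers` and `employees` share a scope in Label mode, `customers` and `orders` share one in Field Name mode but not in Field mode, and all three share the single scope in Global mode.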

Row Mode

Uses row-based scoping for transformations.

In Row mode, the dictionary scope is based on entire rows of data. This means that transformations within the same row will be consistent, but the same value in different rows may be transformed differently.

Example: If you have a row with "John Smith" in the "Name" field and "123 Main St" in the "Address" field, both values will be consistently transformed within that row. However, "John Smith" in a different row might be transformed to a different value.

Visual representation:

Customer Table (Row Mode)
┌────────┬────────────────┬──────────────┐
│ Row ID │ Name           │ Address      │
├────────┼────────────────┼──────────────┤
│ 1      │ John Smith     │ 123 Main St  │ ◀─ Same row, consistent transformation
│ 2      │ John Smith     │ 456 Oak Ave  │ ◀─ Different row, different transformation
└────────┴────────────────┴──────────────┘
                 │               │
                 ▼               ▼
┌────────┬────────────────┬──────────────┐
│ Row ID │ Name           │ Address      │
├────────┼────────────────┼──────────────┤
│ 1      │ Mark Johnson   │ 789 Pine Rd  │ ◀─ Values within same row are consistent
│ 2      │ Sarah Williams │ 456 Oak Ave  │ ◀─ Same input, different output due to row scope
└────────┴────────────────┴──────────────┘

In Row Mode, dictionary scope is limited to each individual row. Values within the same row are transformed consistently, but identical values in different rows may have different transformations.

To use this mode, select "Row" from the dictionary mode dropdown in the UI.

Related Mode

Uses relationship-based scoping for transformations.

In Related mode, the dictionary scope is based on relationships between entities in your data model. This ensures that values maintain consistency across related tables according to their defined relationships.

Example: If you have a "Customers" table and an "Orders" table with a relationship between them, customer names will be consistently transformed across both tables based on their relationship. When "John Smith" is transformed in the Customers table, any related orders will reference the same transformed name.

Visual representation:

Customers Table              Orders Table
┌────┬────────────────┐      ┌─────────┬────────┐
│ ID │ Name           │      │ OrderID │ CustID │
├────┼────────────────┤      ├─────────┼────────┤
│ 1  │ John Smith     │ ◀─── │ 101     │ 1      │
│ 2  │ Jane Doe       │ ◀─── │ 102     │ 2      │
└────┴────────────────┘      └─────────┴────────┘
           │                           │
           ▼                           ▼
┌────┬────────────────┐      ┌─────────┬────────┐
│ ID │ Name           │      │ OrderID │ CustID │
├────┼────────────────┤      ├─────────┼────────┤
│ 1  │ Mark Johnson   │      │ 101     │ 1      │
│ 2  │ Sarah Williams │      │ 102     │ 2      │
└────┴────────────────┘      └─────────┴────────┘


                      ┌────────────────────┐
                      │ Customer Name      │
                      ├────────────────────┤
                      │ Mark Johnson       │ ◀─ Consistent with Customers table
                      │ Sarah Williams     │ ◀─ Consistent with Customers table
                      └────────────────────┘

In Related Mode, dictionary scope follows defined relationships between entities. When a value is transformed in one table, all related occurrences in connected tables will use the same transformation.

To use this mode, select "Related" from the dictionary mode dropdown in the UI.

User Scope Mode

Allows you to define custom scoping for dictionary transformations.

In User Scope mode, you can create your own custom dictionary scopes that can be shared across specific fields of your choice. This gives you precise control over which fields should share transformation consistency.

Example: You might create a custom scope called "customer-identifiers" and apply it to fields like "Customer Name", "Billing Contact", and "Shipping Contact". This ensures that identical names in these fields are consistently transformed, while other fields use different scopes.

Visual representation:

User Defined Scope: "customer-identifiers"
┌──────────────────────────────────────────────────────┐
│ Customer Name     Billing Contact   Shipping Contact │
├──────────────────────────────────────────────────────┤
│ John Smith        John Smith        John Smith       │
└──────────────────────────────────────────────────────┘
       │                 │                 │
       ▼                 ▼                 ▼
┌──────────────────────────────────────────────────────┐
│ Mark Johnson      Mark Johnson      Mark Johnson     │ ◀─ Consistent transformation
└──────────────────────────────────────────────────────┘

Other Fields (Different Scopes)
┌────────────────┐    ┌─────────────────┐
│ Email          │    │ User ID         │
├────────────────┤    ├─────────────────┤
│ John Smith     │    │ John Smith      │
│ (different     │    │ (different      │
│  transformation)│   │  transformation)│
└────────────────┘    └─────────────────┘

In User Scope Mode, you define custom scopes that can span across any fields you choose, providing precise control over which values should share consistent transformations.

To use this mode, select "User Scope" from the dictionary mode dropdown and enter a custom scope name in the scope text field.
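As an illustration of the "customer-identifiers" example, the sketch below maps three fields to one user-defined scope name and lets every other field fall back to a scope of its own. The `USER_SCOPES` mapping, `resolve_scope`, the `Store` class, and the "anon-N" outputs are hypothetical. The per-field fallback is also an assumption, not documented behaviour.

```python
# Hypothetical mapping from field name to a user-defined scope name.
USER_SCOPES = {
    "Customer Name": "customer-identifiers",
    "Billing Contact": "customer-identifiers",
    "Shipping Contact": "customer-identifiers",
}


def resolve_scope(field: str) -> str:
    """Use the user-defined scope if one is set, else a per-field scope (assumed)."""
    return USER_SCOPES.get(field, f"field:{field}")


class Store:
    """Toy dictionary keyed by (scope, value); outputs are placeholder strings."""

    def __init__(self) -> None:
        self.entries = {}

    def transform(self, field: str, value: str) -> str:
        key = (resolve_scope(field), value)
        if key not in self.entries:
            # A real transformation function would generate a realistic value here.
            self.entries[key] = f"anon-{len(self.entries) + 1}"
        return self.entries[key]


store = Store()
name_out = store.transform("Customer Name", "John Smith")
billing_out = store.transform("Billing Contact", "John Smith")
email_out = store.transform("Email", "John Smith")
```

Here `name_out` and `billing_out` are identical because both fields resolve to the same scope, while `email_out` comes from a different scope and gets its own entry.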

Disabled Mode

Dictionary is disabled - transformations will not be stored or reused.

In Disabled mode, each transformation is generated independently without checking or storing results in the dictionary. This means that identical values may be transformed differently each time they appear, providing maximum randomness but no consistency.

Example: If the name "John Smith" appears multiple times in your data, it might be transformed to "Mark Johnson", "Sarah Williams", and "Robert Davis" in different occurrences.

Visual representation:

Disabled Mode - No Dictionary Scope
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Customer Name   │    │ Employee Name   │    │ Order Name      │
├─────────────────┤    ├─────────────────┤    ├─────────────────┤
│ John Smith      │    │ John Smith      │    │ John Smith      │
└─────────────────┘    └─────────────────┘    └─────────────────┘
       │                      │                      │
       ▼                      ▼                      ▼
┌─────────────────┐    ┌─────────────────┐    ┌─────────────────┐
│ Random Output 1 │    │ Random Output 2 │    │ Random Output 3 │
├─────────────────┤    ├─────────────────┤    ├─────────────────┤
│ Mark Johnson    │    │ Sarah Williams  │    │ Robert Davis    │
└─────────────────┘    └─────────────────┘    └─────────────────┘

In Disabled Mode, each transformation is generated independently without any dictionary scope, meaning identical input values can produce completely different output values each time they appear.

To use this mode, select "Disabled" from the dictionary mode dropdown in the UI.
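The difference between a disabled and an enabled dictionary can be sketched as follows. This is a simplified illustration under assumed names (`transform`, `NAMES`): passing `None` as the dictionary stands in for Disabled mode, and a random choice from a short list stands in for a real transformation function.

```python
import random

# Placeholder outputs taken from the example above.
NAMES = ["Mark Johnson", "Sarah Williams", "Robert Davis"]


def transform(value, dictionary, rng):
    """Transform a value, reusing the dictionary only when one is provided."""
    if dictionary is None:
        # Disabled: nothing is looked up or stored, every call is independent.
        return rng.choice(NAMES)
    if value not in dictionary:
        dictionary[value] = rng.choice(NAMES)
    return dictionary[value]


rng = random.Random()
with_dictionary = {}
consistent = [transform("John Smith", with_dictionary, rng) for _ in range(5)]
independent = [transform("John Smith", None, rng) for _ in range(5)]
```

With a dictionary, all five results in `consistent` are the same value. Without one, each entry of `independent` is drawn separately, so repeated inputs may or may not coincide.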

Example without reuse of previous transformations

Real data

| id | Name | Last name |
|----|------|-----------|
| 1 | Edith | Upton |
| 2 | Keith | Smith |
| 3 | Edith | Smith |

Anonymized data

| id | Name | Last name |
|----|------|-----------|
| 1 | Glenda | Leannon |
| 2 | Felix | Reynolds |
| 3 | Valerie | Block |

As we can see, although Edith was first anonymized as Glenda, the second time it appears it becomes a different value (Valerie). The same happens with the last name Smith, which becomes Reynolds in one row and Block in another.

Save new transformations in the dictionary

If this option is enabled, new transformations are stored in the dictionary and can be reused in subsequent jobs.

If this option is disabled, transformations produced during the job are discarded when it finishes, so they only take effect within that job.

Overwrite current dictionary

If this option is active, the current project dictionary will be cleared before the rule is run, so no previously stored values will be reused.

Gigantic does not store any source data in its database. Dictionary entries are hashed with a cryptographic function, so the original values cannot be read back from the dictionary.
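The interaction between the "Save new transformations in the dictionary" and "Overwrite current dictionary" options can be sketched as below. The `run_job` function and its `save_new` and `overwrite` parameters are hypothetical names, and plain values are used as keys only for brevity, whereas the real dictionary stores hashes.

```python
def run_job(project_dictionary, inputs, *, save_new=True, overwrite=False):
    """Run one job; return (outputs, project dictionary after the job)."""
    # Overwrite: the project dictionary is cleared before the rule runs.
    stored = {} if overwrite else dict(project_dictionary)
    # Entries visible while the job runs, including new ones.
    working = dict(stored)
    outputs = []
    for value in inputs:
        if value not in working:
            working[value] = f"anon-{len(working) + 1}"  # placeholder output
        outputs.append(working[value])
    # Without save_new, entries created by this job are dropped afterwards.
    return outputs, (working if save_new else stored)


first_out, saved = run_job({}, ["Edith", "Keith", "Edith"])
second_out, unchanged = run_job(saved, ["Zed"], save_new=False)
third_out, cleared = run_job(saved, ["Keith"], overwrite=True)
```

In this sketch the first job stores two entries, the second job transforms "Zed" but leaves the stored dictionary as it was, and the third job starts from an empty dictionary, so "Keith" receives a fresh value.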

Dictionary Configuration in Transform Functions

When configuring dictionary behavior in your transformation functions, you can use several options in the UI to control how values are stored and reused:

Mode Options

The mode dropdown determines the scope in which dictionary values are stored and reused:

  • Field: Scopes transformations to the specific entity and field combination
  • Label: Scopes transformations to fields with the same label
  • Field Name: Scopes transformations to fields with the same name across different entities
  • Global: Uses a single global scope for all transformations
  • Row: Scopes transformations to entity rows
  • Related: Uses relationship-based scoping
  • User Scope: Allows custom scoping with a user-defined scope name
  • Disabled: Disables dictionary functionality

Replace Label

The Replace Label text field allows you to override the automatic label detection for a field. This is useful when you want a field to be treated as a different data type for dictionary purposes.

For example, if you have a field that contains company names but you want them to be treated as person names for consistency with other name fields, you can enter "person/name" in the Replace Label field.

With Options

The With Options checkbox includes the function options in the dictionary key, ensuring that identical values with different transformation parameters are stored separately. When this option is enabled, the dictionary will create unique entries for the same input value if it's being transformed with different function parameters.

Example: If you have a Name field being transformed with a person function, and you use the same input "John Smith" but with different options (e.g., one with gender: 'male' and another with gender: 'female'), enabling "With Options" will store these as separate dictionary entries, allowing for different transformations based on the options provided.

To use this option, check the "With Options" checkbox in the dictionary configuration panel. This is particularly useful when you want to maintain different transformations for the same value based on context-specific parameters.
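One way to picture "With Options" is as a change in how the dictionary key is built. The `dictionary_key` function below is a hypothetical sketch of that idea. Serializing the options with sorted keys is an assumption made so that the order in which options are written does not matter.

```python
import json


def dictionary_key(value, options, with_options):
    """Build an illustrative dictionary key for a value."""
    if not with_options:
        # Options are ignored: the same value always maps to the same key.
        return (value,)
    # Options become part of the key, so different parameters give different keys.
    return (value, json.dumps(options, sort_keys=True))


male_key = dictionary_key("John Smith", {"gender": "male"}, True)
female_key = dictionary_key("John Smith", {"gender": "female"}, True)
plain_key = dictionary_key("John Smith", {"gender": "male"}, False)
```

With the option enabled, `male_key` and `female_key` differ and would be stored as separate entries. With it disabled, both calls would produce `plain_key`.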

Nulls Handling

The Nulls Handling checkbox controls whether null values should be stored in and retrieved from the dictionary. When enabled, null values will be consistently transformed to the same output value each time they appear in your data. This can be useful for maintaining consistency in your anonymized datasets.

When disabled (default), null values are not stored in the dictionary and will be processed independently each time they are encountered, potentially resulting in different transformations or handling behaviors.

Example: With Nulls Handling enabled, all NULL values in a "Last Name" field might be consistently transformed to "Anonymous", whereas with it disabled, they might be left as NULL or handled differently based on the transformation function.

To use this option, check the "Nulls Handling" checkbox in the dictionary configuration panel.
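The effect of the checkbox can be sketched as a guard in front of the dictionary lookup. The `transform` function, its `nulls_handling` parameter, and the `placeholder` generator are hypothetical. What a real transformation function does with a null value is not specified here.

```python
import itertools

_counter = itertools.count(1)


def placeholder(value):
    """Stand-in for a real transformation function; returns a new value per call."""
    return f"value-{next(_counter)}"


def transform(value, dictionary, generate, nulls_handling=False):
    """Transform a value, storing nulls only when nulls_handling is enabled."""
    if value is None and not nulls_handling:
        # Nulls bypass the dictionary and are processed independently.
        return generate(value)
    if value not in dictionary:
        dictionary[value] = generate(value)
    return dictionary[value]


enabled = {}
a = transform(None, enabled, placeholder, nulls_handling=True)
b = transform(None, enabled, placeholder, nulls_handling=True)

disabled = {}
c = transform(None, disabled, placeholder)
d = transform(None, disabled, placeholder)
```

With the option enabled, `a` and `b` are the same stored value. With it disabled, `c` and `d` are generated separately and nothing is written to the dictionary.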

Technical Implementation

Dictionary transformations work by:

  1. Hashing the original value: When a value is to be transformed, it's first hashed using a cryptographic hash function (SHA-256). This ensures that the original value is never stored in plain text in the dictionary, maintaining security and privacy.

  2. Creating a scope identifier: Based on the selected mode (Field, Label, Global, etc.), a unique scope identifier is generated. This identifier determines where in the dictionary the transformation should be stored and retrieved from.

  3. Checking for existing transformations: The system checks if a transformation already exists for that hash within the specified scope. This lookup is performed in constant time for efficiency.

  4. Returning existing transformations: If a matching hash is found in the specified scope, the previously generated transformed value is returned, ensuring consistency.

  5. Generating new transformations: If no matching hash is found, a new transformation is generated by the appropriate function, and both the hash and transformed value are stored in the dictionary for future reuse.

This implementation ensures that:

  • Values are consistently transformed based on your chosen scope
  • Original data is never stored in plain text
  • Dictionary lookups are efficient
  • Multiple projects can maintain separate dictionaries
  • The same system works across different databases and data sources within a project
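The five steps above can be sketched in a few lines. This is an illustrative model rather than the product's code: the `Dictionary` class, the `fake_value` generator, and the scope strings are hypothetical, while the SHA-256 hashing and the lookup keyed by scope and hash follow the description.

```python
import hashlib
import itertools
from typing import Callable, Dict, Tuple

_ids = itertools.count(1)


def fake_value(_original: str) -> str:
    """Stand-in for a real transformation function."""
    return f"anon-{next(_ids)}"


class Dictionary:
    def __init__(self) -> None:
        # (scope identifier, SHA-256 hex digest) -> transformed value
        self._entries: Dict[Tuple[str, str], str] = {}

    @staticmethod
    def _hash(value: str) -> str:
        # Step 1: only the hash of the original value is kept.
        return hashlib.sha256(value.encode("utf-8")).hexdigest()

    def transform(self, scope: str, value: str,
                  generate: Callable[[str], str]) -> str:
        key = (scope, self._hash(value))   # step 2: scope is part of the key
        if key in self._entries:           # step 3: hash-map lookup
            return self._entries[key]      # step 4: reuse the stored output
        transformed = generate(value)      # step 5: generate and store
        self._entries[key] = transformed
        return transformed


dictionary = Dictionary()
first = dictionary.transform("label:person/name", "John Smith", fake_value)
again = dictionary.transform("label:person/name", "John Smith", fake_value)
other = dictionary.transform("label:company/name", "John Smith", fake_value)
```

In this sketch `first` and `again` are equal because they share scope and hash, while `other` lives in a different scope and gets its own entry. A Python dict gives average constant-time lookups, which matches the efficiency claim above.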