
Dictionary

What is a Dictionary?

A Dictionary in Gigantics is a smart storage system that remembers how you've transformed data values. Think of it as a memory system that ensures consistent anonymization across your entire project.

Why Dictionaries Matter

When you anonymize data, you often need the same original value to become the same anonymized value every time it appears. This is called referential integrity: maintaining relationships between data even after anonymization.

Example Scenario:

Original Data:
- Customer "John Smith" with email "john@example.com"
- Order placed by "John Smith"
- Invoice sent to "john@example.com"

Without Dictionary:
- Customer name → "Mark Johnson"
- Order customer → "Sarah Williams" (different!)
- Invoice email → "sarah@email.com" (different!)

With Dictionary (Label Mode):
- Customer name → "Mark Johnson"
- Order customer → "Mark Johnson" (consistent)
- Invoice email → "mark@email.com" (consistent)

Dictionaries ensure that "John Smith" always becomes "Mark Johnson" and "john@example.com" always becomes "mark@email.com" throughout your entire project, maintaining data relationships and consistency.
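The underlying get-or-create idea can be sketched in a few lines of Python. This illustrates the concept only and is not Gigantics' implementation: the `Dictionary` class, its `transform` method and the name pools are hypothetical.

```python
import random

# Hypothetical replacement pools; Gigantics' real generators are not shown here.
FIRST_NAMES = ["Mark", "Sarah", "Robert", "Laura"]
LAST_NAMES = ["Johnson", "Williams", "Davis", "Miller"]


class Dictionary:
    """Illustrative get-or-create store: original value -> anonymized value."""

    def __init__(self):
        self.entries = {}

    def transform(self, original, generate):
        # Generate a replacement only the first time a value is seen; reuse it afterwards.
        if original not in self.entries:
            self.entries[original] = generate()
        return self.entries[original]


def fake_name():
    return f"{random.choice(FIRST_NAMES)} {random.choice(LAST_NAMES)}"


d = Dictionary()
customer = d.transform("John Smith", fake_name)  # e.g. the Customers table
order = d.transform("John Smith", fake_name)     # e.g. the Orders table
print(customer == order)  # True: the second lookup reuses the stored replacement
```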

Where Dictionaries Are Used in Gigantics

Dictionaries are integrated throughout the Gigantics application in several key areas:

1. Project Dictionary Page

The main dictionary management interface is located at:

Navigation Path:

Projects → [Your Project] → Dictionary

This is where you can:

  • View all dictionary entries
  • Import/Export dictionaries
  • Search and filter entries
  • Clear the entire dictionary
  • View summary statistics by scope

2. Rule Configuration

Dictionaries are configured when creating or editing Rules and Pipelines:

Navigation Path:

Model → Rules → [Create/Edit Rule] → Default Options → Dictionary

Here you configure:

  • Dictionary mode (Field, Label, Global, None)
  • Cache dictionary option
  • Store new transformations
  • Overwrite existing dictionary

3. Field-Level Transformations

When configuring individual field transformations in anonymize operations:

Navigation Path:

Model → Rules → [Edit Rule] → [Select Field] → Transform Options → Dictionary

Each field can have its own dictionary settings:

  • Dictionary mode override
  • Replace label option
  • Custom scope definition
  • With options toggle
  • Nulls handling

4. Pipeline Configuration

When setting up automated pipelines:

Navigation Path:

Model → Pipelines → [Create/Edit Pipeline] → Dictionary Options

Dictionary settings configured on a pipeline are applied to every job execution it runs.

Dictionary UI Components

Dictionary Main Page

The dictionary page (/projects/dictionary) provides a comprehensive interface for managing dictionary entries:

┌────────────────────────────────────────────────────────────────────────────┐
│ Dictionary                                                        [Search] │
│ Dictionary entries of this project                              [Sort: ▼]  │
├────────────────────────────────────────────────────────────────────────────┤
│ [View Summary] [Export] [Import] [Clear Dictionary]                        │
├────────────────────────────────────────────────────────────────────────────┤
│ Key                  New Value              Scope                          │
│ ───────────────────────────────────────────────────────────────────────────│
│ abc123def456...      Mark Johnson           person/name                    │
│ def789ghi012...      mark@email.com         email                          │
│ jkl345mno678...      Company XYZ            org/name                       │
│ ...                  ...                    ...                            │
├────────────────────────────────────────────────────────────────────────────┤
│          [« Prev] [Next »]                                                 │
└────────────────────────────────────────────────────────────────────────────┘

Toolbar Actions:

| Button | Icon | Function | When to Use |
|---|---|---|---|
| View Summary | Eye | Shows count of entries by scope | To get an overview of dictionary structure |
| Export | Export | Downloads dictionary as CSV or JSON | To back up or migrate the dictionary |
| Import | Import | Uploads dictionary from file | To restore or merge dictionaries |
| Clear Dictionary | Clear | Removes all entries | To start fresh or reset the dictionary |

Search Functionality:

  • Search by key (MD5 hash)
  • Search by scope name
  • Search by transformed value
  • Real-time filtering as you type

Sorting Options:

  • Sort by Scope (default)
  • Sort by Key
  • Sort by Value

Import Dictionary Modal

┌────────────────────────────────────────┐
│ Import dictionary                      │
├────────────────────────────────────────┤
│                                        │
│    ┌────────────────────────────┐      │
│    │                            │      │
│    │    [Drag & Drop Area]      │      │
│    │                            │      │
│    │    ⇧ Click or drag file    │      │
│    │        Accept: .json       │      │
│    │                            │      │
│    └────────────────────────────┘      │
│                                        │
│  Action:                               │
│  ☑ Append new / replace matching       │
│  ☐ Overwrite entire dictionary         │
│                                        │
│              [Cancel]  [Confirm]       │
└────────────────────────────────────────┘

Import Options:

| Option | Description | Use Case |
|---|---|---|
| Append | Adds new entries, updates matching keys | Merging dictionaries or updating specific entries |
| Overwrite | Replaces entire dictionary | Restoring from backup or complete replacement |
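The difference between the two import actions can be sketched as a merge over in-memory dictionaries. This is a hypothetical model, not Gigantics code: `import_entries` is an illustrative name, and identifying an entry by its (scope, key) pair is an assumption.

```python
def import_entries(current, incoming, action="append"):
    """Sketch of the two import actions on dictionaries keyed by (scope, key).

    Assumption: an entry is identified by scope plus key; Gigantics may match differently.
    """
    if action == "overwrite":
        return dict(incoming)  # the imported file replaces everything
    if action == "append":
        merged = dict(current)
        merged.update(incoming)  # new keys are added, matching keys take the imported value
        return merged
    raise ValueError(f"unknown import action: {action!r}")


current = {
    ("person/name", "abc123"): "Mark Johnson",
    ("email", "def789"): "mark@email.com",
}
incoming = {
    ("person/name", "abc123"): "Luke Brown",
    ("org/name", "jkl345"): "Company XYZ",
}
appended = import_entries(current, incoming)
overwritten = import_entries(current, incoming, "overwrite")
print(len(appended), len(overwritten))  # 3 2
```

Append keeps the existing email entry while Overwrite drops it, which is why Overwrite suits restoring a known backup.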

File Format:

  • JSON format required
  • Each entry must have: key, val, scope
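A minimal sketch of preparing and checking an import file follows. Only the `key`, `val` and `scope` field names come from this page; the top-level JSON array layout and deriving `key` as the MD5 hex digest of the original value are assumptions, and `make_entry` and `validate` are hypothetical helpers.

```python
import hashlib
import json

REQUIRED_FIELDS = {"key", "val", "scope"}


def make_entry(original, replacement, scope):
    # Assumption: "key" is the MD5 hex digest of the original value. The search box
    # describes keys as MD5 hashes, but exactly what Gigantics hashes is not documented here.
    key = hashlib.md5(original.encode("utf-8")).hexdigest()
    return {"key": key, "val": replacement, "scope": scope}


def validate(entries):
    """Return a list of problems; an empty list means every entry has key, val and scope."""
    problems = []
    for i, entry in enumerate(entries):
        missing = REQUIRED_FIELDS - set(entry)
        if missing:
            problems.append(f"entry {i}: missing {sorted(missing)}")
    return problems


entries = [
    make_entry("John Smith", "Mark Johnson", "person/name"),
    make_entry("john@example.com", "mark@email.com", "email"),
]
# Assumed file layout: a top-level JSON array of entry objects.
payload = json.dumps(entries, indent=2)
print(validate(json.loads(payload)))  # []
print(validate([{"key": "x", "val": "y"}]))  # ["entry 0: missing ['scope']"]
```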

Export Dictionary Modal

┌────────────────────────────────────────┐
│ Export dictionary                      │
├────────────────────────────────────────┤
│                                        │
│  Format:                               │
│  [● CSV ]                              │
│                                        │
│  What would you like to export?        │
│  ☑ Full dictionary                     │
│  ☐ Select scope                        │
│    [Select scope ▼]                    │
│      - person/name                     │
│      - email                           │
│      - org/name                        │
│      ...                               │
│                                        │
│              [Cancel]  [Export]        │
└────────────────────────────────────────┘

Export Options:

| Option | Description | When to Use |
|---|---|---|
| Full Dictionary | Exports all entries | Complete backup or migration |
| Select Scope | Exports specific scopes | Partial backup or scope-specific analysis |
| CSV Format | Comma-separated values | Spreadsheet analysis or external tools |
| JSON Format | JSON structure | Programmatic use or re-import |
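The Full Dictionary and Select Scope choices amount to filtering entries before writing them. The sketch below assumes the CSV columns are `key`, `val` and `scope` (the same fields the JSON import requires); the real export layout may differ, and `export_csv` is a hypothetical helper.

```python
import csv
import io


def export_csv(entries, scopes=None):
    """Render entries as CSV; scopes=None mimics 'Full dictionary', a set mimics 'Select scope'.

    Assumption: the CSV columns are key, val, scope.
    """
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=["key", "val", "scope"])
    writer.writeheader()
    for entry in entries:
        if scopes is None or entry["scope"] in scopes:
            writer.writerow(entry)
    return buf.getvalue()


entries = [
    {"key": "abc123", "val": "Mark Johnson", "scope": "person/name"},
    {"key": "def789", "val": "mark@email.com", "scope": "email"},
    {"key": "jkl345", "val": "Company XYZ", "scope": "org/name"},
]
print(export_csv(entries, scopes={"email"}))
```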

Dictionary Summary Modal

Shows a breakdown of dictionary entries by scope:

┌────────────────────────────────────────┐
│ Dictionary Summary                     │
├────────────────────────────────────────┤
│ Scope              Count               │
├────────────────────────────────────────┤
│ person/name        15,234              │
│ email              12,456              │
│ org/name            8,901              │
│ phone              5,678               │
│ address            3,421               │
│ ...                 ...                │
└────────────────────────────────────────┘
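The same per-scope breakdown can be reproduced offline from an export, for example to track dictionary growth between runs. This sketch assumes the JSON export is an array of objects with a `scope` field; `summarize` is a hypothetical helper.

```python
import json
from collections import Counter


def summarize(export_json):
    """Count entries per scope in a JSON export, largest scope first."""
    entries = json.loads(export_json)
    return Counter(entry["scope"] for entry in entries).most_common()


export_json = json.dumps([
    {"key": "a1", "val": "Mark Johnson", "scope": "person/name"},
    {"key": "b2", "val": "Sarah Williams", "scope": "person/name"},
    {"key": "c3", "val": "mark@email.com", "scope": "email"},
])
for scope, count in summarize(export_json):
    print(f"{scope:<18}{count:>7,}")
```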

Rule Configuration - Dictionary Options

When configuring rules, you'll see the Dictionary section:

┌─────────────────────────────────────────────────────────┐
│ Dictionary                                              │
│ ℹ Indicates whether this rule will                      │
│   make use of values that have                          │
│   been masked in previous executions                    │
│                                                         │
│ Mode:                                                   │
│ ○ No dictionary                                         │
│ ○ Reuse values on the same entity+field                 │
│ ○ Reuse values with the same label or same entity+field │
│ ○ Reuse values in every field                           │
│                                                         │
│ ○ Cache dictionary                                      │
│ ○ Store new transformations in the dictionary           │
│ ○ Overwrite existing dictionary                         │
└─────────────────────────────────────────────────────────┘

Configuration Options:

| Option | Description | Impact |
|---|---|---|
| Mode: None | Disables dictionary usage | Maximum randomness, no consistency |
| Mode: Field | Reuse per entity+field combination | Different values in different fields |
| Mode: Label | Reuse per label type | Consistent across same data types |
| Mode: Global | Reuse everywhere | Maximum consistency, single scope |
| Cache dictionary | Store in memory for faster access | Better performance, uses more memory |
| Store new transformations | Save new transformations for future use | Dictionary grows, enables reuse across jobs |
| Overwrite existing | Clear dictionary before job starts | Fresh start, removes old entries |
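How "Store new transformations" and "Overwrite existing" interact across jobs can be sketched as follows. `run_job`, its parameters and the rule that values stay consistent within a single run even when nothing is persisted are assumptions for illustration; caching is left out because it only affects lookup speed.

```python
import itertools


def run_job(values, stored, generate, store_new=True, overwrite=False):
    """Hypothetical model of one job run against a persistent dictionary.

    `stored` maps original -> replacement and survives between runs.
    """
    if overwrite:
        stored.clear()  # "Overwrite existing dictionary": start from an empty dictionary
    session = {}  # assumption: values stay consistent within a run even if not persisted
    output = []
    for value in values:
        if value in stored:
            output.append(stored[value])  # reuse a transformation from an earlier job
            continue
        if value not in session:
            session[value] = generate()
            if store_new:
                stored[value] = session[value]  # "Store new transformations"
        output.append(session[value])
    return output


counter = itertools.count(1)


def generate():
    # Deterministic stand-in for a real fake-data generator.
    return f"anon-{next(counter)}"


dictionary = {}
run1 = run_job(["John", "Ann", "John"], dictionary, generate)
run2 = run_job(["John", "Bob"], dictionary, generate)
run3 = run_job(["John"], dictionary, generate, overwrite=True)
print(run1, run2, run3)  # ['anon-1', 'anon-2', 'anon-1'] ['anon-1', 'anon-3'] ['anon-4']
```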

Field-Level Dictionary Configuration

When configuring individual field transformations:

┌────────────────────────────────────────┐
│ Dictionary                             │
│                                        │
│ Mode: [▼ Label scope]                  │
│   □ Inherit from rule                  │
│   □ Skip dictionary                    │
│   □ Label scope                        │
│   □ Fieldname scope                    │
│   □ Entity/Field scope                 │
│   □ Global scope                       │
│   □ User-defined scope                 │
│                                        │
│ Replace Label: [_________________]     │
│ Scope:         [_________________]     │
│                                        │
│ □ With options                         │
│ □ Nulls handling                       │
└────────────────────────────────────────┘

Field-Level Options:

| Option | Description | Example Use Case |
|---|---|---|
| Inherit from rule | Uses rule-level dictionary settings | Default behavior, consistent with rule |
| Skip dictionary | Bypasses dictionary for this field | Maximum randomness for sensitive fields |
| Label scope | Uses field's label for scoping | Standard consistency within data type |
| Fieldname scope | Uses field name across entities | Consistent for fields with same name |
| Entity/Field scope | Field-specific scope | Different values per field |
| Global scope | Project-wide consistency | Maximum consistency |
| User-defined scope | Custom scope name | Custom grouping logic |
| Replace Label | Override automatic label detection | Treat field as different type |
| Scope | Custom scope identifier | Custom grouping when using user-defined |
| With options | Include function options in key | Different transformations for same value with different params |
| Nulls handling | Store and reuse null transformations | Consistent null value handling |
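To see why "With options" yields different transformations for the same value, consider a key that folds the function's parameters into the hash. This derivation is hypothetical: the MD5 choice echoes the key column in the dictionary page, but the `value|options` layout and the `locale` option are assumptions for illustration.

```python
import hashlib
import json


def dictionary_key(value, options=None, with_options=False):
    """Hypothetical key derivation for the 'With options' toggle.

    Without options the key depends on the value only; with options, the function's
    parameters are folded in, so the same value under different parameters gets a
    separate dictionary entry.
    """
    material = value
    if with_options:
        # sort_keys=True keeps the key independent of the order options were given in
        material += "|" + json.dumps(options or {}, sort_keys=True)
    return hashlib.md5(material.encode("utf-8")).hexdigest()


plain_en = dictionary_key("John Smith", {"locale": "en"})
plain_es = dictionary_key("John Smith", {"locale": "es"})
opt_en = dictionary_key("John Smith", {"locale": "en"}, with_options=True)
opt_es = dictionary_key("John Smith", {"locale": "es"}, with_options=True)
print(plain_en == plain_es, opt_en == opt_es)  # True False
```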

Dictionary Modes Explained

Mode: None (Disabled)

What it does: Dictionary is completely disabled for this rule or field.

Behavior:

  • No transformations are stored
  • No lookups are performed
  • Each transformation is independent
  • Maximum randomness

When to use:

  • When you want maximum randomization
  • For one-time transformations
  • When consistency is not required
  • Testing or exploration scenarios

Example:

Input: "John Smith"
Run 1: "Mark Johnson"
Run 2: "Sarah Williams"
Run 3: "Robert Davis"

Mode: Field (Entity + Field)

What it does: Reuses transformations within the same entity and field combination only.

Behavior:

  • Same value in same field → same output
  • Same value in different field → different output
  • Same value in different entity → different output

When to use:

  • When fields should have independent transformations
  • When same value means different things in different fields
  • Testing field-specific anonymization

Example:

Customers.Name "John" → "Mark"
Customers.Name "John" → "Mark" (reused)
Orders.Name "John" → "Sarah" (different entity)

Mode: Label (Default)

What it does: Reuses transformations for fields with the same label, regardless of entity or field name.

Behavior:

  • Same value + same label → same output
  • Works across different entities
  • Works across different field names
  • Most common mode for data consistency

When to use:

  • Maintaining referential integrity
  • When labels represent data types (configured during discovery)
  • Standard anonymization workflows
  • Recommended for most use cases

Example:

Customers.Name [person/name] "John" → "Mark"
Employees.FullName [person/name] "John" → "Mark" (same label)
Orders.CustomerName [person/name] "John" → "Mark" (same label)

Mode: Global

What it does: All transformations share a single project-wide dictionary.

Behavior:

  • Same value anywhere → same output
  • Maximum consistency
  • Single shared scope
  • Works across all entities, fields, and labels

When to use:

  • Maximum referential integrity
  • When you want identical values to always transform identically
  • Simple, global consistency requirements
  • When label detection is unreliable (check discovery settings)

Example:

Customers.Name "John" → "Mark"
Employees.Name "John" → "Mark" (global reuse)
Orders.Customer "John" → "Mark" (global reuse)
Invoices.Contact "John" → "Mark" (global reuse)
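The four modes differ only in which scope a value is looked up under. The sketch below reproduces the examples above under that view. `scope_for` and `anonymize` are hypothetical, and the Label mode fallback to entity+field for unlabeled fields is an assumption based on the rule UI wording "Reuse values with the same label or same entity+field".

```python
import itertools


def scope_for(mode, entity, field, label=None):
    """Hypothetical mapping from a dictionary mode to the scope an entry is stored under.

    Returns None when the dictionary is disabled, so nothing is looked up or stored.
    """
    if mode == "none":
        return None
    if mode == "field":
        return f"{entity}.{field}"
    if mode == "label":
        # Assumed fallback to entity+field for fields without a label.
        return label or f"{entity}.{field}"
    if mode == "global":
        return "global"
    raise ValueError(f"unknown mode: {mode!r}")


def anonymize(dictionary, mode, entity, field, value, generate, label=None):
    scope = scope_for(mode, entity, field, label)
    if scope is None:
        return generate()  # Mode: None, every call produces a fresh value
    if (scope, value) not in dictionary:
        dictionary[(scope, value)] = generate()
    return dictionary[(scope, value)]


counter = itertools.count(1)


def generate():
    # Deterministic stand-in for a real fake-name generator.
    return f"anon-{next(counter)}"


field_dict = {}
a = anonymize(field_dict, "field", "Customers", "Name", "John", generate)
b = anonymize(field_dict, "field", "Customers", "Name", "John", generate)
c = anonymize(field_dict, "field", "Orders", "Name", "John", generate)
print(a, b, c)  # anon-1 anon-1 anon-2

label_dict = {}
d = anonymize(label_dict, "label", "Customers", "Name", "John", generate, "person/name")
e = anonymize(label_dict, "label", "Employees", "FullName", "John", generate, "person/name")
print(d, e)  # anon-3 anon-3
```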

When and Why to Use Dictionaries

Use Dictionaries When:

  1. Maintaining Referential Integrity

    • You need the same person/company/identifier to map consistently across multiple tables (configured via schema)
    • Foreign key relationships must be preserved
    • Data relationships matter for testing or analytics
  2. Consistent Anonymization Across Jobs

    • You run jobs multiple times (using pipelines)
    • You want deterministic results
    • You need to compare results over time
  3. Cross-Database Consistency

    • Same data appears in multiple databases (configured via sinks)
    • You need consistent anonymization across all sources
    • Migrations between environments
  4. Realistic Test Data

    • Generated data needs to look realistic
    • Relationships must make sense
    • Consistency improves data quality
  5. Compliance and Auditing

    • Trackable anonymization patterns
    • Reproducible transformations
    • Audit trail of transformations

Don't Use Dictionaries When:

  1. Maximum Randomization Needed

    • Security testing
    • Privacy-critical scenarios
    • When uniqueness is more important than consistency
  2. One-Time Transformations

    • Single-use data exports
    • No future reuse needed
    • Disposable test environments
  3. Different Contexts Require Different Values

    • When "John" in Customer table should differ from "John" in Employee table
    • Context-dependent anonymization
    • Field-specific privacy requirements

Strategies for Using Dictionaries in Gigantics

Strategy 1: Label-Based Consistency

Best for: Most standard anonymization workflows

Setup:

  1. Configure rule with Dictionary Mode: Label
  2. Ensure fields are properly labeled (person/name, email, phone, etc.)
  3. Enable "Store new transformations in the dictionary"
  4. Enable "Cache dictionary" for performance

Benefits:

  • Automatic consistency across related data types
  • Works across multiple entities
  • Maintains referential integrity
  • Easy to configure

Example Workflow:

1. Run discovery to label fields
2. Create rule with Label mode dictionary
3. Execute job - transformations stored by label
4. Future jobs automatically reuse stored transformations

Strategy 2: Progressive Dictionary Building

Best for: Iterative development and refinement

Setup:

  1. Start with "Store new transformations" enabled
  2. Run initial job with smaller dataset
  3. Review dictionary entries
  4. Export dictionary for backup
  5. Run full job - dictionary already contains partial entries

Benefits:

  • Build consistency over time
  • Test with smaller datasets first
  • Can refine and re-import dictionary
  • Incremental approach

Workflow:

1. Sample 1000 records → build initial dictionary
2. Export dictionary
3. Import to new job
4. Run full [dataset](/model/datasets) → values seen in the sample are already consistent
5. New values added to existing dictionary

Strategy 3: Scope-Specific Dictionaries

Best for: Complex projects with different consistency requirements

Setup:

  1. Use User-defined scope mode for specific fields
  2. Define custom scopes (e.g., "customer-identifiers", "financial-data")
  3. Group related fields under same scope
  4. Different scopes maintain separate dictionaries

Benefits:

  • Fine-grained control
  • Different consistency rules per data type
  • Flexible grouping
  • Can export/import specific scopes

Example:

Scope: "customer-identifiers"
- Customer Name → "Mark Johnson"
- Billing Contact → "Mark Johnson" (same scope)

Scope: "employee-data"
- Employee Name → "Sarah Williams"
- Manager Name → "Sarah Williams" (same scope)

There is no consistency across scopes: each scope keeps its own mappings.

Strategy 4: Pipeline with Dictionary Reuse

Best for: Scheduled jobs and automation

Setup:

  1. Configure pipeline with dictionary settings
  2. Enable "Store new transformations"
  3. Disable "Overwrite existing dictionary"
  4. Schedule pipeline to run regularly

Benefits:

  • Dictionary grows over time
  • Consistency across scheduled runs
  • Automated consistency
  • Can export dictionary between runs

Workflow:

[Pipeline](/model/pipelines) Configuration:

- Dictionary Mode: Label
- Store new: Yes
- Overwrite: No
- Cache: Yes

Scheduled Runs:

- Run 1: Processes 10K records, builds dictionary
- Run 2: Processes new 5K records, reuses existing + adds new
- Run 3: Maximum reuse, minimal new entries

Strategy 5: Dictionary Import/Export Workflow

Best for: Multi-environment deployment and migration

Setup:

  1. Develop dictionary in development environment
  2. Export dictionary after testing
  3. Import dictionary to staging/production
  4. Use same dictionary across environments

Benefits:

  • Consistent anonymization across environments
  • Can test dictionary before production
  • Reproducible deployments
  • Backup and restore capability

Workflow:

Development:
1. Build and test dictionary
2. Export dictionary.json

Staging:
3. Import dictionary.json
4. Verify consistency
5. Run tests

Production:
6. Import same dictionary.json
7. Execute job with pre-built dictionary
8. Export for backup
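The "Verify consistency" step can be automated by comparing the export taken in one environment with an export taken after importing into the next. This sketch assumes both exports are JSON arrays of `key`/`val`/`scope` objects; `diff_dictionaries` is a hypothetical helper.

```python
import json


def index(entries):
    # Match entries on (scope, key); the value is what must stay identical across environments.
    return {(e["scope"], e["key"]): e["val"] for e in entries}


def diff_dictionaries(exported_json, imported_json):
    """Compare two dictionary exports (JSON text) and report what differs."""
    a = index(json.loads(exported_json))
    b = index(json.loads(imported_json))
    return {
        "missing": sorted(set(a) - set(b)),  # in the source export, absent after import
        "extra": sorted(set(b) - set(a)),    # present only in the target environment
        "changed": sorted(k for k in a.keys() & b.keys() if a[k] != b[k]),
    }


dev = json.dumps([
    {"key": "abc", "val": "Mark Johnson", "scope": "person/name"},
    {"key": "def", "val": "mark@email.com", "scope": "email"},
])
staging = json.dumps([{"key": "abc", "val": "Mark Johnson", "scope": "person/name"}])
report = diff_dictionaries(dev, staging)
print(report)  # {'missing': [('email', 'def')], 'extra': [], 'changed': []}
```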

Strategy 6: Field-Level Overrides

Best for: Mixing consistency and randomness

Setup:

  1. Rule-level: Dictionary Mode: Label (default)
  2. Most fields: Inherit from rule
  3. Specific fields: Override with "Skip dictionary" or different mode

Benefits:

  • Default consistency for most fields
  • Specific control for sensitive fields
  • Flexible per-field configuration
  • Best of both worlds

Example:

[Rule](/model/rules) Default: Label mode

Fields:
- Customer Name: Inherit → Label mode (consistent)
- Email: Inherit → Label mode (consistent)
- SSN: Skip dictionary → New value on every occurrence (random)
- Account Number: Entity/Field mode → Field-specific
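Resolving each field's effective mode against the rule default can be sketched like this. The mode names and `effective_mode` are hypothetical labels for the UI options, not Gigantics identifiers.

```python
def effective_mode(rule_mode, field_mode="inherit"):
    """Hypothetical resolution of a field's dictionary mode against the rule default."""
    if field_mode == "inherit":
        return rule_mode  # "Inherit from rule"
    if field_mode == "skip":
        return "none"  # "Skip dictionary": no lookup, no storage
    return field_mode  # any explicit mode overrides the rule


rule_mode = "label"
fields = {
    "Customer Name": "inherit",
    "Email": "inherit",
    "SSN": "skip",
    "Account Number": "entity/field",
}
resolved = {name: effective_mode(rule_mode, mode) for name, mode in fields.items()}
print(resolved)
# {'Customer Name': 'label', 'Email': 'label', 'SSN': 'none', 'Account Number': 'entity/field'}
```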

Strategy 7: Null Handling Strategy

Best for: Datasets with many null values

Setup:

  1. Enable "Nulls handling" in dictionary options
  2. Nulls will be consistently transformed
  3. Useful for maintaining data patterns

Benefits:

  • Consistent null value anonymization
  • Preserves null patterns in data
  • Can transform nulls to consistent placeholder

Example:

Without null handling:
NULL → (varies: "", "N/A", "Unknown", null)

With null handling:
NULL → "Anonymous" (consistent)
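The contrast above can be modeled by letting nulls bypass the dictionary unless the toggle is on. This is a hypothetical sketch of the behavior described in the example, with `transform` and the deterministic `generate` as illustrative stand-ins.

```python
import itertools


def transform(value, scope, dictionary, generate, nulls_handling=False):
    """Hypothetical sketch of the 'Nulls handling' toggle for one scope."""
    if value is None and not nulls_handling:
        return generate()  # nulls bypass the dictionary, so each one is transformed independently
    key = (scope, value)  # with nulls handling on, None is stored like any other value
    if key not in dictionary:
        dictionary[key] = generate()
    return dictionary[key]


counter = itertools.count(1)


def generate():
    # Deterministic stand-in for a real placeholder generator.
    return f"anon-{next(counter)}"


d = {}
off = [transform(None, "person/name", d, generate) for _ in range(2)]
on = [transform(None, "person/name", d, generate, nulls_handling=True) for _ in range(2)]
print(off, on)  # ['anon-1', 'anon-2'] ['anon-3', 'anon-3']
```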

Best Practices

1. Start with Label Mode

  • Most versatile and useful mode
  • Works automatically with discovery labels
  • Provides good balance of consistency and flexibility

2. Enable Caching for Performance

  • Cache dictionary option improves lookup speed
  • Especially important for large dictionaries
  • Uses memory but significantly faster

3. Store Transformations for Reuse

  • Enable "Store new transformations" unless you need one-time jobs
  • Builds dictionary over time
  • Enables consistency across job runs

4. Export Regularly

  • Export dictionary as backup
  • Export before major changes
  • Export for migration between environments

5. Use Appropriate Scope Granularity

  • Too broad (Global): May cause unintended consistency
  • Too narrow (Field): May miss relationships
  • Just right (Label): Balances consistency and flexibility

6. Monitor Dictionary Size

  • Large dictionaries may impact performance
  • Use "View Summary" to monitor by scope
  • Consider scope-specific exports if too large

7. Test Before Production

  • Build dictionary in development
  • Test with sample datasets
  • Export and import to staging (via sinks)
  • Verify consistency

8. Document Custom Scopes

  • Document user-defined scopes
  • Keep scope naming consistent
  • Document why certain fields use custom scopes

Common Use Cases

Use Case 1: Customer Database Anonymization

Scenario: Anonymize customer data while maintaining relationships

Configuration:

  • Dictionary Mode: Label
  • Store new transformations: Yes
  • Cache dictionary: Yes

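Result:

  • Customer values map to the same anonymized values across all tables
  • Relationships between customers and their related records are preserved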

Use Case 2: Multi-Database Consistency

Scenario: Same data in multiple databases, need consistent anonymization

Configuration:

  • Dictionary Mode: Global
  • Store new transformations: Yes
  • Export dictionary after first run
  • Import into subsequent database jobs

Result:

  • Identical anonymization across all databases
  • Can share dictionary between projects

Use Case 3: Incremental Data Processing

Scenario: Process new data periodically, maintain consistency with historical data

Configuration:

  • Dictionary Mode: Label
  • Store new transformations: Yes
  • Overwrite existing: No
  • Run pipeline on schedule

Result:

  • New data uses existing dictionary
  • New entries added to dictionary
  • Growing consistency over time

Use Case 4: Selective Consistency

Scenario: Some fields need consistency, others need randomness

Configuration:

  • Rule Default: Label mode
  • Specific fields: Skip dictionary or Field mode

Result:

  • Important fields: Consistent
  • Sensitive fields: Random
  • Flexible per-field control

Troubleshooting

Dictionary Not Working

Problem: Transformations are different each run

Solutions:

  1. Check dictionary mode is not "None"
  2. Verify "Store new transformations" is enabled
  3. Check if "Overwrite existing" is clearing dictionary
  4. Ensure cache is enabled for performance
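The first three checks can be expressed as a small configuration audit. The `config` keys (`mode`, `store_new`, `overwrite`) and `diagnose` are hypothetical names for the rule settings, used for illustration only.

```python
def diagnose(config):
    """Hypothetical checklist for 'transformations are different each run'."""
    findings = []
    if config.get("mode", "none") == "none":
        findings.append("Dictionary mode is None: nothing is looked up or stored.")
    if not config.get("store_new", False):
        findings.append("Store new transformations is off: later runs have nothing to reuse.")
    if config.get("overwrite", False):
        findings.append("Overwrite existing dictionary is on: each run starts empty.")
    return findings


for finding in diagnose({"mode": "label", "store_new": True, "overwrite": True}):
    print(finding)
```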

Performance Issues

Problem: Job runs slowly with dictionary enabled

Solutions:

  1. Enable "Cache dictionary" option
  2. Check dictionary size - may need to clear old entries
  3. Consider scope-specific dictionaries
  4. Monitor with dictionary summary

Inconsistent Results

Problem: Same value transforming differently

Solutions:

  1. Check if using correct dictionary mode
  2. Verify labels are consistent across fields (check discovery results)
  3. Check if field-level overrides are set in rule configuration
  4. Review scope settings

Dictionary Too Large

Problem: Dictionary has too many entries

Solutions:

  1. Use "View Summary" to identify large scopes
  2. Export specific scopes only
  3. Clear dictionary and rebuild if needed
  4. Consider splitting into multiple scopes

Summary

Dictionaries are a powerful feature in Gigantics that enable:

  • Consistent anonymization across jobs and databases
  • Referential integrity preservation
  • Flexible configuration from global to field-level
  • Import/Export for backup and migration
  • Performance optimization through caching
  • Fine-grained control through modes and scopes

Start with Label mode for most scenarios, enable caching and storage for best results, and use export/import for backup and migration workflows.
