Dictionary

What is a Dictionary?

A Dictionary in Gigantics is a smart storage system that remembers how you've transformed data values. Think of it as a memory system that ensures consistent anonymization across your entire project.

Why Dictionaries Matter

When you anonymize data, you often need the same original value to become the same anonymized value every time it appears. This is called referential integrity: relationships between data are maintained even after anonymization.

Example Scenario:

Original Data:
- Customer "John Smith" with email "john@example.com"
- Order placed by "John Smith"
- Invoice sent to "john@example.com"

Without Dictionary:
- Customer name → "Mark Johnson"
- Order customer → "Sarah Williams" (different!)
- Invoice email → "sarah@email.com" (different!)

With Dictionary (Label Mode):
- Customer name → "Mark Johnson"
- Order customer → "Mark Johnson" (consistent)
- Invoice email → "mark@email.com" (consistent)

Dictionaries ensure that "John Smith" always becomes "Mark Johnson" and "john@example.com" always becomes "mark@email.com" throughout your entire project, maintaining data relationships and consistency.
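
The idea can be sketched in a few lines of Python. This is a conceptual illustration only, not how Gigantics is implemented: the `anonymize` function, the in-memory `dictionary`, and the list of fake names are all hypothetical.

```python
import random

# Hypothetical in-memory dictionary: original value -> anonymized value.
dictionary = {}

FAKE_NAMES = ["Mark Johnson", "Sarah Williams", "Robert Davis"]

def anonymize(value, use_dictionary=True):
    """Return a replacement for value, reusing earlier results when enabled."""
    if use_dictionary and value in dictionary:
        return dictionary[value]         # seen before: reuse the stored output
    replacement = random.choice(FAKE_NAMES)
    if use_dictionary:
        dictionary[value] = replacement  # remember it for the next occurrence
    return replacement

customer = anonymize("John Smith")  # first occurrence: a new value is picked
order = anonymize("John Smith")     # second occurrence: the same value is reused
```

With `use_dictionary=False` each call picks independently, which corresponds to the "Without Dictionary" case above.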

Where Dictionaries Are Used in Gigantics

Dictionaries are integrated throughout the Gigantics application in several key areas:

1. Project Dictionary Page

The main dictionary management interface is located at:

Navigation Path:

Projects → [Your Project] → Dictionary

This is where you can:

View all dictionary entries
Import/Export dictionaries
Search and filter entries
Clear the entire dictionary
View summary statistics by scope

2. Rule Configuration

Dictionaries are configured when creating or editing Rules and Pipelines:

Navigation Path:

Model → Rules → [Create/Edit Rule] → Default Options → Dictionary

Here you configure:

Dictionary mode (Field, Label, Global, None)
Cache dictionary option
Store new transformations
Overwrite existing dictionary

3. Field-Level Transformations

Dictionary settings can also be set per field when configuring individual field transformations in anonymize operations:

Navigation Path:

Model → Rules → [Edit Rule] → [Select Field] → Transform Options → Dictionary

Each field can have its own dictionary settings:

Dictionary mode override
Replace label option
Custom scope definition
With options toggle
Nulls handling

4. Pipeline Configuration

Dictionary options are also available when setting up automated pipelines:

Navigation Path:

Model → Pipelines → [Create/Edit Pipeline] → Dictionary Options

Pipelines inherit dictionary settings that will be used for all job executions.

Dictionary UI Components

Dictionary Main Page

The dictionary page (/projects/dictionary) provides a comprehensive interface for managing dictionary entries:

┌────────────────────────────────────────────────────────────────────────────┐
│ Dictionary                                                        [Search] │
│ Dictionary entries of this project                              [Sort: ▼]  │
├────────────────────────────────────────────────────────────────────────────┤
│ [View Summary] [Export] [Import] [Clear Dictionary]                        │
├────────────────────────────────────────────────────────────────────────────┤
│ Key                  New Value              Scope                          │
│ ───────────────────────────────────────────────────────────────────────────│
│ abc123def456...      Mark Johnson           person/name                    │
│ def789ghi012...      mark@email.com         email                          │
│ jkl345mno678...      Company XYZ             org/name                      │
│ ...                  ...                     ...                           │
├────────────────────────────────────────────────────────────────────────────┤
│          [« Prev] [Next »]                                                 │
└────────────────────────────────────────────────────────────────────────────┘

Toolbar Actions:

Button	Icon	Function	When to Use
View Summary	Eye	Shows count of entries by scope	To get an overview of the dictionary structure
Export	Export	Downloads dictionary as CSV or JSON	To back up or migrate the dictionary
Import	Import	Uploads dictionary from file	To restore or merge dictionaries
Clear Dictionary	Clear	Removes all entries	To start fresh or reset the dictionary

Search Functionality:

Search by key (MD5 hash)
Search by scope name
Search by transformed value
Real-time filtering as you type

Sorting Options:

Sort by Scope (default)
Sort by Key
Sort by Value

┌────────────────────────────────────────┐
│ Import dictionary                      │
├────────────────────────────────────────┤
│                                        │
│    ┌────────────────────────────┐      │
│    │                            │      │
│    │    [Drag & Drop Area]      │      │
│    │                            │      │
│    │    ⇧ Click or drag file    │      │
│    │        Accept: .json       │      │
│    │                            │      │
│    └────────────────────────────┘      │
│                                        │
│  Action:                               │
│  ☑ Append new / replace matching       │
│  ☐ Overwrite entire dictionary         │
│                                        │
│              [Cancel]  [Confirm]       │
└────────────────────────────────────────┘

Import Options:

Option	Description	Use Case
Append	Adds new entries, updates matching keys	Merging dictionaries or updating specific entries
Overwrite	Replaces entire dictionary	Restoring from backup or complete replacement
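
The difference between the two import actions can be sketched as follows. This is an illustration under assumptions, not the product's code: the `import_entries` function is hypothetical, and treating the pair (scope, key) as the identity of an entry is an assumption of this sketch.

```python
def import_entries(existing, incoming, overwrite=False):
    """Merge imported entries into an existing dictionary.

    Entries are dicts with "key", "val" and "scope". Using (scope, key)
    to decide whether two entries match is an assumption of this sketch.
    """
    merged = {} if overwrite else {(e["scope"], e["key"]): e for e in existing}
    for entry in incoming:
        # Adds the entry if it is new, replaces it if it matches.
        merged[(entry["scope"], entry["key"])] = entry
    return list(merged.values())

existing = [
    {"key": "k1", "val": "Mark Johnson", "scope": "person/name"},
    {"key": "k2", "val": "mark@email.com", "scope": "email"},
]
incoming = [
    {"key": "k1", "val": "Robert Davis", "scope": "person/name"},  # matches k1
    {"key": "k3", "val": "Company XYZ", "scope": "org/name"},      # new entry
]

appended = import_entries(existing, incoming)                    # Append
overwritten = import_entries(existing, incoming, overwrite=True)  # Overwrite
```

Append keeps `k2`, updates `k1`, and adds `k3`; Overwrite leaves only the two imported entries.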

File Format:

JSON format required
Each entry must have: key, val, scope
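
Before importing, it can help to check a file against these requirements. The sketch below assumes the file is a top-level JSON array of entry objects, which is not stated above; the `find_invalid_entries` helper is hypothetical.

```python
import json

REQUIRED_FIELDS = ("key", "val", "scope")

def find_invalid_entries(text):
    """Return the indexes of entries that lack any required field.

    Assumes the file content is a JSON array of objects.
    """
    entries = json.loads(text)
    return [
        index for index, entry in enumerate(entries)
        if not all(field in entry for field in REQUIRED_FIELDS)
    ]

sample = json.dumps([
    {"key": "abc123", "val": "Mark Johnson", "scope": "person/name"},
    {"key": "def789", "val": "mark@email.com"},  # missing "scope"
])
invalid = find_invalid_entries(sample)
```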

┌────────────────────────────────────────┐
│ Export dictionary                      │
├────────────────────────────────────────┤
│                                        │
│  Format:                               │
│  [● CSV ]                              │
│                                        │
│  What would you like to export?        │
│  ☑ Full dictionary                     │
│  ☐ Select scope                        │
│    [Select scope ▼]                    │
│      - person/name                     │
│      - email                           │
│      - org/name                        │
│      ...                               │
│                                        │
│              [Cancel]  [Export]        │
└────────────────────────────────────────┘

Export Options:

Option	Description	When to Use
Full Dictionary	Exports all entries	Complete backup or migration
Select Scope	Exports specific scopes	Partial backup or scope-specific analysis
CSV Format	Comma-separated values	Spreadsheet analysis or external tools
JSON Format	JSON structure	Programmatic use or re-import
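
As a rough picture of what a scope-specific CSV export contains, the sketch below filters entries by scope and serializes them. The column names and their order are assumptions taken from the import field names (key, val, scope); the actual export layout may differ.

```python
import csv
import io

def export_csv(entries, scope=None):
    """Serialize entries to CSV text, optionally restricted to one scope."""
    rows = [e for e in entries if scope is None or e["scope"] == scope]
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=["key", "val", "scope"], lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

entries = [
    {"key": "k1", "val": "Mark Johnson", "scope": "person/name"},
    {"key": "k2", "val": "mark@email.com", "scope": "email"},
]
email_only = export_csv(entries, scope="email")  # "Select scope"
full = export_csv(entries)                       # "Full dictionary"
```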

The View Summary dialog shows a breakdown of dictionary entries by scope:

┌────────────────────────────────────────┐
│ Dictionary Summary                     │
├────────────────────────────────────────┤
│ Scope              Count               │
├────────────────────────────────────────┤
│ person/name        15,234              │
│ email              12,456              │
│ org/name            8,901              │
│ phone              5,678               │
│ address            3,421               │
│ ...                 ...                │
└────────────────────────────────────────┘
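
The same breakdown can be computed from an exported dictionary. A minimal sketch, assuming entries shaped like the import format (key, val, scope); the `summarize` helper is hypothetical.

```python
from collections import Counter

def summarize(entries):
    """Return (scope, count) pairs, largest scope first."""
    return Counter(entry["scope"] for entry in entries).most_common()

entries = [
    {"key": "k1", "val": "Mark Johnson", "scope": "person/name"},
    {"key": "k2", "val": "Sarah Williams", "scope": "person/name"},
    {"key": "k3", "val": "mark@email.com", "scope": "email"},
]
summary = summarize(entries)
```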

Rule Configuration - Dictionary Options

When configuring rules, you'll see the Dictionary section:

┌─────────────────────────────────────────────────────────┐
│ Dictionary                                              │
│ ℹ Indicates whether this rule will                      │
│   make use of values that have                          │
│   been masked in previous executions                    │
│                                                         │
│ Mode:                                                   │
│ ○ No dictionary                                         │
│ ○ Reuse values on the same entity+field                 │
│ ○ Reuse values with the same label or same entity+field │
│ ○ Reuse values in every field                           │
│                                                         │
│ ○ Cache dictionary                                      │
│ ○ Store new transformations in the dictionary           │
│ ○ Overwrite existing dictionary                         │
└─────────────────────────────────────────────────────────┘

Configuration Options:

Option	Description	Impact
Mode: None	Disables dictionary usage	Maximum randomness, no consistency
Mode: Field	Reuse per entity+field combination	Different values in different fields
Mode: Label	Reuse per label type	Consistent across same data types
Mode: Global	Reuse everywhere	Maximum consistency, single scope
Cache dictionary	Store in memory for faster access	Better performance, uses more memory
Store new transformations	Save new transformations for future use	Dictionary grows, enables reuse across jobs
Overwrite existing	Clear dictionary before job starts	Fresh start, removes old entries
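
How "Store new transformations" and "Overwrite existing" interact across job runs can be sketched with a small simulation. Everything here is hypothetical (`run_job`, the `anon-N` placeholder values, and the choice to keep values consistent within a single run); "Cache dictionary" is not modeled because it affects speed rather than results.

```python
import itertools

_ids = itertools.count(1)  # source of unique placeholder values

def run_job(values, stored, store_new=True, overwrite=False):
    """Simulate one job against the persisted dictionary `stored`.

    Returns (outputs, reused), where reused counts lookups that hit
    an already known value.
    """
    if overwrite:
        stored.clear()          # "Overwrite existing": start from empty
    working = dict(stored)      # lookups for this run
    outputs, reused = [], 0
    for value in values:
        if value in working:
            reused += 1
        else:
            working[value] = f"anon-{next(_ids)}"
        outputs.append(working[value])
    if store_new:
        stored.update(working)  # "Store new transformations"
    return outputs, reused

stored = {}
out1, reused1 = run_job(["John", "Ann", "John"], stored)
out2, reused2 = run_job(["John", "Bob"], stored)
out3, reused3 = run_job(["John"], stored, overwrite=True)
```

The second run reuses the value stored for "John" by the first run, while the third run clears the dictionary first and therefore produces a new value.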

Field-Level Dictionary Configuration

When configuring individual field transformations:

┌────────────────────────────────────────┐
│ Dictionary                             │
│                                        │
│ Mode: [▼ Label scope]                  │
│   □ Inherit from rule                  │
│   □ Skip dictionary                    │
│   □ Label scope                        │
│   □ Fieldname scope                    │
│   □ Entity/Field scope                 │
│   □ Global scope                       │
│   □ User-defined scope                 │
│                                        │
│ Replace Label: [_________________]     │
│ Scope:         [_________________]     │
│                                        │
│ □ With options                         │
│ □ Nulls handling                       │
└────────────────────────────────────────┘

Field-Level Options:

Option	Description	Example Use Case
Inherit from rule	Uses rule-level dictionary settings	Default behavior, consistent with rule
Skip dictionary	Bypasses dictionary for this field	Maximum randomness for sensitive fields
Label scope	Uses field's label for scoping	Standard consistency within data type
Fieldname scope	Uses field name across entities	Consistent for fields with same name
Entity/Field scope	Field-specific scope	Different values per field
Global scope	Project-wide consistency	Maximum consistency
User-defined scope	Custom scope name	Custom grouping logic
Replace Label	Override automatic label detection	Treat field as different type
Scope	Custom scope identifier	Custom grouping when using user-defined
With options	Include function options in key	Different transformations for same value with different params
Nulls handling	Store and reuse null transformations	Consistent null value handling
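
The effect of "With options" can be illustrated with a key function. The dictionary page describes keys as MD5 hashes, but how Gigantics derives them is not documented here; hashing the original value, plus the JSON-encoded function options when the toggle is on, is purely an assumption of this sketch, and `dictionary_key` is a hypothetical name.

```python
import hashlib
import json

def dictionary_key(value, options=None, with_options=False):
    """Build a lookup key for value (hypothetical derivation)."""
    material = value
    if with_options and options:
        # Options become part of the key, so different parameters
        # lead to different dictionary entries for the same value.
        material += "|" + json.dumps(options, sort_keys=True)
    return hashlib.md5(material.encode("utf-8")).hexdigest()

# Toggle off: the options are ignored and both calls share one key.
a = dictionary_key("John", {"locale": "en"})
b = dictionary_key("John", {"locale": "es"})

# Toggle on: each option set gets its own key.
c = dictionary_key("John", {"locale": "en"}, with_options=True)
d = dictionary_key("John", {"locale": "es"}, with_options=True)
```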

Dictionary Modes Explained

Mode: None (Disabled)

What it does: Dictionary is completely disabled for this rule or field.

Behavior:

No transformations are stored
No lookups are performed
Each transformation is independent
Maximum randomness

When to use:

When you want maximum randomization
For one-time transformations
When consistency is not required
Testing or exploration scenarios

Example:

Input: "John Smith"
Run 1: "Mark Johnson"
Run 2: "Sarah Williams"
Run 3: "Robert Davis"

Mode: Field (Entity + Field)

What it does: Reuses transformations within the same entity and field combination only.

Behavior:

Same value in same field → same output
Same value in different field → different output
Same value in different entity → different output

When to use:

When fields should have independent transformations
When same value means different things in different fields
Testing field-specific anonymization

Example:

Customers.Name "John" → "Mark"
Customers.Name "John" → "Mark" (reused)
Orders.Name "John" → "Sarah" (different field)

Mode: Label (Default)

What it does: Reuses transformations for fields with the same label, regardless of entity or field name.

Behavior:

Same value + same label → same output
Works across different entities
Works across different field names
Most common mode for data consistency

When to use:

Maintaining referential integrity
When labels represent data types (configured during discovery)
Standard anonymization workflows
Recommended for most use cases

Example:

Customers.Name [person/name] "John" → "Mark"
Employees.FullName [person/name] "John" → "Mark" (same label)
Orders.CustomerName [person/name] "John" → "Mark" (same label)

Mode: Global

What it does: All transformations share a single project-wide dictionary.

Behavior:

Same value anywhere → same output
Maximum consistency
Single shared scope
Works across all entities, fields, and labels

When to use:

Maximum referential integrity
When you want identical values to always transform identically
Simple, global consistency requirements
When label detection is unreliable (check discovery settings)

Example:

Customers.Name "John" → "Mark"
Employees.Name "John" → "Mark" (global reuse)
Orders.Customer "John" → "Mark" (global reuse)
Invoices.Contact "John" → "Mark" (global reuse)
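
One way to picture the four modes is as different rules for choosing the scope under which a value is stored and looked up. The sketch below is a mental model, not the product's logic: `resolve_scope`, the lowercase mode names, and the scope strings are hypothetical. The fallback to entity+field for unlabeled fields in Label mode is an interpretation of the UI text "Reuse values with the same label or same entity+field".

```python
def resolve_scope(mode, entity, field, label=None):
    """Return the scope a lookup would use, or None when disabled."""
    if mode == "none":
        return None                          # no lookup, no storage
    if mode == "field":
        return f"{entity}.{field}"           # one scope per entity+field
    if mode == "label":
        return label or f"{entity}.{field}"  # shared by fields with the same label
    if mode == "global":
        return "global"                      # a single project-wide scope
    raise ValueError(f"unknown mode: {mode}")

# Label mode: different entities and field names, same scope.
label_a = resolve_scope("label", "Customers", "Name", "person/name")
label_b = resolve_scope("label", "Employees", "FullName", "person/name")

# Field mode: same field name in different entities, different scopes.
field_a = resolve_scope("field", "Customers", "Name")
field_b = resolve_scope("field", "Orders", "Name")
```

Two occurrences of the same value get the same output only when they resolve to the same scope.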

When and Why to Use Dictionaries

Use Dictionaries When:

Maintaining Referential Integrity
- You need the same person/company/identifier to map consistently across multiple tables (configured via schema)
- Foreign key relationships must be preserved
- Data relationships matter for testing or analytics

Consistent Anonymization Across Jobs
- You run jobs multiple times (using pipelines)
- You want deterministic results
- You need to compare results over time

Cross-Database Consistency
- Same data appears in multiple databases (configured via sinks)
- You need consistent anonymization across all sources
- Migrations between environments

Realistic Test Data
- Generated data needs to look realistic
- Relationships must make sense
- Consistency improves data quality

Compliance and Auditing
- Trackable anonymization patterns
- Reproducible transformations
- Audit trail of transformations

Don't Use Dictionaries When:

Maximum Randomization Needed
- Security testing
- Privacy-critical scenarios
- When uniqueness is more important than consistency

One-Time Transformations
- Single-use data exports
- No future reuse needed
- Disposable test environments

Different Contexts Require Different Values
- When "John" in Customer table should differ from "John" in Employee table
- Context-dependent anonymization
- Field-specific privacy requirements

Strategies for Using Dictionaries in Gigantics

Strategy 1: Label-Based Consistency (Recommended)

Best for: Most standard anonymization workflows

Setup:

Configure rule with Dictionary Mode: Label
Ensure fields are properly labeled (person/name, email, phone, etc.)
Enable "Store new transformations in the dictionary"
Enable "Cache dictionary" for performance

Benefits:

Automatic consistency across related data types
Works across multiple entities
Maintains referential integrity
Easy to configure

Example Workflow:

1. Run discovery to label fields
2. Create rule with Label mode dictionary
3. Execute job - transformations stored by label
4. Future jobs automatically reuse stored transformations

Strategy 2: Progressive Dictionary Building

Best for: Iterative development and refinement

Setup:

Start with "Store new transformations" enabled
Run initial job with smaller dataset
Review dictionary entries
Export dictionary for backup
Run full job - dictionary already contains partial entries

Benefits:

Build consistency over time
Test with smaller datasets first
Can refine and re-import dictionary
Incremental approach

Workflow:

1. Sample 1000 records → build initial dictionary
2. Export dictionary
3. Import to new job
4. Run full [dataset](/model/datasets) → partial values already consistent
5. New values added to existing dictionary

Strategy 3: Scope-Specific Dictionaries

Best for: Complex projects with different consistency requirements

Setup:

Use User-defined scope mode for specific fields
Define custom scopes (e.g., "customer-identifiers", "financial-data")
Group related fields under same scope
Different scopes maintain separate dictionaries

Benefits:

Fine-grained control
Different consistency rules per data type
Flexible grouping
Can export/import specific scopes

Example:

Scope: "customer-identifiers"
- Customer Name → "Mark Johnson"
- Billing Contact → "Mark Johnson" (same scope)

Scope: "employee-data"
- Employee Name → "Sarah Williams"
- Manager Name → "Sarah Williams" (same scope)

No cross-scope consistency

Strategy 4: Pipeline with Dictionary Reuse

Best for: Scheduled jobs and automation

Setup:

Configure pipeline with dictionary settings
Enable "Store new transformations"
Disable "Overwrite existing dictionary"
Schedule pipeline to run regularly

Benefits:

Dictionary grows over time
Consistency across scheduled runs
Automated consistency
Can export dictionary between runs

Workflow:

[Pipeline](/model/pipelines) Configuration:

- Dictionary Mode: Label
- Store new: Yes
- Overwrite: No
- Cache: Yes

Scheduled Runs:

- Run 1: Processes 10K records, builds dictionary
- Run 2: Processes new 5K records, reuses existing + adds new
- Run 3: Maximum reuse, minimal new entries

Strategy 5: Dictionary Import/Export Workflow

Best for: Multi-environment deployment and migration

Setup:

Develop dictionary in development environment
Export dictionary after testing
Import dictionary to staging/production
Use same dictionary across environments

Benefits:

Consistent anonymization across environments
Can test dictionary before production
Reproducible deployments
Backup and restore capability

Workflow:

Development:
1. Build and test dictionary
2. Export dictionary.json

Staging:
3. Import dictionary.json
4. Verify consistency
5. Run tests

Production:
6. Import same dictionary.json
7. Execute job with pre-built dictionary
8. Export for backup
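
The "Verify consistency" step can be supported by comparing the export from one environment with the export from another. A minimal sketch, assuming both exports are lists of entries with key, val and scope; `diff_dictionaries` is a hypothetical helper, not a Gigantics feature.

```python
def diff_dictionaries(source, target):
    """Compare two exports.

    Returns (missing, changed): the (scope, key) pairs that are absent
    from target, and those present in both but with different values.
    """
    src = {(e["scope"], e["key"]): e["val"] for e in source}
    tgt = {(e["scope"], e["key"]): e["val"] for e in target}
    missing = sorted(k for k in src if k not in tgt)
    changed = sorted(k for k in src if k in tgt and src[k] != tgt[k])
    return missing, changed

development = [
    {"key": "k1", "val": "Mark Johnson", "scope": "person/name"},
    {"key": "k2", "val": "mark@email.com", "scope": "email"},
]
staging = [
    {"key": "k1", "val": "Sarah Williams", "scope": "person/name"},
]
missing, changed = diff_dictionaries(development, staging)
```

Two empty lists indicate that the target environment holds every source entry with identical values.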

Strategy 6: Field-Level Overrides

Best for: Mixing consistency and randomness

Setup:

Rule-level: Dictionary Mode: Label (default)
Most fields: Inherit from rule
Specific fields: Override with "Skip dictionary" or different mode

Benefits:

Default consistency for most fields
Specific control for sensitive fields
Flexible per-field configuration
Best of both worlds

Example:

[Rule](/model/rules) Default: Label mode

Fields:
- Customer Name: Inherit → Label mode (consistent)
- Email: Inherit → Label mode (consistent)
- SSN: Skip dictionary → Unique per row (random)
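
The override logic in this example amounts to "use the field's own setting unless it inherits from the rule". A small sketch, with hypothetical names and lowercase mode strings chosen for illustration:

```python
RULE_MODE = "label"  # rule-level default

# Per-field overrides; fields that are not listed inherit from the rule.
FIELD_OVERRIDES = {
    "SSN": "skip",              # bypass the dictionary
    "Account Number": "field",  # entity/field scope
}

def effective_mode(field_name):
    """Return the dictionary mode that applies to a field."""
    override = FIELD_OVERRIDES.get(field_name, "inherit")
    return RULE_MODE if override == "inherit" else override

modes = {
    name: effective_mode(name)
    for name in ["Customer Name", "Email", "SSN", "Account Number"]
}
```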
- Account Number: Entity/Field mode → Field-specific

Strategy 7: Null Handling Strategy

Best for: Datasets with many null values

Setup:

Enable "Nulls handling" in dictionary options
Nulls will be consistently transformed
Useful for maintaining data patterns

Benefits:

Consistent null value anonymization
Preserves null patterns in data
Can transform nulls to consistent placeholder

Example:

Without null handling:
NULL → (varies: "", "N/A", "Unknown", null)

With null handling:
NULL → "Anonymous" (consistent)

Best Practices

1. Start with Label Mode

Most versatile and useful mode
Works automatically with discovery labels
Provides good balance of consistency and flexibility

2. Enable Caching for Performance

Cache dictionary option improves lookup speed
Especially important for large dictionaries
Uses more memory but is significantly faster

3. Store Transformations for Reuse

Enable "Store new transformations" unless you need one-time jobs
Builds dictionary over time
Enables consistency across job runs

4. Export Regularly

Export dictionary as backup
Export before major changes
Export for migration between environments

5. Use Appropriate Scope Granularity

Too broad (Global): May cause unintended consistency
Too narrow (Field): May miss relationships
Just right (Label): Balances consistency and flexibility

6. Monitor Dictionary Size

Large dictionaries may impact performance
Use "View Summary" to monitor by scope
Consider scope-specific exports if too large

7. Test Before Production

Build dictionary in development
Test with sample datasets
Export and import to staging (via sinks)
Verify consistency

8. Document Custom Scopes

Document user-defined scopes
Keep scope naming consistent
Document why certain fields use custom scopes

Common Use Cases

Use Case 1: Customer Database Anonymization

Scenario: Anonymize customer data while maintaining relationships

Configuration:

Dictionary Mode: Label
Store new transformations: Yes
Cache dictionary: Yes

Result:

Customer "John Smith" → "Mark Johnson" everywhere
Email "john@example.com" → "mark@email.com" everywhere
Relationships preserved across tables

Use Case 2: Multi-Database Consistency

Scenario: Same data in multiple databases, need consistent anonymization

Configuration:

Dictionary Mode: Global
Store new transformations: Yes
Export dictionary after first run
Import into subsequent database jobs

Result:

Identical anonymization across all databases
Can share dictionary between projects

Use Case 3: Incremental Data Processing

Scenario: Process new data periodically, maintain consistency with historical data

Configuration:

Dictionary Mode: Label
Store new transformations: Yes
Overwrite existing: No
Run pipeline on schedule

Result:

New data uses existing dictionary
New entries added to dictionary
Growing consistency over time

Use Case 4: Selective Consistency

Scenario: Some fields need consistency, others need randomness

Configuration:

Rule Default: Label mode
Specific fields: Skip dictionary or Field mode

Result:

Important fields: Consistent
Sensitive fields: Random
Flexible per-field control

Troubleshooting

Dictionary Not Working

Problem: Transformations are different each run

Solutions:

Check dictionary mode is not "None"
Verify "Store new transformations" is enabled
Check if "Overwrite existing" is clearing dictionary
Ensure cache is enabled for performance

Performance Issues

Problem: Job runs slowly with dictionary enabled

Solutions:

Enable "Cache dictionary" option
Check dictionary size - may need to clear old entries
Consider scope-specific dictionaries
Monitor with dictionary summary

Inconsistent Results

Problem: Same value transforming differently

Solutions:

Check if using correct dictionary mode
Verify labels are consistent across fields (check discovery results)
Check if field-level overrides are set in rule configuration
Review scope settings
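
An exported dictionary can help locate the cause. The sketch below lists keys that are stored under more than one scope with different values, which is what happens when related fields carry different labels or overrides. It assumes that the same original value yields the same key in every scope, which is not confirmed above; `keys_with_conflicting_values` is a hypothetical helper.

```python
from collections import defaultdict

def keys_with_conflicting_values(entries):
    """Map each conflicting key to its {scope: value} assignments."""
    by_key = defaultdict(dict)
    for entry in entries:
        by_key[entry["key"]][entry["scope"]] = entry["val"]
    return {
        key: scopes for key, scopes in by_key.items()
        if len(set(scopes.values())) > 1
    }

entries = [
    {"key": "k1", "val": "Mark Johnson", "scope": "person/name"},
    {"key": "k1", "val": "Sarah Williams", "scope": "Orders.Name"},
    {"key": "k2", "val": "mark@email.com", "scope": "email"},
]
conflicts = keys_with_conflicting_values(entries)
```

Each reported key points at scopes worth reviewing, for example a field that fell back to an entity/field scope because it was never labeled.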

Dictionary Too Large

Problem: Dictionary has too many entries

Solutions:

Use "View Summary" to identify large scopes
Export specific scopes only
Clear dictionary and rebuild if needed
Consider splitting into multiple scopes

Summary

Dictionaries are a powerful feature in Gigantics that enable:

Consistent anonymization across jobs and databases
Referential integrity preservation
Flexible configuration from global to field-level
Import/Export for backup and migration
Performance optimization through caching
Fine-grained control through modes and scopes

Start with Label mode for most scenarios, enable caching and storage for best results, and use export/import for backup and migration workflows.
