Dictionaries

Dictionaries allow us to reuse values from previous transformations, not only across different jobs but also within the same execution. This keeps the resulting dataset consistent: a given input always produces the same output value.

Dictionaries are shared by all taps within the same project, so we can transform data consistently across multiple databases.

When running or editing a rule, the dictionary usage options appear in the right-hand configuration panel.

How to use

The results we obtain depend on the dictionary configuration we choose. These are the available options:

Do not use dictionary

Value matches are ignored during the transformation job. Identical input values can produce completely different output values.

Real data

id	Name	Last name
1	Edith	Upton
2	Keith	Smith
3	Edith	Smith

Anonymized data

id	Name	Last name
1	Glenda	Leannon
2	Felix	Reynolds
3	Valerie	Block

As we can see, although Edith was first anonymized as Glenda, the second time it appears it becomes a different value (Valerie). The same happens with the last name Smith.

Reuse in the same entity and field

Matches in the same entity and field will be transformed in the same way. Even if there are matches in other entities or fields, they will be ignored.

Real data

Entity: Customers

id	Name	Last name
1	Danielle	Upton
2	Jay	Smith
3	Danielle	Herman
4	Dwayne	Smith

Anonymized data

Entity: Customers

id	Name	Last name
1	Melanie	Spencer
2	Ted	Huxley
3	Melanie	Armstrong
4	Leonard	Huxley

In this case, the name Danielle becomes Melanie, both the first and the second time it appears.

This happens because after the first occurrence the value is stored in the dictionary, so when the same value appears again in the same entity and field, it is transformed in the same way. The same happens with the surname Smith, which in both occurrences is found in the entity Customers and in the Last name field.
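
The behaviour described above can be pictured as a lookup keyed by entity, field, and value. The following is only a minimal sketch under that assumption: the `anonymize` function, the in-memory `dictionary`, and the `anon_N` placeholder values are hypothetical and are not part of Gigantic.

```python
from itertools import count

_counter = count(1)   # hypothetical source of replacement values
dictionary = {}       # maps (entity, field, value) -> replacement


def anonymize(entity, field, value):
    """Reuse a replacement only when entity, field and value all match."""
    key = (entity, field, value)
    if key not in dictionary:
        # First occurrence in this entity and field: store a new replacement.
        dictionary[key] = f"anon_{next(_counter)}"
    return dictionary[key]


first = anonymize("Customers", "Name", "Danielle")
second = anonymize("Customers", "Name", "Danielle")
other = anonymize("Employees", "Name", "Danielle")
```

Here `first` and `second` are identical because the entity and field match, while `other` receives a different replacement because it comes from another entity.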

Reuse by label or in the same entity and fields

Matches on fields that share the same label will be transformed in the same way. Fields that do not share a label can still match on the combination of entity and field (as in the previous case), in which case the result will also be identical.

Matches in other entities or fields are ignored unless they share the same label.

Real data

Entity: Customers

id	Name `person/name`
1	Randal
2	Alma

Entity: Employees

id	Employee `person/name`
1	Randal
2	Ronnie

Anonymized data

Entity: Customers

id	Name `person/name`
1	Mark
2	Katherine

Entity: Employees

id	Employee `person/name`
1	Mark
2	Jeremy

In this case, Randal is transformed into Mark in both tables because, even though the occurrences are in different entities and fields, both fields share the same label "person/name".
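
One way to picture this option is a lookup that tries the label first and falls back to the entity and field combination. This is a sketch under assumptions, not Gigantic's implementation: the `anonymize` function, the key layout, and the `anon_N` placeholders are all hypothetical.

```python
from itertools import count

_counter = count(1)   # hypothetical source of replacement values
dictionary = {}       # maps a scope key -> replacement


def anonymize(entity, field, value, label=None):
    """Reuse by label when one is set, otherwise by entity and field."""
    keys = [("field", entity, field, value)]
    if label is not None:
        # The label scope is checked before the entity/field scope.
        keys.insert(0, ("label", label, value))

    for key in keys:
        if key in dictionary:
            replacement = dictionary[key]
            break
    else:
        # No scope matched: this is a new value.
        replacement = f"anon_{next(_counter)}"

    # Remember the replacement under every scope that applies.
    for key in keys:
        dictionary.setdefault(key, replacement)
    return replacement


customers = anonymize("Customers", "Name", "Randal", label="person/name")
employees = anonymize("Employees", "Employee", "Randal", label="person/name")
unlabelled = anonymize("Suppliers", "Contact", "Randal")
```

`customers` and `employees` share a replacement through the label, while `unlabelled` (a hypothetical third entity without a label) gets its own.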

Reuse in all fields

Values already stored in the dictionary will be reused regardless of the entity, field, or label under which they were found.

Real data

Entity: Customers

id	Name	Last name
1	Susan	Heaney
2	Bertha	Susan
3	Susan	Keeling

Entity: Employees

id	Employer_name	Employer_last_name
1	Janet	Rogahn
2	Marianne	McGlynn
3	Susan	Bauch

Anonymized data

Entity: Customers

id	Name	Last name
1	Percy	Rodriguez
2	Jesse	Percy
3	Percy	Leffler

Entity: Employees

id	Employer_name	Employer_last_name
1	Whitney	Kautzer
2	Garry	Dare
3	Percy	Mills

Although they do not share the same entity, field, or label, every occurrence of Susan becomes Percy, no matter where it appears in the data source.
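
With this option the lookup can be thought of as keyed by the value alone. The sketch below applies that idea to the Customers rows from the example; the `anonymize` function and the `anon_N` placeholders are hypothetical and only illustrate the scoping.

```python
from itertools import count

_counter = count(1)   # hypothetical source of replacement values
dictionary = {}       # maps value -> replacement, ignoring entity, field and label


def anonymize(value):
    """Reuse a replacement wherever the same value appears."""
    if value not in dictionary:
        dictionary[value] = f"anon_{next(_counter)}"
    return dictionary[value]


customers = [("Susan", "Heaney"), ("Bertha", "Susan"), ("Susan", "Keeling")]
anonymized = [tuple(anonymize(value) for value in row) for row in customers]
employee_name = anonymize("Susan")   # same value found in another entity
```

Every occurrence of Susan, whether it is a name, a last name, or a value in another entity, maps to the same replacement.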

Save new transformations in the dictionary

If this option is enabled, transformations are stored and can be reused in subsequent jobs.

If this option is not active, transformations carried out during the job are discarded when it finishes, so they only take effect during the job itself.

Overwrite current dictionary

If this option is active, the current project dictionary will be cleared before the rule is run, so no previously stored values will be reused.
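
The save and overwrite options can be pictured as two flags around a job's working copy of the project dictionary. This is a minimal sketch under that assumption; `run_job`, its parameters, and the `anon_N` placeholders are hypothetical names, not Gigantic's API.

```python
from itertools import count

_counter = count(1)   # hypothetical source of replacement values


def run_job(values, project_dictionary, save=True, overwrite=False):
    """Transform values, optionally clearing or updating the project dictionary."""
    if overwrite:
        # Start from an empty dictionary: nothing stored earlier is reused.
        project_dictionary.clear()

    working = dict(project_dictionary)   # entries visible during this job
    output = []
    for value in values:
        if value not in working:
            working[value] = f"anon_{next(_counter)}"
        output.append(working[value])

    if save:
        # Keep the new transformations for later jobs.
        project_dictionary.update(working)
    return output


project_dictionary = {}
unsaved = run_job(["Edith", "Edith"], project_dictionary, save=False)
saved = run_job(["Edith"], project_dictionary)
reused = run_job(["Edith"], project_dictionary)
reset = run_job(["Edith"], project_dictionary, overwrite=True)
```

The unsaved job is consistent within itself but leaves nothing behind, the saved job makes its result available to the next one, and the overwriting job produces a fresh replacement.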

Gigantic does not store any source data in its database. Dictionary entries are hashed with a cryptographic function, so the original data cannot be recovered from them.
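
The idea of storing hashed entries instead of source values can be sketched as follows. The text does not say which hash function is used or how entries are keyed, so SHA-256 and the `entry_key` and `anonymize` helpers below are assumptions chosen purely for illustration.

```python
import hashlib
from itertools import count

_counter = count(1)   # hypothetical source of replacement values
dictionary = {}       # maps hash of the source value -> replacement


def entry_key(value):
    """Hash the source value so the value itself is never stored (SHA-256 assumed)."""
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


def anonymize(value):
    key = entry_key(value)
    if key not in dictionary:
        dictionary[key] = f"anon_{next(_counter)}"
    return dictionary[key]


result = anonymize("Susan")
```

The dictionary still recognises a repeated input, because the same value always produces the same hash, but it holds only the hash and the replacement, never the string "Susan".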
