Dictionaries
Dictionaries allow us to reuse values from previous transformations, not only between different jobs but also within the same execution. This keeps the resulting dataset consistent: a given input value always produces the same output value.
Dictionaries are common to all taps within the same project. Therefore, we will be able to transform data consistently across multiple databases.
When running or editing a rule, the dictionary usage options appear in the configuration panel on the right.
How to use
The result depends on the dictionary configuration we choose. These are the available options:
Do not use dictionary
Value matches are ignored in the transformation job. The same input value can produce completely different output values each time it appears.
Real data
id | Name | Last name |
---|---|---|
1 | Edith | Upton |
2 | Keith | Smith |
3 | Edith | Smith |
Anonymized data
id | Name | Last name |
---|---|---|
1 | Glenda | Leannon |
2 | Felix | Reynolds |
3 | Valerie | Block |
As we can see, although Edith was previously anonymized with Glenda, the second time it appears it becomes another value (Valerie). The same happens with the last name Smith.
Reuse in the same entity and field
Matches in the same entity and field will be transformed in the same way. Even if there are matches in other entities or fields, they will be ignored.
Real data
Entity: Customers
id | Name | Last name |
---|---|---|
1 | Danielle | Upton |
2 | Jay | Smith |
3 | Danielle | Herman |
4 | Dwayne | Smith |
Anonymized data
Entity: Customers
id | Name | Last name |
---|---|---|
1 | Melanie | Spencer |
2 | Ted | Huxley |
3 | Melanie | Armstrong |
4 | Leonard | Huxley |
In this case, the name Danielle becomes Melanie both the first and the second time it appears.
This happens because after the first occurrence, the value is stored in the dictionary, so when the entity and the field match again, the value is transformed in the same way. The same happens with the surname Smith, which in both occurrences is found in the entity Customers and in the Last name field.
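The lookup described above can be sketched as a mapping keyed by entity, field, and value. This is only an illustration of the behaviour, not Gigantic's implementation: the class `ScopedDictionary`, its `transform` method, and the `anon_N` placeholder outputs are all hypothetical.

```python
from itertools import count


class ScopedDictionary:
    """Toy dictionary keyed by (entity, field, value). Hypothetical names throughout."""

    def __init__(self):
        self._store = {}
        self._counter = count(1)

    def transform(self, entity, field, value):
        key = (entity, field, value)
        if key not in self._store:
            # First occurrence in this entity and field: create and remember a replacement.
            self._store[key] = f"anon_{next(self._counter)}"
        return self._store[key]


d = ScopedDictionary()
first = d.transform("Customers", "Name", "Danielle")
second = d.transform("Customers", "Name", "Danielle")
# Same value in a different field: the stored match is ignored.
other_field = d.transform("Customers", "Last name", "Danielle")
```

Here `first` and `second` are equal because entity and field match, while `other_field` receives a new value.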
Reuse by label or in the same entity and fields
Matches on fields that share the same label will be transformed in the same way. Fields that do not share a label can still match on the combination of entity and field (as in the previous case), in which case the result is identical.
Matches in other entities or fields are ignored unless they share the same label.
Real data
Entity: Customers
id | Name (label: person/name) |
---|---|
1 | Randal |
2 | Alma |
Entity: Employees
id | Employee (label: person/name) |
---|---|
1 | Randal |
2 | Ronnie |
Anonymized data
Entity: Customers
id | Name (label: person/name) |
---|---|
1 | Mark |
2 | Katherine |
Entity: Employees
id | Employee (label: person/name) |
---|---|
1 | Mark |
2 | Jeremy |
In this case, Randal is transformed into Mark in both tables because, even though the occurrences are in different entities and fields, they share the same label "person/name".
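One way to picture this option is a lookup that tries the label first and then falls back to the entity and field combination. The sketch below assumes that lookup order and assumes each result is stored under both keys; the function `transform`, the key layout, and the `anon_N` outputs are hypothetical and do not describe Gigantic's internals.

```python
from itertools import count

_counter = count(1)
_store = {}


def transform(entity, field, value, label=None):
    # Candidate keys, most specific rule first: label match, then entity + field match.
    keys = [("field", entity, field, value)]
    if label is not None:
        keys.insert(0, ("label", label, value))

    for key in keys:
        if key in _store:
            result = _store[key]
            break
    else:
        # No match under any key: generate a new replacement.
        result = f"anon_{next(_counter)}"

    # Remember the result under every key so later lookups can match either way.
    for key in keys:
        _store.setdefault(key, result)
    return result


customers = transform("Customers", "Name", "Randal", label="person/name")
employees = transform("Employees", "Employee", "Randal", label="person/name")
# An unlabelled field in another entity does not match.
unlabelled = transform("Orders", "Contact", "Randal")
```

`customers` and `employees` come out equal because the label matches, while `unlabelled` gets its own value.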
Reuse in all fields
Values already stored in the dictionary are reused regardless of the entity and field in which they are found or the label assigned to that field.
Real data
Entity: Customers
id | Name | Last name |
---|---|---|
1 | Susan | Heaney |
2 | Bertha | Susan |
3 | Susan | Keeling |
Entity: Employees
id | Employer_name | Employer_last_name |
---|---|---|
1 | Janet | Rogahn |
2 | Marianne | McGlynn |
3 | Susan | Bauch |
Anonymized data
Entity: Customers
id | Name | Last name |
---|---|---|
1 | Percy | Rodriguez |
2 | Jesse | Percy |
3 | Percy | Leffler |
Entity: Employees
id | Employer_name | Employer_last_name |
---|---|---|
1 | Whitney | Kautzer |
2 | Garry | Dare |
3 | Percy | Mills |
Although they do not share the same entity, field, or label, all occurrences of Susan become Percy, no matter where they are found in the data source.
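The example tables can be replayed with a mapping keyed by the value alone. This is a minimal sketch under that assumption; the helper `transform` and the `anon_N` outputs are hypothetical stand-ins for whatever replacement the rule would generate.

```python
from itertools import count

# Name and last name columns from the example tables above.
customers = [("Susan", "Heaney"), ("Bertha", "Susan"), ("Susan", "Keeling")]
employees = [("Janet", "Rogahn"), ("Marianne", "McGlynn"), ("Susan", "Bauch")]

store = {}
counter = count(1)


def transform(value):
    # The key is the value alone: entity, field, and label play no part.
    if value not in store:
        store[value] = f"anon_{next(counter)}"
    return store[value]


anon_customers = [tuple(transform(v) for v in row) for row in customers]
anon_employees = [tuple(transform(v) for v in row) for row in employees]
```

Every occurrence of Susan, whether as a name, a last name, or in the other entity, maps to the same replacement.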
Save new transformations in the dictionary
If this option is enabled, transformations are stored and can be reused in subsequent jobs.
If this option is not enabled, the transformations carried out during the job are deleted afterwards, so they only take effect during the job itself.
Overwrite current dictionary
If this option is active, the current project dictionary will be cleared before the rule is run, so no previously stored values will be reused.
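The interaction between these two options can be sketched with a small function. The name `run_job` and the flags `save_new` and `overwrite` are hypothetical stand-ins for the two settings, and the value-only key is chosen just to keep the example short.

```python
from itertools import count

_counter = count(1)


def run_job(project_dictionary, values, save_new=True, overwrite=False):
    if overwrite:
        # Clear the project dictionary before the rule runs.
        project_dictionary.clear()

    # Without saving, work on a copy so new mappings vanish when the job ends.
    working = project_dictionary if save_new else dict(project_dictionary)

    out = []
    for value in values:
        if value not in working:
            working[value] = f"anon_{next(_counter)}"
        out.append(working[value])
    return out


project = {}
job1 = run_job(project, ["Edith", "Edith"], save_new=False)  # consistent within the job only
job2 = run_job(project, ["Edith"])                           # stored for later jobs
job3 = run_job(project, ["Edith"])                           # reuses the stored value
job4 = run_job(project, ["Edith"], overwrite=True)           # starts from an empty dictionary
```

In this sketch `job1` is consistent internally but leaves nothing behind, `job2` and `job3` agree, and `job4` produces a fresh value because the dictionary was cleared first.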
Gigantic does not store any source data in its database. Dictionary entries are hashed with a cryptographic function, so the process cannot be reversed to recover the original data.
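As an illustration of the idea, a dictionary can be keyed by a hash of the source value instead of the value itself. The text does not say which function Gigantic uses or exactly how entries are laid out, so SHA-256 and the helper `entry_key` below are assumptions made only for this sketch.

```python
import hashlib


def entry_key(value: str) -> str:
    # Assumption: SHA-256 stands in for the unspecified cryptographic function.
    return hashlib.sha256(value.encode("utf-8")).hexdigest()


store = {}
# Only the hash of the source value is kept, never the value itself.
store[entry_key("Edith")] = "Glenda"

# A later job hashes the incoming value and looks it up.
lookup = store.get(entry_key("Edith"))
```

The lookup still works because the same input always hashes to the same key, while the stored keys do not contain the original text.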