Rules

What is a Rule

A rule is a set of operations that writes anonymized or synthesized data to new datasets or to datasources (sinks), which can be the same as the source or different ones.

A rule is comparable to a program or an algorithm.

Rules run in jobs.

How to create a new rule

To create a new rule, go to the Rules area in the tap model.

Options

The options available for the creation of a rule are:

Name: Name of the rule.
Description: A brief description of the rule, such as its purpose or why it was created.
Load Options: Defines how the rule behaves when its result is loaded into a sink where a database already exists.
- Truncate: Removes the data from the database but keeps the DDL.
- Drop: Deletes both the data and the existing DDL.
- Append: Appends the result of the rule to the target database.
Determinism: Controls whether the Anonymize operation, if the rule includes one, produces random or repeatable results.
- Random: Each time the rule is executed, the anonymized results vary.
- Deterministic: A given input value is always masked with the same output value.
  - Seed: The base value used to initialize the randomness of the output data.
Check foreign keys: Checks the foreign keys (FKs) and the relations between tables so that the new tables are created consistently, keeping the relations between entities. Gigantics tracks the complete database and anonymizes the data while preserving the indexes and PK/FKs.

If enabled, this option can increase the job's running time, depending on how complex the database is, because Gigantics stores the data in cache in order to search for master tables.

Read chunk size: Sets the size of the data block to be read during the rule execution.
Stream content: Selects the type of content to show in the job log.
Dictionary: Allows you to reuse previous transformations. For more information, see the article on how to use dictionaries.
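
To illustrate the difference between the Random and Deterministic settings, here is a minimal Python sketch. It is not Gigantics code: the function names, the replacement list, and the use of HMAC-SHA256 to derive the output from the seed are all assumptions made for the example.

```python
import hashlib
import hmac
import random

# Hypothetical pool of replacement values used for masking.
FIRST_NAMES = ["Alice", "Bruno", "Carla", "Dmitri", "Elena", "Farid"]


def mask_deterministic(value: str, seed: str) -> str:
    """Return a replacement that depends only on the seed and the input value."""
    digest = hmac.new(seed.encode(), value.encode(), hashlib.sha256).digest()
    index = int.from_bytes(digest[:8], "big") % len(FIRST_NAMES)
    return FIRST_NAMES[index]


def mask_random(value: str, rng: random.Random) -> str:
    """Return a replacement chosen without regard to the input value."""
    return rng.choice(FIRST_NAMES)


# Deterministic: the same input and seed always give the same output.
first = mask_deterministic("John", seed="my-seed")
second = mask_deterministic("John", seed="my-seed")
print(first == second)

# Random: repeated calls for the same input may give different outputs.
rng = random.Random()
print([mask_random("John", rng) for _ in range(3)])
```

In this sketch, a value that appears in several tables is masked identically wherever it occurs when the same seed is used, which is what makes repeatable runs possible.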

Actions

The user can take the following actions on a rule:

Run: Run the rule. You can edit these execution parameters:
- Create a dataset: Dumps the output into a dataset.
- Load into sink: Loads the rule output directly into a sink.
- Schedule: Select when you want the rule to be executed.
Edit rule: Edit the rule options described in the Options section above.
View operations: Go to the operations editing page.
Delete: Deletes the rule from the model.

Check your pipelines before deleting a rule: a pipeline that uses the rule may become inconsistent.
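
When a run loads into a sink where a database already exists, the Load Options setting decides what happens to the existing data. The following Python sketch imitates the three modes against an in-memory SQLite database. The `customers` table, the `load` function, and the mode strings are hypothetical and only illustrate the behavior described under Options; they are not how Gigantics implements it.

```python
import sqlite3


def load(conn: sqlite3.Connection, rows: list, mode: str) -> None:
    """Write rows into a hypothetical 'customers' table using one load mode."""
    if mode not in {"truncate", "drop", "append"}:
        raise ValueError(f"unknown load mode: {mode}")
    if mode == "drop":
        # Drop: remove both the data and the table definition (DDL).
        conn.execute("DROP TABLE IF EXISTS customers")
    # Create the table if it is missing (always the case after a drop).
    conn.execute("CREATE TABLE IF NOT EXISTS customers (id INTEGER, name TEXT)")
    if mode == "truncate":
        # Truncate: remove the data, keep the DDL. SQLite has no TRUNCATE statement.
        conn.execute("DELETE FROM customers")
    # Append adds nothing extra: the new rows are written after the existing ones.
    conn.executemany("INSERT INTO customers VALUES (?, ?)", rows)
    conn.commit()


def count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM customers").fetchone()[0]


conn = sqlite3.connect(":memory:")
load(conn, [(1, "a"), (2, "b")], "append")
load(conn, [(3, "c")], "append")
print(count(conn))  # 3: the appended row is added to the existing two
load(conn, [(4, "d")], "truncate")
print(count(conn))  # 1: existing rows were removed before loading
load(conn, [(5, "e"), (6, "f")], "drop")
print(count(conn))  # 2: the table was recreated and then loaded
```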

Operations list

After creating the rule, click its name to open the operations page. The list is divided into three areas according to the available operations.

Query

These operations modify how data is read from the tap, as if it were a SQL query. They are:

Include/Exclude: Select the entities you want to include in or exclude from your dataset.
- In SQL, this corresponds to the SELECT keyword: SELECT ... FROM [...]
Where: Add conditions on your entities to include or exclude specific records.
- In SQL, this corresponds to the WHERE keyword: [...] WHERE employees.hire_date > '01/01/2000'.
Limit: Limits the number of rows to read (and consequently write) from the tap. You can limit all entities or apply limits per entity.
- In SQL, this corresponds to the LIMIT keyword: [...] LIMIT 10.
Order: Sorts the rows read from each table in ascending (ASC) or descending (DESC) order by a specific entity field.
- In SQL, this corresponds to the ORDER BY keyword: [...] ORDER BY CustomerName ASC;.
Rate: Sets the reading speed of the data source.
Query: Allows the user to create their own customized query.
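
As a rough illustration of how these operations map to SQL when combined, the Python sketch below runs a single query against an in-memory SQLite database. The `employees` table and its rows are made up for the example, and the dates are stored in ISO format (YYYY-MM-DD) so that SQLite's text comparison orders them correctly.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (id INTEGER, name TEXT, hire_date TEXT)")
conn.executemany(
    "INSERT INTO employees VALUES (?, ?, ?)",
    [
        (1, "Ana", "1998-05-17"),
        (2, "Bo", "2003-11-02"),
        (3, "Cy", "2010-01-20"),
        (4, "Di", "2001-07-09"),
    ],
)

rows = conn.execute(
    """
    SELECT id, name, hire_date       -- Include/Exclude: which data to read
    FROM employees
    WHERE hire_date > '2000-01-01'   -- Where: keep only matching rows
    ORDER BY name ASC                -- Order: sort by a field
    LIMIT 2                          -- Limit: cap the number of rows read
    """
).fetchall()

print(rows)  # [(2, 'Bo', '2003-11-02'), (3, 'Cy', '2010-01-20')]
```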

Transform

Operations that modify the data of the database. There are currently two types of transformations:

Anonymize: Masks the data using the available functions.
Synthesize: Generates synthetic data based on real data.

These transformations write the anonymized or synthesized data to new datasets or to datasources (sinks), which can be the same as the source or different ones.

Output

Operations that modify the data output.

Max: Sets the maximum number of records that will be written.
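
The Limit operation in the Query area caps the rows that are read, while Max caps the records that are written. The Python sketch below shows one way the two caps could interact. The `run` function and its parameters are hypothetical, and the order in which Gigantics applies the caps is an assumption.

```python
from itertools import islice


def run(source, read_limit=None, max_written=None, keep=lambda row: True):
    """Read at most read_limit rows, filter them, then write at most max_written."""
    read = islice(source, read_limit)          # cap on rows read (Limit)
    kept = (row for row in read if keep(row))  # stand-in for the rule's operations
    return list(islice(kept, max_written))     # cap on records written (Max)


# Read the first 6 of 10 rows, keep the even ones, write at most 2 of them.
written = run(range(10), read_limit=6, max_written=2, keep=lambda n: n % 2 == 0)
print(written)  # [0, 2]
```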
