Synthesize
The Synthesize operation generates new synthetic data records, either from your existing dataset or from custom functions. It is particularly useful for creating larger test datasets that keep realistic data characteristics.
Overview
The Synthesize operation enables you to:
- Generate new data records that maintain realistic characteristics
- Create datasets of specific sizes based on your requirements
- Apply different synthesis techniques to different fields
- Use fake data generation, custom functions, or saved functions
- Work with various data types (strings, numbers, dates, etc.)
- Choose between appending synthesized data to existing data or replacing it entirely
Configuration
Entity-Level Synthesis
The Synthesize operation is configured at the entity level, meaning you can apply different synthesis parameters to each entity in your dataset. For each entity, you can:
- Customize synthesis behavior for specific fields
- Set the desired output size for that entity
- Enable or disable synthesis for the entity
Auto Synth Mode
For quick configuration, you can use the Auto Synth feature, which applies the same synthesis parameters to all entities:
Expected Output Data Size
Select the size of your new synthesized dataset based on the source entity size:
- Same size as source entity: Generate exactly as many rows for each entity as exist in the source data
- Proportional: Generate a row count that is a percentage of the source entity size (for example, 150 for 150%). You can also set minimum and maximum row limits to keep dataset sizes reasonable
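The Proportional option comes down to simple arithmetic. The sketch below shows one plausible reading of it, assuming the percentage is applied to the source row count, the result is rounded to the nearest whole row, and the minimum and maximum limits are then applied as a clamp. The function name `targetRowCount` and its parameters are hypothetical, and the product's actual rounding rules may differ.

```javascript
// Hypothetical helper, not part of the product's API.
// Computes how many rows to synthesize for one entity.
function targetRowCount(sourceRows, percent, minRows = 0, maxRows = Infinity) {
  if (minRows > maxRows) {
    throw new RangeError("minRows must not exceed maxRows");
  }
  // Assumption: round to the nearest whole row.
  const scaled = Math.round(sourceRows * (percent / 100));
  // Clamp the scaled count into the [minRows, maxRows] range.
  return Math.min(maxRows, Math.max(minRows, scaled));
}

console.log(targetRowCount(1000, 150));              // 1500
console.log(targetRowCount(10, 150, 100, 5000));     // 100 (raised to the minimum)
console.log(targetRowCount(100000, 150, 100, 5000)); // 5000 (capped at the maximum)
```

The limits matter most for very small or very large entities, where a plain percentage would produce too few rows to be useful or too many to handle comfortably.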
Behavior Options
Set how the synthesized data interacts with your existing data:
- Append to source data: Keeps the existing rows and adds the synthesized rows to the end of each entity. Note that appended data will not be anonymized, so your dataset may contain sensitive data.
- Replace source data: Removes existing rows completely and inserts only the newly synthesized rows
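The difference between the two behaviors can be illustrated with plain arrays. This is only a conceptual sketch: the function `applyBehavior` and the `"append"` and `"replace"` labels are hypothetical names chosen for the example, not product identifiers.

```javascript
// Hypothetical illustration of the two behavior options.
function applyBehavior(sourceRows, synthesizedRows, behavior) {
  switch (behavior) {
    case "append":
      // Existing rows are kept as they are; synthesized rows follow them.
      return [...sourceRows, ...synthesizedRows];
    case "replace":
      // Existing rows are dropped; only synthesized rows remain.
      return [...synthesizedRows];
    default:
      throw new Error(`Unknown behavior: ${behavior}`);
  }
}

const source = [{ id: 1, name: "Real Person" }];
const synthesized = [{ id: 2, name: "Synthetic Person" }];

console.log(applyBehavior(source, synthesized, "append").length);  // 2
console.log(applyBehavior(source, synthesized, "replace").length); // 1
```

In the append case the original record is still present in the output, which is why sensitive values can remain in the dataset.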
Field-Level Synthesis
For detailed control, you can customize how each field in your entities is synthesized:
Synthesis Methods
Data can be synthesized in several ways:
- Fake data + label: Generate realistic fake data based on the field's assigned label (e.g., names, emails, addresses)
- Functions: Use built-in transformation functions. See the list of functions.
- Saved functions: Apply a function previously created in the configuration items section
- Custom function: Write your own JavaScript function to generate values for the field. See how to create a custom function.
- List: Select random values from a predefined list
- Sequential numbers: Generate sequential numeric values
- Random numbers: Generate random numeric values within specified ranges
- No action: Keep the original values unchanged (default for newly added fields)
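To make the List, Sequential numbers, and Random numbers methods concrete, here is a minimal JavaScript sketch of what each one does conceptually. All function names are hypothetical, the inclusive treatment of the range bounds is an assumption, and the product's own implementation may behave differently.

```javascript
// Hypothetical generators illustrating three of the methods above.
// `rng` is any function returning a number in [0, 1); it defaults to Math.random.

// List: pick one value at random from a predefined list.
function pickFromList(values, rng = Math.random) {
  if (values.length === 0) throw new Error("list must not be empty");
  return values[Math.floor(rng() * values.length)];
}

// Sequential numbers: each call returns the next value in the sequence.
function makeSequence(start = 1, step = 1) {
  let next = start;
  return () => {
    const current = next;
    next += step;
    return current;
  };
}

// Random numbers: an integer between min and max (assumed inclusive).
function randomInt(min, max, rng = Math.random) {
  return Math.floor(rng() * (max - min + 1)) + min;
}

const nextId = makeSequence(1000);
console.log(nextId(), nextId(), nextId());               // 1000 1001 1002
console.log(randomInt(18, 80));                          // an integer from 18 to 80
console.log(pickFromList(["bronze", "silver", "gold"])); // one of the three values
```

Passing the random source as a parameter keeps the sketch easy to test with a fixed function in place of `Math.random`.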
Field Configuration Options
When configuring field synthesis, you can specify:
- For numeric fields: Range constraints, sequential vs. random generation
- For labeled fields: Locale preferences for fake data generation
- For custom functions: JavaScript code that takes the original value and returns a synthesized version
- For list-based fields: The data list to select from
Examples
Basic Synthesis with Auto Synth
To generate a dataset that is 150% the size of your source data:
- Add the Synthesize operation to your rule
- Select "Proportional" for expected output data size
- Set percentage to 150
- Choose "Replace source data" behavior
- Apply to all entities or configure per entity
Field-Level Synthesis Customization
To customize synthesis for specific fields in a customer entity:
- Select the customer entity in the synthesize configuration
- For the "name" field, choose "Fake data + label" with "name" label
- For the "email" field, choose "Fake data + label" with "email" label
- For the "id" field, choose "Sequential numbers" to generate unique IDs
- For the "age" field, choose "Random numbers" with min=18 and max=80
- For the "phone" field, choose a custom function to maintain format consistency
Using Custom Functions
To apply a custom synthesis algorithm:
- Select a field and choose "Custom function"
- Write JavaScript code that generates appropriate synthetic values
- Use built-in helpers like chance(), faker, or genLike() for realistic data generation
Example:
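A minimal sketch of a custom function that keeps a phone number's format while changing its digits. It assumes the function receives the field's original value as its argument and returns the synthesized value. The name `synthesizePhone` and the optional `rng` parameter are hypothetical. The signatures of chance(), faker, and genLike() are not described in this section, so the sketch relies only on standard JavaScript.

```javascript
// Hypothetical custom function: replaces every digit with a random digit
// while leaving punctuation, spaces, and "+" signs untouched, so the
// output has the same shape as the input.
// `rng` is any function returning a number in [0, 1); it defaults to Math.random.
function synthesizePhone(value, rng = Math.random) {
  // Leave null, undefined, and other non-string values unchanged.
  if (typeof value !== "string") return value;
  return value.replace(/\d/g, () => String(Math.floor(rng() * 10)));
}

// Prints a string with the same layout as the input but different digits.
console.log(synthesizePhone("+1 (555) 010-9999"));
```

Because only digits are replaced, a value such as "+1 (555) 010-9999" keeps its separators and length, which is usually enough to pass format validation in test environments.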
The Synthesize operation helps you create realistic test datasets and, when combined with anonymization operations, supports privacy compliance.