Frequently Asked Questions (FAQ)

This FAQ covers common questions and answers about using Gigantics for database risk analysis, PII identification, data anonymization, and synthetic data generation.

General Questions

What is Gigantics?

Gigantics is a comprehensive database risk analysis tool that helps you identify Personally Identifiable Information (PII) elements in your databases. It allows you to analyze database schemas, generate security reports, manage datasets, and deploy databases to multiple environments in a secure manner. With AI-powered labeling, Gigantics helps you mask sensitive data and generate synthetic data for testing purposes.

What are the main capabilities of Gigantics?

Gigantics offers several key capabilities:

  • ✅ Analyze your database schema and compare it with previous versions
  • ✅ Identify PII elements and check the risk of each field
  • ✅ Generate security reports on the current state of your databases
  • ✅ Manage, share, and download your datasets
  • ✅ Dump your datasets into other databases
  • ✅ Deploy databases to multiple environments in a simple, effective, and secure way

What database systems does Gigantics support?

Gigantics supports multiple database systems, including:

  • Oracle
  • DB2 (including DB2i and DB2z)
  • MySQL
  • PostgreSQL
  • SQL Server (MSSQL)
  • MongoDB
  • SQLite
  • CSV files

Does Gigantics provide role-based access control?

Yes, Gigantics includes a system of roles and permissions that fits the organizational structure of any company. This allows you to control who has access to which projects and data within your organization.

What is an Organization in Gigantics?

An Organization is your own space in Gigantics that contains Projects. Each user has their own organization, and its projects can be shared with other users through the project configuration features. Users can also create additional organizations, each containing one or more projects.

What is a Project in Gigantics?

A Project is the user's workspace in Gigantics. From here, users can create models, work on databases, and invite users from their organization to join the project.

What is a Model in Gigantics?

A Model in Gigantics is a representation of your database schema and data processing rules. It allows you to define how data should be transformed, anonymized, or synthesized when creating datasets.

How does Gigantics identify PII elements?

Gigantics uses an AI-powered discovery process to automatically identify PII elements in your databases. The discovery feature analyzes field names, data patterns, and other characteristics to assign labels to fields, which helps in identifying sensitive data that needs protection.

Can I customize the labels used for PII identification?

Yes, Gigantics allows you to create custom labels for your specific data identification needs. This is useful when the default labels don't cover all your sensitive data types or when you have specific business requirements.

What is the difference between anonymizing and synthesizing data?

Anonymizing replaces original values with anonymized ones to protect sensitive data while maintaining the same dataset structure. Synthesizing generates new synthetic data records based on your existing dataset or using custom functions, potentially creating larger datasets for testing.

How do I analyze my database schema with Gigantics?

Gigantics provides schema analysis tools that allow you to examine your database structure. You can compare your current schema with previous versions to understand changes and potential risks.

How do I generate security reports with Gigantics?

Gigantics can generate comprehensive security reports on the current state of your databases. These reports provide insights into data risks, PII identification results, and overall database security posture.

What are Datasets in Gigantics?

Datasets are generated when you run a rule. They can be subsets (using operations like limit or include/exclude) or full datasets. Datasets can be downloaded in JSON or CSV formats or loaded into sinks.

How do I share Datasets with others?

The Share button allows you to create public URLs to share your datasets. You can create an API endpoint that others can access, with customizable formats (JSON ZIP, CSV ZIP, SQL).

What are Sinks in Gigantics?

Sinks are destination connections for data output. They must be added to your model and must match the driver type of your source database. Sinks allow you to load datasets into target databases.

What are Pipelines in Gigantics?

Pipelines in Gigantics are templates or blueprints that allow you to execute jobs periodically or using public links. They support various job types including scan, discover, create dataset, load dataset, and pump tap operations.

How do I schedule automatic execution of Pipelines?

You can configure pipelines to run automatically at a time interval you define. Alternatively, you can set them for manual execution using the "Run" button or by calling a URL.

How do I run Gigantics as a daemon?

You can run Gigantics as a daemon in the system by adding the -d parameter when starting the server. For example: ./gig start -d

How do I specify a configuration file when starting Gigantics?

You can initialize your Gigantics instance using a different environment by using the -c parameter to specify the configuration file to be used. For example: ./gig start -c path/to/json

What are the system requirements for running Gigantics?

Gigantics can be installed on Linux and Windows systems. It requires a MongoDB database (v4.0 or higher) installed on a server. A server with 8GB of RAM is recommended. If running Gigantics and MongoDB on the same server, 16GB of RAM is recommended.

How do I install Gigantics on Linux/Mac?

For Linux/Mac installation, download the gigantics-linux.tar.gz file and extract it. Then, from a terminal, run ./gig start.

How do I install Gigantics on Windows?

To install Gigantics on Windows, download the Windows package and unzip it. Open a console (cmd.exe) and start the server by running gig.exe start with the appropriate parameters.
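
For example, to start the server on Windows with a specific configuration file (the path shown is illustrative):

gig.exe start -c config\myenv.json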

What are the basic steps to configure Gigantics after installation?

After installation, go to localhost:5000 to start server configuration:

  1. Set up your server parameters (host, port, base URL)
  2. Configure the connection to MongoDB
  3. Set up email server parameters for notifications

How do I configure MongoDB connection for Gigantics?

During setup, you'll need to configure the location of your Mongo server and access credentials. Make sure to enter the data correctly before saving the connection, and use the "Test" button to verify the connection works properly.

Can Gigantics work with MongoDB installed on a different instance?

Yes, Gigantics can work with MongoDB installed on a different instance. You can configure the database parameters to point to a remote MongoDB server.

What are the recommended directory configurations?

You can change the paths where logs, backups, or temporary files are stored in the directories configuration section. This allows you to customize the storage locations based on your system requirements.

What driver configurations are necessary?

For some supported databases like Oracle, it's necessary to install drivers manually. For Oracle, you need to have the Oracle instantclient package installed in the path /opt/instantclient_19_8/ or set the instant client path when starting the Gig instance.

Configuration and Setup

What is the default URL for accessing Gigantics?

After starting Gigantics, you can access the web interface at localhost:5000.

What parameters can I set during server configuration?

During server configuration, you can set:

  • Host name or IP address
  • Port number
  • Base URL
  • HTTP to HTTPS redirection settings (when using Nginx)
  • Certificate paths (when using Nginx)
  • Email server settings for notifications

How do I enable Nginx in Gigantics configuration?

In the advanced setup, you can enable Nginx and add settings such as HTTP to HTTPS redirection or certificate paths during the server configuration step.

What email configuration parameters are available?

You can configure email server parameters including:

  • SMTP server address
  • SMTP port
  • Admin email address for notifications
  • SMTP authentication settings

How do I create different environments with configuration files?

Once the configuration is complete, a file is generated in the config/ folder where you can make changes. You can also generate new configuration files to create different environments.

What are the advanced database configuration options?

Advanced MongoDB configuration options include:

  • Authentication parameters
  • SSL certification settings
  • Connection pooling settings
  • Performance tuning parameters

How do I set up Oracle drivers for Gigantics?

To set up Oracle drivers:

  1. Install the Oracle instantclient package
  2. Place it in the path /opt/instantclient_19_8/
  3. Alternatively, set the instant client path when starting Gigantics using LD_LIBRARY_PATH environment variable
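
For example, you can point the loader at the instant client libraries when starting the server (the path shown is the default location mentioned above):

LD_LIBRARY_PATH=/opt/instantclient_19_8 ./gig start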

How do I control process forking in Gigantics?

You can control worker process forking using the -w parameter:

  • -w -1 = run everything in the main process (no forking)
  • -w 0 = fork to all available CPUs
  • -w N (where N > 0) = fork to exactly N worker processes

This parameter controls whether Gigantics uses Node.js worker processes for better CPU utilization, not database clustering.
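
For example, to start the server with four dedicated worker processes:

./gig start -w 4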

How do I check the installed version of Gigantics?

Run the command ./gig --version or gig.exe --version on Windows to check the currently installed version.

How do I get more information about command line options?

Run ./gig -h or gig.exe -h on Windows to see all available command line options and parameters.

Anonymization Features

What anonymization methods are available in Gigantics?

Gigantics offers several anonymization methods:

  • Fake data: Replace values with realistic fake data based on field labels
  • Masking: Replace parts of values with mask characters while preserving format
  • Shuffling: Randomly reorder values within the dataset while maintaining distribution
  • List: Replace values by picking randomly from a predefined list
  • Custom function: Write your own anonymization function using JavaScript code
  • Saved function: Use a previously created and saved custom function
  • Delete field: Completely remove the field from the output dataset
  • Blank field: Replace all values with null/empty values

How does the fake data anonymization work?

Fake data anonymization replaces your original data with realistic fake data based on the field labels. For example, a field labeled as "name" would be replaced with fake names, while a field labeled as "email" would be replaced with fake email addresses.

How does data masking work in Gigantics?

Data masking replaces parts of values with mask characters while preserving the format of the original data. For example, a credit card number "1234-5678-9012-3456" might become "****-****-****-3456".

How does the shuffling anonymization method work?

Shuffling randomly reorders values within the dataset while maintaining the same value distribution. This is useful for preserving statistical properties while removing direct associations.

What is dictionary mode in anonymization?

Dictionary mode controls how replacement values are mapped during anonymization. You can maintain consistent mappings between original and replacement values across different executions.

What dictionary scope options are available?

Several dictionary scope options are available:

  • Inherit from rule: Use the default dictionary behavior defined at the rule level
  • Skip dictionary: Don't maintain consistent mapping between original and replacement values
  • Label scope: Maintain consistent mapping within fields that have the same label
  • Fieldname scope: Maintain consistent mapping within fields that have the same name
  • Entity/Field scope: Maintain consistent mapping within the same entity and field combination
  • Global scope: Maintain consistent mapping across all entities and fields
  • User-defined scope: Define your own scope for consistent mapping using a custom scope string

How do I ensure the same value is always replaced with the same anonymized value?

Use dictionary modes like "Label scope" or "Fieldname scope" to maintain consistent mappings. For example, if you want "John Smith" to always be replaced with the same fake name like "Jane Doe", select "Label scope" dictionary mode for name fields.

Can I write custom JavaScript functions for anonymization?

Yes, you can select a field and choose "Custom function" to write JavaScript code that takes the original value and returns an anonymized version.
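
A minimal sketch of such a function, assuming (as in the synthesis examples later in this FAQ) that the snippet receives the original value and returns the replacement:

// Hypothetical anonymization snippet: mask everything except the last four characters
// 'value' is assumed to hold the original field value
return value.slice(0, -4).replace(/./g, '*') + value.slice(-4)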

How do I use predefined lists for anonymization?

You can use the "List" anonymization method to select random values from a predefined list. The lists are created from the project configuration items area.

How do I delete a field completely from my dataset?

Select the "Delete field" anonymization method to completely remove a field from the output dataset.

How do I replace all values with null/empty values?

Select the "Blank field" anonymization method to replace all values with null/empty values in a field.

How do I apply anonymization to only specific fields?

Gigantics allows you to configure anonymization at the field level, enabling you to specify different anonymization methods for each sensitive field while leaving others unchanged.

Can I maintain data utility while ensuring privacy compliance?

Yes, Gigantics is designed to maintain data utility while ensuring privacy compliance. Different anonymization techniques help preserve important data characteristics while protecting sensitive information.

Synthesis Features

What is the purpose of the Synthesize operation?

The Synthesize operation generates new synthetic data records based on your existing dataset or using custom functions. This is particularly useful for creating larger datasets for testing purposes while maintaining realistic data characteristics.

What synthesis methods are available in Gigantics?

Gigantics offers several synthesis methods:

  • Fake data + label: Generate realistic fake data based on the field's assigned label
  • Functions: Use built-in transformation functions
  • Saved functions: Apply a function previously created in the configuration items section
  • Custom function: Write your own JavaScript function to generate values
  • List: Select random values from a predefined list
  • Sequential numbers: Generate sequential numeric values
  • Random numbers: Generate random numeric values within specified ranges
  • No action: Keep the original values unchanged (default for newly added fields)

How do I control the size of synthesized datasets?

You can control the output size of synthesized datasets in two ways:

  • Same size as source entity: Generate exactly the same number of rows as exists in the source data
  • Proportional: Generate a different size using a percentage multiplier with optional minimum and maximum row limits
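
For example, a proportional setting of 200% on a source entity with 1,000 rows generates roughly 2,000 synthetic rows, clipped by any minimum or maximum row limits you set.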

What behavior options are available when synthesizing data?

Two behavior options are available:

  • Append to source data: Keeps the existing rows and adds the synthesized rows to the end
  • Replace source data: Removes existing rows completely and inserts only the newly synthesized rows

How do I generate sequential numbers in synthesized data?

Use the "Sequential numbers" synthesis method to generate sequential numeric values for fields. This is useful for creating unique IDs or ordered sequences.

How do I generate random numbers within specific ranges?

Use the "Random numbers" synthesis method and specify the minimum and maximum values for the range. This allows you to generate random numeric values within your required constraints.

Can I maintain realistic data characteristics when synthesizing?

Yes, Gigantics is designed to help you create realistic test datasets while ensuring privacy compliance. Using fake data generation based on field labels helps maintain realistic data patterns.

How do I customize synthesis for specific fields?

You can customize synthesis at the field level by:

  1. Selecting the entity in the synthesize configuration
  2. Choosing specific synthesis methods for each field
  3. Configuring parameters like ranges, lists, or locale preferences

How do I use custom JavaScript functions for data synthesis?

Select a field and choose "Custom function" to write JavaScript code that generates appropriate synthetic values. You can use built-in helpers like chance(), faker, or genLike() for realistic data generation.

How do I create synthetic postal codes that maintain format consistency?

You can use a custom function with the genLike() helper. For example:

// Generate realistic postal codes using the genLike function
return genLike('A1A 1A1') // Creates values like 'K9P 7N3' or 'M2J 4V8'

What built-in helpers are available for custom synthesis functions?

Built-in helpers available for custom synthesis functions include:

  • chance(): Generate random values with chance.js
  • faker: Generate realistic fake data with faker.js
  • genLike(): Generate data that follows a specific pattern
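
For instance, the faker helper follows the faker.js API, so a custom synthesis function for an email field could be as short as the sketch below (a hedged example, not the only approach):

// Hypothetical synthesis snippet: generate a fake email address with the faker helper
return faker.internet.email()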

How do I use locale preferences for fake data generation?

You can specify locale preferences when using fake data generation to ensure the generated data matches your regional requirements (e.g., Spanish names, Mexican addresses, etc.).

How do I preserve relationships between synthesized data fields?

Use synchronized synthesis approaches or custom functions that generate related values together to preserve data relationships while creating synthetic datasets.

Operations

What is the Include/Exclude operation used for?

The Include/Exclude operation allows you to select which entities to include in or exclude from your new dataset. This operation is applied at the beginning of the pipeline to determine the initial set of entities to process.

How do Include mode and Exclude mode work?

In Include mode, only the selected entities are processed. In Exclude mode, all entities except the selected ones are processed. Include mode is helpful when you only need a few specific entities, while Exclude mode is useful when you want to omit just a few entities.

What is the Where operation?

The Where operation allows you to filter your dataset by creating queries that include only the records matching specific conditions. This operation is applied before any transformation operations in the pipeline.

How do I configure complex filtering rules with the Where operation?

You can organize Where operation rules into groups with AND/OR logic and nest groups to create complex filtering conditions. For example, you might have an OR group containing two AND groups to filter records matching either of two different sets of conditions.

What is the Limit operation for?

The Limit operation allows you to restrict the number of records in your dataset output. You can limit by absolute number of rows or by percentage, and apply the limit to all entities collectively or to individual entities separately.

How do I limit by absolute number of rows?

Set the Limit Type to "By number of rows" and enter the exact number of records you want to include in your output. For example, entering "1000" will limit your dataset to exactly 1000 records.

How do I limit by percentage of records?

Set the Limit Type to "By percentage" and enter the percentage value. For example, entering "20" will limit your dataset to 20% of all available records.

What row position options are available in the Limit operation?

Row position options include:

  • First records: Selects records from the beginning of the dataset
  • Last records: Selects records from the end of the dataset
  • Random records: Selects records randomly from the dataset

How do I set minimum and maximum row constraints when using percentage limits?

When using percentage-based limits, you can set additional constraints to ensure reasonable output sizes:

  • Min rows: Ensures you get at least this many rows even when the percentage of a small dataset might be too few
  • Max rows: Ensures you won't get more than this many rows even when the percentage of a large dataset might be too many
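
For example, with a 20% limit, Min rows = 100, and Max rows = 5,000: a 200-row entity returns 100 rows (the minimum applies), a 10,000-row entity returns 2,000 rows (20%), and a 100,000-row entity returns 5,000 rows (the maximum applies).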

What is the "All entities" scope in the Limit operation?

The "All entities" scope applies the limit to the entire dataset, regardless of entity types. For example, if you have 1000 customer records and 1000 order records (2000 total), a limit of 500 would return 500 records total from any combination of entities.

What is the "By entity" scope in the Limit operation?

The "By entity" scope applies the limit separately to each entity type. For example, if you have customer and order entities, a limit of 500 would return up to 500 customer records AND up to 500 order records (1000 total records maximum).

Data Management

How do I download datasets from Gigantics?

Datasets can be downloaded in either JSON or CSV format. On the Datasets page, you have options to download, refresh, share, or delete datasets.

What formats are available for dataset download?

Datasets can be downloaded in the following formats:

  • JSON
  • CSV
  • SQL (when using share URLs with format parameters)

How do I access shared datasets?

Shared datasets are accessible via URLs with the following structure:

[protocol]//[hostname]/api/[organization]/[project]/model/[model-seq]/dataset/[dataset-seq]

Can I customize the format of shared datasets?

Yes, you can customize the shared URL with additional parameters to specify download formats:

  • format=json-zip: Download as JSON ZIP
  • format=csv-zip: Download as CSV ZIP
  • format=sql: Download as SQL
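
Assuming the format option is passed as a query-string parameter, a shared URL for a CSV download might look like this (the organization, project, and sequence values are placeholders):

[protocol]//[hostname]/api/acme/demo-project/model/3/dataset/12?format=csv-zip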

How do I load datasets into sinks?

To load a dataset into a sink:

  1. The sink must be added to the model
  2. Use the Load function to dump data into the sink
  3. You can select to dump an existing dataset or load data on-the-fly from the tap by applying a rule

What is the difference between dumping and pumping data?

  • Dumping: Creates a dataset and then loads it into a sink
  • Pumping: Directly loads the tap into the sink without creating datasets

How do I use a tap as a pump destination?

Gigantics supports using the tap database as a pump destination, allowing you to directly process data from source to destination without intermediate storage.

How do I manage dataset size and storage?

You can control dataset size using the Limit operation and manage storage by periodically cleaning up old or unnecessary datasets.

Pipeline Features

What job types are supported in Pipelines?

Pipelines support various job types:

  • Scan: Scan the datasource looking for changes
  • Discover: Creates a new discovery
  • Create a dataset using rule: Create a dataset using an existing rule
  • Load using rule: Load the tap into a sink by using a rule (does not create datasets)
  • Dump dataset: Load a dataset into a sink
  • Pump the tap: Load the tap directly into the sink without creating datasets or applying rules

How do I trigger pipeline execution via API?

Pipelines can be executed by using a URL with the following format:

https://<server>:<port>/api/pipeline?api_key=<api_key>
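
For example, with placeholder values for the server, port, and API key (curl's default GET request is assumed here):

curl "https://gigantics.example.com:5000/api/pipeline?api_key=YOUR_API_KEY"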

What permissions are required to run pipeline rules?

The user who runs the rule must have permissions to edit models in the project.

How do I create periodic pipeline execution?

To schedule periodic execution, select a time interval (e.g., every day, every hour) during pipeline creation. Gigantics will automatically execute the pipeline at the specified intervals.

Can I revoke API keys for pipelines?

Yes, from the pipeline management window, you can create or revoke API keys used for remote pipeline execution.

Functions

What masking options are available for text data?

Text masking options include:

  • None: Keep the text unchanged
  • Uppercase: Convert to all uppercase (e.g., "foo bar" becomes "FOO BAR")
  • Lowercase: Convert to all lowercase (e.g., "FOO Bar" becomes "foo bar")
  • Title case: Capitalize first letter of each word (e.g., "foo bar" becomes "Foo Bar")
  • Snake case: Replace spaces with underscores (e.g., "foo bar" becomes "foo_bar")
  • Kebab case: Replace spaces with hyphens (e.g., "foo bar" becomes "foo-bar")

How does the character replacement function work?

Character replacement allows you to replace specific types of characters:

  • Alphabetical chars: Replace each letter with a different character
  • Digits: Replace each number with a different digit
  • Symbols: Replace each symbol with a different symbol

For example, replacing "test@email.com" with 'x' would result in "xxxx@xxxxx.xxx"

How do I replace full words with other values?

Use the "Word" replacement function to replace full words with new values. For example, replacing "John" with "Test" would result in "Test".

How do I use regex patterns for data replacement?

Use the "Regex" replacement function to replace data using regular expression patterns. For example:

  • Pattern: .+?(?=@) (matches everything before @ in an email)
  • Replace with: xxxx
  • Result: test@email.com becomes xxxx@email.com

How does the field replacement function work?

The "Field" replacement function replaces all data in a field with a new value. For example, replacing all field data in an address field with "Unnamed Road" would change all addresses to that value.

How does data shuffling work?

The "Shuffle" function collects column values and mixes them randomly. For example, with three cities (Orchard Park, Forney, Redondo Beach), shuffling might result in a different assignment of cities to the same records.

What is shuffle group functionality?

Shuffle group is a variant where selected fields are grouped together so they are mixed in the same way rather than independently. For example, if you shuffle city and state together, the same record will maintain coherent city-state pairs.

How do I use list-based data generation?

The "List" function selects random values from predefined lists created in the project configuration items area. If you have more records than list items, values will be repeated.

What is the purpose of the Delete function?

The "Delete" function NULLs the selected column. This can't be used on columns specified as NOT NULL in the database schema.

What is the Blank function used for?

The "Blank" function removes the value of the field entirely, making it empty rather than NULL.

Discovery Features

How does PII discovery work in Gigantics?

PII discovery scans your database schema to identify potential personally identifiable information based on field names, data patterns, and other characteristics. The system assigns labels to fields to help categorize sensitive data types.

How do I confirm discovered labels?

After discovery, you can review and confirm labels through the confirmation workflow. This allows you to validate which fields are correctly identified as PII and which require manual adjustment.

Can I group entities by risk or label in the discovery interface?

Yes, you can view entities grouped by risk level or by label type, making it easier to understand your data landscape and prioritize protection efforts.

How do I access label details in the discovery interface?

You can hover over labels in the discovery interface to see detailed information about what the label represents and why it was assigned.

What custom labels functionality is available?

You can create and manage custom labels for specific data identification requirements that aren't covered by the default label set.

Troubleshooting

What should I do when I see "CONSTRAINT_INDEX not a valid identifier" error?

This is a known Oracle-specific error. Make sure you're using the latest version of Gigantics which includes fixes for Oracle identifier issues.

How do I handle Oracle SQLLDR loading errors?

Check your Oracle driver configuration and ensure the instantclient is properly installed. Oracle SQLLDR errors often relate to environment setup rather than data processing.

What can I do about UTF-8 character display issues in dataset viewer?

This issue has been addressed in recent versions of Gigantics. Ensure you're running the latest release to get proper UTF-8 character support.

How do I solve "duplicate values for the index key" errors?

This typically occurs when synthesized or anonymized data creates duplicate values in unique fields. Use appropriate synthesis methods or add constraints to ensure unique value generation.

What should I do if DB2 connection times out?

Check your DB2 connection timeout settings. In newer versions, the hardcoded DB2i connection timeout has been addressed and made configurable.

How do I fix issues with fields longer than original size?

When using dictionary modes, ensure that generated values don't exceed the length constraints of your original fields by using appropriate masking or truncation functions.

What can cause "error closing SessionPool connection" in Oracle?

This is typically an Oracle driver issue. Ensure you're using compatible Oracle client libraries and check your connection pooling configuration.

How can I resolve "Cannot write dataset into SQL Server sink" errors?

Verify your SQL Server connection parameters and ensure proper authentication. Check that the sink driver type matches your source database driver type.

What should I do about virtual column insertion errors?

Gigantics should automatically ignore virtual columns to prevent insertion errors. If you're experiencing this issue, make sure you're using a recent version that properly handles virtual columns.

How do I address "SHARING CREATE TABLE" errors when dumping Oracle data?

This is a known Oracle-specific error that has been addressed in recent versions. Make sure you're using the latest release of Gigantics for proper Oracle support.

Why might my dataset contain sensitive data after appending synthesized data?

When using "Append to source data" mode, remember that appended data will not be anonymized, so your dataset may still contain sensitive information from the original source.

How do I deal with "Invalid scale value" errors in DB2?

Check your DB2 field type definitions, especially for numeric fields with decimal precision. Recent Gigantics versions have improved DB2 type detection.

What causes "Invalid BSON type" errors with MongoDB?

These errors are typically related to data type mismatches during MongoDB discovery or processing. Ensure your MongoDB driver is properly configured for the specific MongoDB version you're using.

How do I solve issues with entity names including dots?

Recent versions of Gigantics have addressed processing issues with entity names that include special characters like dots. Make sure you're using an updated version.

What if I see "Error reading 'Interval day to second' type" in Oracle?

This is a known Oracle type handling issue that has been addressed in recent versions. Ensure you're using the latest Gigantics version for complete Oracle type support.

Performance and Optimization

How can I improve schema comparison performance with many tables?

Recent versions have optimized schema comparison performance to prevent the UI from becoming unresponsive with large numbers of tables.

What can I do to improve dictionary performance?

Clear dictionary operations have been optimized in recent versions. For better performance, consider using smaller dictionary scopes or periodically clearing unused dictionary entries.

How do I optimize dataset page performance?

New pagination systems have been implemented across dataset pages to improve loading times and UI responsiveness when working with large datasets.

What can I do to improve data loading performance?

Several enhancements have been made to debaser load performance. Consider using appropriate buffer sizes and batch processing for large data operations.

How can I optimize dictionary list performance?

Dictionary list performance has been enhanced with caching and pagination improvements. For very large dictionaries, consider filtering or clearing unused entries.

What are best practices for working with large datasets?

When working with large datasets:

  1. Use Limit operations to reduce dataset size
  2. Apply appropriate filtering with Where operations
  3. Consider using periodic execution rather than manual processing
  4. Monitor resource usage and adjust configuration as needed

Data Types and Formats

How does Gigantics handle date fields?

Gigantics properly handles date fields across different database systems, supporting filtering, anonymization, and synthesis operations on date data.

What string field operations are supported?

String fields support various operations including masking, transformation, replacement, and synthetic data generation based on labels.

How are numeric fields handled during anonymization?

Numeric fields can be masked, replaced with random or sequential numbers, shuffled, or transformed using custom functions while maintaining format.

How does Gigantics manage buffer field types?

Buffer field type detection has been enhanced in recent versions, particularly for DB2z systems where buffers are properly normalized without requiring record type buffer specifications.

User Interface

How do I navigate to dataset download URLs?

Recent versions have improved navigation from jobs to datasets and from datasets to download URLs with direct links and clearer UI indicators.

What UI enhancements help with entity management?

Entities can be grouped by risk or label, with labels shown in detail on mouse hover. Additionally, there are "move labelled to top" buttons to organize entities by their label status.

How has the custom function editor been improved?

The custom function editor size has been increased for better code readability and editing experience when creating anonymization or synthesis functions.

What UI improvements have been made for dataset management?

Dataset management UI has been revamped with:

  • Improved sorting capabilities
  • Direct links to API endpoints
  • Better call counter visibility
  • Enhanced format selection options

How do I access project variables in UI components?

Project variables can be used in various UI components and custom functions. Recent versions have fixed issues with project variable accessibility, ensuring they can be used anywhere in the application as intended.

How has the dataset viewer been improved?

The dataset viewer now correctly displays UTF-8 characters and has improved column tooltip functionality to show accurate values rather than incorrect information.

What UI enhancements have been made for job management?

Job management UI includes:

  • Last job result indicators
  • Better job rate calculations in project dashboards
  • Improved links from dataset color indicators in project dashboard
  • Better navigation from failed jobs to error details

How has the entity progress display been enhanced?

Entity progress is now displayed with improved UI elements including progress bars and percentage indicators that accurately reflect processing status.

Database Connection Features

How do I configure SSH tunneling for database connections?

When connecting to databases through SSH tunnels, ensure you're specifying the correct port. Recent fixes have addressed issues where ports were not properly passed to SSH tunnel connections.

How do I handle Oracle SYSDBA connections?

You can add the SYSDBA role to Oracle connections by configuring the appropriate connection parameters in your database connection settings.

What custom parameters are available for database connections?

An additional "Custom parameters" field is available in the database connection form, allowing you to specify specialized connection settings for your database drivers.

How do I filter Oracle tables by owner?

Oracle driver now includes filtering by owner functionality, allowing you to limit your data processing to specific database owners.

How do I handle MongoDB Atlas connections?

The MongoDB driver has been fixed to properly connect to MongoDB Atlas clusters, addressing previous connection issues.

How do I address DB2 long table name issues?

Recent versions have addressed DB2 long table name handling issues, ensuring proper processing of tables with extended names.

How do I handle special characters in entity names like dollar signs?

Processing issues with entity names that start with dollar signs have been fixed in recent versions.

How do I configure DB2 connection timeouts?

DB2i connection timeout is no longer hardcoded and can be configured to meet your specific requirements.

Data Generation Features

How do I generate Spanish or Mexican locale data?

You can use locale preferences like 'es' or 'es_MX' for generating data in Spanish locales, including proper Mexican names, addresses, and other localized data elements.

How do I normalize accent characters in generated data?

Generated data with accented characters is automatically normalized to ensure consistency across different systems and databases.

How do I generate realistic postal codes?

Use the genLike() function with a pattern like 'A1A 1A1' to generate realistic postal codes that follow standard formatting rules.

How do I generate social security numbers?

Use the randomSSN() function to generate realistic social security numbers for testing datasets.

How do I ensure consistent data generation across multiple runs?

Use dictionary modes to maintain consistent mapping between original and generated values across different executions of the same rule.

How do I handle data generation for virtual columns?

Gigantics automatically ignores virtual columns during data generation to prevent insertion errors.

How do I generate date values with proper formatting?

Use the randomDate() function to generate realistic date values that maintain proper formatting for your database systems.

How do I generate random numerical values within specific ranges?

Use the randomNumber() function with min and max parameters to generate random numbers within your required constraints.
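
A short sketch using the generator helpers mentioned above; whether the bounds are passed positionally or as an options object is not documented here, so the positional form is an assumption:

// Hypothetical synthesis snippet: a random age between 18 and 99
return randomNumber(18, 99)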

Data Loading Features

How do I handle Oracle merge and update operations?

Recent versions have added Oracle merge and update functionality, supporting more complex data loading scenarios.

How do I address constraints disabling in MSSQL?

Fixes have been implemented for issues with disabling constraints in MSSQL databases during data loading operations.

How do I handle date issues with Oracle loads?

Several improvements have been made to address date handling issues in Oracle database loads, ensuring proper date value preservation.

How do I use the Oracle SQLLDR escape sequences?

Oracle SQLLDR escape sequences have been improved to handle special characters and formatting issues during data loading.

How do I handle Oracle alter constraints warnings?

Warnings related to Oracle alter constraints operations have been addressed to improve data loading reliability.

How do I clean up old dataset chunks during dump merge operations?

Dump merge operations now properly clean up old dataset chunks to prevent space issues and data inconsistencies.

How do I handle temporary file space issues with SQLite?

SQLite temporary file handling has been improved to prevent persistent space occupation on disk until the Gigantics process is terminated.

How do I properly set SQLite temporary directories?

SQLite database configuration now correctly uses specified temporary directories rather than incorrect default paths.

Advanced Features

How do I use the debaser command line interface?

The debaser CLI is integrated into the gig command, allowing you to run debaser operations directly from the command line with various parameters.

How do I use the fake data generation command?

Use the fake command with a label parameter to generate fake data for testing. For example: debaser fake name would generate fake names.

How do I run custom transformation code with the do command?

The do command runs transformation code for a set of records, allowing you to process data from the command line, tap, or dataset sources. You can specify code as a parameter to transform your data.

How do I discover database schemas from the command line?

Use the discover command to scan and identify PII elements in your database schemas directly from the command line interface.

How do I sample data from databases using debaser?

The sample command allows you to extract sample data from your databases for testing and analysis purposes.

How do I scan databases for changes?

The scan command enables you to scan your datasources looking for changes in schema or data, helping you identify when updates are needed.

How do I pump data between databases?

The pump command allows you to directly load data from a tap database to a sink database without creating intermediate datasets.

How do I dump datasets from the command line?

The dump command enables you to create datasets from your database and save them to files or load them into sinks.

How do I connect to databases from the command line?

The connect command helps you establish connections to your databases for testing connectivity and access.

How do I load data into sinks from the command line?

The load command allows you to load datasets into sink databases from the command line interface.

How do I initialize debaser environments?

The init command initializes debaser environments with appropriate configuration settings for your use cases.

How do I generate cactus tests?

Improved cactus tests can be generated to validate your data processing and synthesis operations.

How do I work with debaser modules without bootstrap overrides?

Fixes have been implemented to prevent bootstrap from overriding debaser modules, ensuring consistent functionality.

Schema Management

How do I compare schemas in Gigantics?

Schema comparison features allow you to analyze differences between database schemas. Recent optimizations have improved performance when comparing schemas with high numbers of tables.

How do I rename tables to entities in the interface?

Interface improvements have been made to properly rename tables to entities (or collections) for more intuitive data management.

How do I handle schema extraction counter issues?

Issues with extracting DDL counter not increasing have been addressed to ensure proper schema analysis tracking.

How do I use schema update operations?

Schema update operations allow you to apply changes to your database schemas as part of your data processing pipelines.

How do I manage schema versions when loading data?

You can select specific schema versions when loading data to ensure compatibility with your target database structures.

Dictionary Features

How do I export and import dictionaries?

Dictionary export/import functionality allows you to save and restore dictionary mappings for consistent data processing across environments.

How do I create project dictionaries?

You can create project dictionaries to maintain consistent anonymization mappings within specific projects.

How do I view dictionary summaries?

Dictionary summary pages provide overview information about your dictionary mappings and usage statistics.

How do I clear dictionary entries efficiently?

Dictionary clearing operations have been optimized so that clearing large dictionaries no longer causes severe performance degradation.

How do I check saved functions in rule imports?

Saved functions are now properly checked during rule import to ensure they exist within the project context.

Dataset Features

How do I create new datasets from existing ones?

You can copy datasets to create new datasets with similar or modified characteristics.

How do I handle datasets with many entities?

Audit reports with many entities now properly load without failures, addressing previous performance issues.

How do I sort datasets by size correctly?

Dataset size sorting has been fixed to ensure proper ordering by actual data size rather than alphabetical sorting.

How do I drill down to specific datasets from project summaries?

Improved navigation allows you to drill down to specific datasets from project summary panels for better data management.

How do I handle duplicate log entries in job processing?

Issues with duplicate logs being saved in job streams have been addressed to ensure cleaner job processing records.

How do I handle temporary file locations?

Temporary files are now created in the correct directories rather than incorrect locations that could cause processing issues.

How do I remove tables from datasets?

You can remove specific tables from datasets to create customized data collections for your needs.

How do I limit results in dataset download links?

Dataset download links now support limit parameters to control the amount of data downloaded.

Rule Management

How do I export and import rules between projects?

Rule export/import functionality allows you to share rule configurations between different projects while preserving field dictionary options.

How do I manage rule entity and field ordering?

Rules now send entities and fields as ordered arrays, ensuring consistent processing order in your data operations.

How do I avoid duplicate values in rule outputs?

Recent features prevent duplicate values from appearing in rule outputs, ensuring cleaner dataset generation.

How do I handle function references in rule imports?

Saved function references are maintained when importing rules, preserving custom function connections.

Labels Features

How do I disable or enable all labels at once?

You can disable or enable all labels simultaneously with the new toggle functionality.

How do I handle label filters with high numbers of entities?

Improved label filter usability helps you manage filtering even when working with high numbers of entities.

How do I track label change reasons?

Label change reason tracking is now available, with the field being mandatory only when entities are confirmed.

How do I discover labels using column names?

Label discovery now uses column names as part of the identification process to improve accuracy.

How do I handle Spanish names and surnames generation?

Fixes have been implemented to ensure Spanish last names generators properly return last names rather than first names.

Job Management

How do I restart failed jobs?

Job restart functionality allows you to restart failed jobs to continue processing without starting from scratch.

How do I handle "overwrite job already running" errors?

Fixes for the "overwrite job already running" error provide more accurate job status detection.

How do I address incorrect succeeded job rates?

Fixes for incorrect succeeded job rates in project dashboards provide more accurate job performance metrics.

How do I handle PostgreSQL schema row count issues?

PostgreSQL schemas row count issues have been fixed to provide more accurate data statistics.

How do I rerun jobs with new job IDs?

When rescheduling jobs, new job IDs are properly generated to ensure distinct job tracking.

How do I handle job stream duplicate entries?

Issues with duplicate entries in job streams have been addressed to ensure cleaner job logs.

Memory and Performance Management

How do I handle MongoMapDriver memory usage?

Fixes have been implemented to address MongoMapDriver memory usage issues for improved performance.

How do I manage memory usage with large datasets?

Use appropriate pagination, filtering, and limit techniques to manage memory usage when working with large datasets or databases.

User Management and Permissions

How do I set up user permissions in Gigantics?

Gigantics allows you to create fine-grained permissions for users in projects, controlling who can edit models, run jobs, and access sensitive data.

What is the difference between admin and regular users?

Admin users have access to global settings and can modify server-wide configurations, while regular users can only modify project-specific settings they have permissions for.

How do I invite users to my projects?

You can invite users from your organization to join projects through the project configuration features, allowing for collaborative data management.

What are the different permission levels available?

Permission levels typically include:

  • Viewer: Can only view data
  • Editor: Can modify datasets and rules
  • Admin: Can edit models and manage project settings

Technical Troubleshooting

How do I handle case type application issues?

Case type applications have been fixed to ensure proper transformation of data according to your specified requirements.

How do I manage virtual column issues in Oracle?

Error inserting data on virtual columns in Oracle has been resolved by automatically ignoring virtual columns during processing.

How do I address invalid data truncation in DB2?

Data truncation errors in DB2 have been addressed to improve data processing reliability.

How do I handle Oracle SessionPool connection closing issues?

Issues with Oracle SessionPool connections not closing properly have been fixed to improve Oracle connection management.

How do I address include/exclude general checkbox issues?

Include/exclude general checkbox behavior now correctly applies only to filtered entities rather than all entities.

How do I handle MSSQL query builder errors?

MSSQL query builder errors have been fixed to improve SQL Server compatibility.

How do I address field length truncation issues?

Truncation issues on 'max' fields length have been resolved to ensure proper handling of large text fields.

How do I handle "back to entities" button resetting filters?

The "back to entities" button in label editors now preserves filters rather than resetting them.

How do I address audit report loading issues?

Audit reports with many entities now properly load without failures thanks to recent performance improvements.

Installation and Maintenance

How do I upgrade Gigantics to a newer version?

To upgrade Gigantics, download the latest version package and follow the installation steps. Make sure to backup your configuration files before upgrading.

How do I backup my Gigantics configuration?

Configuration files are stored in the config/ directory. Make sure to backup these files before performing upgrades or major changes.

How do I check for system compatibility issues?

Ensure your system meets the minimum requirements for RAM and database versions. Check the installation documentation for detailed requirements.

How do I troubleshoot installation failures?

If installation fails, check:

  1. System requirements (RAM, disk space, etc.)
  2. Database compatibility versions
  3. Required drivers for specific database systems
  4. File permissions on installation directories

Custom Development

How do I create custom functions in Gigantics?

You can create custom functions in JavaScript that take original values and return processed values. These can be used for specialized anonymization or synthesis requirements.

How do I debug custom functions?

Use the debaser command line tools to test custom functions with sample data before applying them to production datasets.

How do I share custom functions between projects?

Save custom functions in the project configuration items area to reuse them in multiple rules within the same project.

Data Security

How does Gigantics ensure data privacy during processing?

Gigantics processes data with privacy by design principles:

  1. Anonymization operations replace sensitive values
  2. Synthetic data generation creates new datasets without original sensitive information
  3. Dictionary mappings can be cleared or exported as needed for compliance
  4. Secure connection protocols for database access

How do I implement GDPR compliance with Gigantics?

To implement GDPR compliance:

  1. Use discovery features to identify PII elements
  2. Apply appropriate anonymization techniques
  3. Generate synthetic datasets for testing
  4. Use dictionary export to track data transformations
  5. Implement proper access controls and permissions

How do I handle HIPAA compliance requirements?

For HIPAA compliance:

  1. Identify healthcare-related PII with discovery features
  2. Apply strict anonymization to patient data fields
  3. Use secure database connections
  4. Implement role-based access controls
  5. Track data transformations with dictionaries

What encryption features are available?

Gigantics provides encrypt and decrypt functions that can be used in custom transformations to protect data during processing.
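
A hedged sketch of how these functions might be used inside a custom transformation (the exact signatures, and in particular how keys are supplied, are assumptions):

// Hypothetical: protect a value during processing with the built-in encrypt helper
return encrypt(value)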

How do I implement data masking for PCI compliance?

Use masking operations to obscure credit card numbers and other payment information while maintaining format for testing purposes.

Best Practices

What is the recommended order of operations?

The recommended operation order is:

  1. Include/Exclude - Select entities to process
  2. Where - Filter records by conditions
  3. Limit - Reduce dataset size if needed
  4. Anonymize - Protect sensitive data
  5. Synthesize - Generate additional data if needed

How do I maintain data consistency across environments?

Use dictionary modes with appropriate scoping to ensure the same original values are consistently transformed across different environments.

How do I optimize rule execution performance?

To optimize rule performance:

  1. Apply filtering operations early (Include/Exclude, Where)
  2. Use appropriate limit operations to reduce dataset size
  3. Choose efficient anonymization methods for your data types
  4. Use dictionary scopes that match your consistency requirements

How do I manage large database environments?

For large database environments:

  1. Use selective entity processing with Include/Exclude operations
  2. Apply appropriate filtering to reduce processing load
  3. Use incremental discovery rather than full database scans
  4. Schedule pipeline execution during off-peak hours

What settings are recommended for development and production environments?

Development settings:

  • Smaller dataset limits for faster testing
  • Simple anonymization methods
  • Local database connections

Production settings:

  • Full dataset processing as needed
  • Strict anonymization with dictionary consistency
  • Secure database connections with proper authentication

Database-Specific Features

What Oracle-specific features are available?

Oracle-specific features include:

  • SYSDBA role connections
  • SQLLDR loading optimizations
  • Constraint handling improvements
  • SessionPool connection management
  • SHARING CREATE TABLE error resolution

How do I work with DB2 databases effectively?

DB2 features include:

  • DB2i and DB2z specific optimizations
  • Long table name handling
  • Connection timeout configurability
  • Buffer type detection improvements
  • Timestamp field handling fixes

What SQL Server features improve compatibility?

SQL Server features include:

  • Query builder fixes
  • Constraint handling improvements
  • Connection reliability enhancements

How do I optimize PostgreSQL schema handling?

PostgreSQL optimizations include:

  • Accurate row count reporting
  • Proper schema analysis
  • Efficient data processing techniques

What MongoDB improvements have been implemented?

MongoDB improvements include:

  • Atlas connection support
  • Accurate discovery count processing
  • Memory usage optimization fixes

Command Line Parameters

How do I run Gigantics in the background?

Use the -d parameter to run Gigantics as a daemon in the system background.

How do I specify different configuration files?

Use the -c parameter followed by a path to specify which configuration file to use.

How do I optimize Gigantics for multiple CPU cores?

Use the -w parameter with appropriate values:

  • -w -1 runs everything in the main process (no forking)
  • -w 0 forks to all available CPUs
  • -w N (where N > 0) forks to exactly N worker processes

This parameter controls whether Gigantics uses Node.js worker processes for better CPU utilization, not database clustering.

How do I check the current version?

Use the --version parameter to check the currently installed version of Gigantics.

How do I get help with command options?

Use the -h or --help parameters to see all available command line options.

datasets?User Management and PermissionsHow do I set up user permissions in Gigantics?What is the difference between admin and regular users?How do I invite users to my projects?What are the different permission levels available?Technical TroubleshootingHow do I handle case type application issues?How do I manage virtual column issues in Oracle?How do I address invalid data truncation in DB2?How do I handle Oracle SessionPool connection closing issues?How do I address include/exclude general checkbox issues?How do I handle MSSQL query builder errors?How do I address field length truncation issues?How do I handle "back to entities" button resetting filters?How do I address audit report loading issues?Installation and MaintenanceHow do I upgrade Gigantics to a newer version?How do I backup my Gigantics configuration?How do I check for system compatibility issues?How do I troubleshoot installation failures?Custom DevelopmentHow do I create custom functions in Gigantics?How do I debug custom functions?How do I share custom functions between projects?Data SecurityHow does Gigantics ensure data privacy during processing?How do I implement GDPR compliance with Gigantics?How do I handle HIPAA compliance requirements?What encryption features are available?How do I implement data masking for PCI compliance?Best PracticesWhat is the recommended order for applying operations?How do I maintain data consistency across environments?How do I optimize rule execution performance?How do I manage large database environments?What are the recommended settings for development vs production?Database-Specific FeaturesWhat Oracle-specific features are available?How do I work with DB2 databases effectively?What SQL Server features improve compatibility?How do I optimize PostgreSQL schema handling?What MongoDB improvements have been implemented?Command Line ParametersHow do I run Gigantics in the background?How do I specify different configuration files?How do I optimize Gigantics for multiple CPU cores?How do I check the current version?How do I get help with command options?