Introduction

What is Gigantics?

Welcome to Gigantics! Gigantics is a powerful data security platform that runs exclusively as a local installation. This means you have complete control over your data and infrastructure, with no cloud dependencies or external data transfers.

Gigantics connects directly to your data sources (which we call "taps"), analyzes their schemas, and then helps you create anonymized or synthesized versions of your data. You can then output this processed data to various destinations (which we call "sinks") - either the same database as your source for updates, or different databases for testing, QA, or development environments.

Gigantics Hierarchy

Before diving into taps and sinks, it's important to understand the hierarchical structure of Gigantics. Everything in Gigantics is organized within this structure, which you'll encounter as soon as you start using the platform.

Learn more about each component:

  • Organization - Your workspace to organize projects
  • Project - Your data processing environment
  • Tap - Connections to your data sources
  • Model - Container for your data processing rules
  • Rules - Transform rules for data anonymization
  • Jobs - Execution instances of your workflows
  • Datasets - Data collections and results
  • Sinks - Data destinations for processed outputs

Organizations and Projects

Each user has their own space, called an Organization, which contains Projects. Projects can be shared with other users through the project configuration features.

In addition, each user can create additional organizations, each containing one or more projects.

The project is the user's workspace. From here, users can create models, work on databases, and invite other members of their organization to join the project.

Taps / Sinks

In Gigantics, everything revolves around taps and sinks:

  • Taps are connections to your data sources. These can be databases like Oracle, DB2, SQL Server, PostgreSQL, MongoDB, and many others. Learn more about taps and how to configure them.

  • Sinks are connections to your data destinations. These are where your processed datasets will be output. You can use the same technology as your taps or different ones entirely. Learn more about sinks and their configuration.

This architecture allows you to:

  ✓ Extract data from production databases
  ✓ Process sensitive data safely within your local environment
  ✓ Load processed datasets into test, QA, or development databases
  ✓ Maintain complete control over your data pipeline
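
As a rough sketch of that flow, a pipeline connects one tap, a set of transformations, and one sink. The snippet below is purely illustrative; the object shape and field names are hypothetical, not Gigantics' actual configuration format:

// Conceptual sketch of the tap → transform → sink flow
// (all names and fields here are hypothetical)
const pipeline = {
  tap: { name: 'production-postgres' },       // data source
  transforms: [
    { column: 'email', action: 'anonymize' },
    { column: 'ssn', action: 'mask' },
  ],
  sink: { name: 'qa-postgres' },              // data destination
};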

AI-Powered Data Labeling

One of Gigantics' most powerful features is its ability to automatically label your data using advanced AI libraries. This intelligent labeling system helps identify and categorize sensitive information without requiring manual configuration.

How AI Labeling Works

Gigantics uses sophisticated machine learning models to analyze your data patterns and automatically assign labels to different types of information. These labels help you understand what kind of data you're working with and how it should be protected.

🔒 Privacy First - No Data Exfiltration

Important: Gigantics processes all your data locally within your environment. No data is ever sent to external servers or cloud services during the AI labeling process. All AI libraries run entirely on your local machine, ensuring complete data privacy and compliance with your security requirements.

Built-in Label Categories

The AI system automatically identifies and labels common data types including:

  • Personal identifiers (names, emails, phone numbers)
  • Financial information (credit cards, bank accounts)
  • Location data (addresses, coordinates)
  • Healthcare information (medical records, patient data)
  • Business-sensitive data (trade secrets, proprietary information)

Learn more about available labels and how they're applied to your data.

Data Labeling Example

To illustrate how Gigantics labels your data, here is how raw values are automatically mapped to specific Gigantics labels during the discovery process. Each label follows a hierarchical naming convention that indicates both the data type and its sensitivity level:

  • John Smith → person/name/en/full (category: Personal Identifiers)
  • 123-45-6789 → identifier/ssn (category: Personal Identifiers)

Other label categories, such as Financial Information, are assigned in the same way.

Custom Label Creation

While the AI system provides comprehensive automatic labeling, you also have full control to extend and customize the labeling process:

1. Create Custom Labels with Regex

You can define your own label categories using regular expressions to match specific data patterns in your organization:

For example, employee IDs such as EMP-123456 and EMP-789012 can be matched with the regular expression /EMP-\d{6}/ and assigned a custom label: match pattern → apply regex → create custom label.

This allows you to identify organization-specific data formats that the AI models might not recognize out of the box.
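
For instance, a pattern for employee IDs like the ones above can be tested in plain JavaScript before you register it as a label. This snippet is an illustration only, not Gigantics code:

// Testing a candidate pattern against sample values
const employeeId = /^EMP-\d{6}$/;

['EMP-123456', 'EMP-789012', 'CUST-000001'].forEach((value) => {
  console.log(value, employeeId.test(value)); // true, true, false
});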

2. Create Labels from Sample Data

Instead of writing complex regex patterns, you can create labels by providing sample data examples. Gigantics learns from your examples and identifies similar patterns throughout your dataset:

// Example: Label from samples
John.Doe@company.com
jane.smith@company.org
// Gigantics learns this is an "internal_email" pattern
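
Conceptually, sample-based labeling generalizes the structure shared by your examples. The sketch below is a deliberately naive version of that idea, generalizing only the mailbox part and keeping the sampled domains literal; Gigantics' actual learning is more sophisticated:

// Naive sketch: derive a pattern from sample emails
// (illustration only, not the actual learning algorithm)
function patternFromEmails(samples) {
  // Collect the distinct domains seen in the samples
  const domains = [...new Set(samples.map((s) => s.split('@')[1]))];
  const escaped = domains.map((d) => d.replace(/\./g, '\\.'));
  // Generalize the mailbox part to "word.word"
  return new RegExp(`^[A-Za-z]+\\.[A-Za-z]+@(${escaped.join('|')})$`, 'i');
}

const internalEmail = patternFromEmails([
  'John.Doe@company.com',
  'jane.smith@company.org',
]);
console.log(internalEmail.test('alice.wong@company.com')); // true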

3. Custom Anonymization Functions

For complete control over data protection, you can create custom anonymization functions using JavaScript snippets. These functions let you define exactly how sensitive data should be transformed:

// Example: Custom anonymization function
function anonymizePhoneNumber(phone) {
  // Keep the area code and last four digits, mask the middle
  return phone.replace(/(\d{3})\d{3}(\d{4})/, '$1-XXX-$2');
}
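
Applied to a ten-digit number, the function above keeps the area code and the last four digits visible:

// Example usage
anonymizePhoneNumber('6175551234'); // → '617-XXX-1234'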

Learn more about creating custom labels and anonymization functions.

Work in a Model

After setting up your organization and project, you can start working with data models. The model workflow is where the magic happens in Gigantics. Here's how it works, step by step:

1. Define a Tap

First, you'll create a connection to your data source. This is your tap, which tells Gigantics how to access your database. Learn more about taps
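
At its core, a tap boils down to standard database connection details. The field names below are hypothetical illustrations; the taps documentation describes the actual connection form:

// Hypothetical sketch of tap connection details
const tap = {
  name: 'production-db',
  engine: 'postgresql',          // Oracle, DB2, SQL Server, MongoDB, ...
  host: 'db.internal.example',
  port: 5432,
  database: 'customers',
  user: 'readonly_user',         // read access is enough for scanning
};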

2. Scan the Schema

Once your tap is configured, Gigantics will scan your database schema to understand its structure. This includes tables, views, columns, and their data types. Schema Management Documentation
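
The result of a scan is essentially a structured description of your tables, views, and columns with their types. The shape sketched below is hypothetical, just to make the idea concrete:

// Hypothetical shape of scanned schema information
const schema = {
  tables: [
    {
      name: 'customers',
      columns: [
        { name: 'id', type: 'integer' },
        { name: 'name', type: 'varchar(120)' },
        { name: 'ssn', type: 'char(11)' },
      ],
    },
  ],
};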

3. Run Discovery

After scanning, you can run our PII (Personally Identifiable Information) discovery process. This automatically identifies potentially sensitive fields in your database using advanced algorithms. PII Discovery Documentation
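
Conceptually, discovery annotates the scanned columns with suggested labels. The output shape below is hypothetical, but the label names follow the hierarchical convention shown earlier:

// Hypothetical discovery output: columns annotated with labels
const discovered = [
  { table: 'customers', column: 'name', label: 'person/name/en/full' },
  { table: 'customers', column: 'ssn', label: 'identifier/ssn' },
];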

4. Create Datasets with Rules

With your schema understood and sensitive fields identified, you can create datasets by writing rules that determine (see the sketch below):

  • Which data to include or exclude
  • How to anonymize sensitive fields
  • How to synthesize new data elements
  • What transformations to apply

Dataset Creation Documentation
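
As a conceptual sketch only (the rule syntax below is hypothetical, not Gigantics' actual format), a dataset's rules pair those decisions with concrete tables and columns:

// Hypothetical sketch of dataset rules
const rules = [
  { table: 'audit_log', action: 'exclude' },                    // leave out entirely
  { table: 'customers', column: 'name', action: 'synthesize' }, // generate new values
  { table: 'customers', column: 'ssn', action: 'mask' },
  { table: 'customers', column: 'phone', action: 'custom',
    fn: anonymizePhoneNumber },                                 // from the earlier example
];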

Learn How to Configure Jobs

Jobs are the execution instances of your data processing workflows. Understanding how to configure and manage jobs is key to getting the most out of Gigantics:

Job Configuration Steps

  1. Create a Pipeline: Define your data processing workflow that connects taps to sinks. Pipeline Configuration

  2. Schedule or Trigger Jobs: Run your pipelines manually or set up automated schedules; see the sketch after these steps. Job Management

  3. Monitor Execution: Track the progress and results of your jobs, including success or failure status.

  4. Review Outputs: Verify that your processed datasets have been correctly loaded to your sinks.
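
Purely as an illustration of the two trigger styles from step 2 (none of these names are actual Gigantics API):

// Illustrative only: manual vs. scheduled triggers
const manualJob = { pipeline: 'prod-to-qa', trigger: 'manual' };
const nightlyJob = { pipeline: 'prod-to-qa', trigger: { cron: '0 2 * * *' } }; // daily at 02:00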

Getting Started Documentation Roadmap

To master each aspect of Gigantics, we recommend reading through these documents in order:

  1. Organizations Overview - Learn how to set up your workspace
  2. Projects Overview - Understand your data processing environment
  3. Taps Overview - Learn how to connect to your data sources
  4. Schema Management - Understand how Gigantics reads your database structures
  5. PII Discovery - See how sensitive data is automatically identified
  6. Datasets - Learn how to create and manage datasets
  7. Pipelines - Configure your data processing workflows
  8. Jobs - Execute and monitor your data processing tasks

Gigantics gives you complete control over your data lifecycle while ensuring sensitive information is properly protected. Whether you need to anonymize production data for testing or synthesize entirely new datasets, Gigantics provides the tools to do it securely within your local environment. You'll start by creating an organization and project, then define your taps to extract data, work with models to process it, and finally configure sinks to load your anonymized or synthesized data where you need it.