Labels
Labels in Gigantics are classifications assigned to database fields during the discovery process. These labels determine how data will be handled during anonymization and synthesis operations.
System Labels
Gigantics comes with a comprehensive set of predefined system labels for automatically detecting various types of sensitive data. These labels are organized into categories based on the type of information they identify:
Business Information
| Label | Description | PII Status | Risk Level | Detection Method |
|---|---|---|---|---|
| business/company | Company names and organizations | No | Low | Column name patterns, contextual data analysis |
| business/department | Department names within organizations | No | Low | Column name patterns, contextual data analysis |
| business/job_title | Professional job titles | No | Low | Column name patterns, predefined lists of job titles |
Datetime Information
| Label | Description | PII Status | Risk Level | Detection Method |
|---|---|---|---|---|
| datetime/date/format1 to datetime/date/format12 | Various date formats (MM/DD/YYYY, DD/MM/YYYY, etc.) | Conditionally | Low to Medium | Pattern matching against multiple date format regexes |
| datetime/time_zone | Time zone identifiers | No | Low | Column name patterns, predefined time zone lists |
| datetime/time | Time values | Conditionally | Low | Pattern matching, column name analysis |
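
A single value can satisfy more than one date format, which is one reason separate format labels are useful. The sketch below illustrates the idea with three hypothetical formats; the actual patterns behind `format1` to `format12` are not listed here, and the sketch uses `datetime.strptime` rather than the regexes that the table mentions.

```python
from datetime import datetime

# Hypothetical format names; not the actual Gigantics format1..format12 definitions.
CANDIDATE_FORMATS = {
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY-MM-DD": "%Y-%m-%d",
}


def matching_date_formats(value: str) -> list:
    """Return the names of every candidate format that parses `value`."""
    matches = []
    for name, pattern in CANDIDATE_FORMATS.items():
        try:
            datetime.strptime(value, pattern)
        except ValueError:
            continue  # value does not fit this format
        matches.append(name)
    return matches


print(matching_date_formats("03/04/2021"))  # ambiguous: fits two formats
print(matching_date_formats("25/12/2021"))  # only fits DD/MM/YYYY
```

An ambiguous value such as `03/04/2021` cannot be classified from one sample alone, so looking at several sample values from the same column is usually needed to settle on one format.
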
Financial Information
| Label | Description | PII Status | Risk Level | Detection Method |
|---|---|---|---|---|
| finance/bitcoin | Bitcoin addresses | Conditionally | High | Pattern matching using Bitcoin address format regex |
| finance/creditcard_type | Credit card type identifiers | No | Medium | Column name patterns, predefined credit card type lists |
| finance/creditcard | Credit card numbers | Yes | Very High | Pattern matching using Luhn algorithm validation |
| finance/currency_code | Currency codes (USD, EUR, etc.) | No | Low | Column name patterns, predefined currency code lists |
| finance/currency | Currency names and symbols | No | Low | Column name patterns, predefined currency lists |
| finance/ethereum | Ethereum addresses | Conditionally | High | Pattern matching using Ethereum address format regex |
| finance/iban | International Bank Account Numbers | Yes | Very High | Pattern matching using IBAN format validation |
| finance/money | Monetary values | No | Medium | Pattern matching, column name analysis |
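
For readers unfamiliar with the checks named in the table, the following is a minimal sketch of Luhn validation for card numbers and the standard mod-97 check for IBANs. The function names and the length limits are illustrative assumptions; this is not the Gigantics detection code.

```python
import re


def luhn_valid(candidate: str) -> bool:
    """Return True if the digits in `candidate` pass the Luhn checksum."""
    compact = re.sub(r"[ -]", "", candidate)
    # Assumed length range for card numbers (13 to 19 digits).
    if not re.fullmatch(r"\d{13,19}", compact):
        return False
    total = 0
    for index, ch in enumerate(reversed(compact)):
        digit = int(ch)
        if index % 2 == 1:  # double every second digit from the right
            digit *= 2
            if digit > 9:
                digit -= 9
        total += digit
    return total % 10 == 0


def iban_valid(candidate: str) -> bool:
    """Return True if `candidate` passes the IBAN mod-97 check."""
    compact = re.sub(r"\s", "", candidate).upper()
    if not re.fullmatch(r"[A-Z]{2}\d{2}[A-Z0-9]{11,30}", compact):
        return False
    rearranged = compact[4:] + compact[:4]  # move country code and check digits to the end
    # int(ch, 36) maps 0-9 to 0-9 and A-Z to 10-35, as the IBAN scheme requires.
    numeric = "".join(str(int(ch, 36)) for ch in rearranged)
    return int(numeric) % 97 == 1


print(luhn_valid("4111 1111 1111 1111"))
print(iban_valid("GB82 WEST 1234 5698 7654 32"))
```

Checksum validation of this kind reduces false positives compared with a digits-only regex, because a random 16-digit number passes the Luhn check only about one time in ten.
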
Health Information
| Label | Description | PII Status | Risk Level | Detection Method |
|---|---|---|---|---|
| health/drug | Drug names and medications | No | Medium | Column name patterns, predefined drug name databases |
Identifiers
| Label | Description | PII Status | Risk Level | Detection Method |
|---|---|---|---|---|
| identifier/dea | DEA (Drug Enforcement Administration) numbers | Yes | High | Pattern matching using DEA format validation |
| identifier/dni | National identity document (DNI) numbers | Yes | Very High | Pattern matching using DNI format validation |
| identifier/isbn | International Standard Book Numbers | No | Low | Pattern matching using ISBN format validation |
| identifier/nhs | National Health Service numbers | Yes | High | Pattern matching using NHS number format validation |
| identifier/nino | National Insurance Numbers | Yes | High | Pattern matching using NINO format validation |
| identifier/ssn | Social Security Numbers | Yes | Very High | Pattern matching using SSN format validation |
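
As an illustration of what format validation can involve for identifiers, the sketch below checks the publicly documented mod-11 check digit of NHS numbers and the basic structural rules for SSNs. The function names are hypothetical and the rules actually applied by Gigantics may differ.

```python
import re

# SSN structure: area cannot be 000, 666, or 900-999; group cannot be 00; serial cannot be 0000.
SSN_PATTERN = re.compile(r"(?!000|666|9\d\d)\d{3}-(?!00)\d{2}-(?!0000)\d{4}")


def ssn_format_valid(candidate: str) -> bool:
    """Return True if `candidate` has the AAA-GG-SSSS shape with allowed ranges."""
    return SSN_PATTERN.fullmatch(candidate.strip()) is not None


def nhs_number_valid(candidate: str) -> bool:
    """Return True if `candidate` is 10 digits with a correct mod-11 check digit."""
    compact = re.sub(r"[ -]", "", candidate)
    if not re.fullmatch(r"\d{10}", compact):
        return False
    digits = [int(ch) for ch in compact]
    # Weights 10 down to 2 apply to the first nine digits.
    weighted = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (weighted % 11)
    if check == 11:
        check = 0
    # A computed check value of 10 means the number is never valid.
    return check != 10 and check == digits[9]


print(ssn_format_valid("123-45-6789"))
print(nhs_number_valid("943 476 5919"))
```
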
Location Information
| Label | Description | PII Status | Risk Level | Detection Method |
|---|---|---|---|---|
| location/address | Physical street addresses | Yes | High | Pattern matching, column name analysis |
| location/city | City names | Conditionally | Medium | Column name patterns, predefined city name databases |
| location/city/de | German city names | Conditionally | Medium | Language-specific city databases |
| location/city/es | Spanish city names | Conditionally | Medium | Language-specific city databases |
| location/country_code | Country codes (US, UK, DE, etc.) | No | Low | Column name patterns, predefined country code lists |
| location/country/ar | Arabic country names | No | Low | Language-specific country databases |
| location/country/en | English country names | No | Low | Language-specific country databases |
| location/country/es | Spanish country names | No | Low | Language-specific country databases |
| location/latitude | Geographic latitude coordinates | Conditionally | Low | Pattern matching, column name analysis |
| location/longitude | Geographic longitude coordinates | Conditionally | Low | Pattern matching, column name analysis |
| location/phone | Phone numbers (general) | Yes | High | Pattern matching, column name analysis |
| location/phone/format1 to location/phone/format4 | Different phone number formats | Yes | High | Format-specific pattern matching |
| location/state/US/abbr | US state abbreviations | Conditionally | Low | Pattern matching, predefined state lists |
| location/state/US/full | Full US state names | Conditionally | Low | Pattern matching, predefined state lists |
| location/zip_code | ZIP/postal codes | Conditionally | Medium | Pattern matching, column name analysis |
Personal Information
| Label | Description | PII Status | Risk Level | Detection Method |
|---|---|---|---|---|
| person/gender | Gender identifiers | Yes | Low | Column name patterns, predefined gender lists |
| person/name/en/first | English first names | Yes | High | Pattern matching against English name databases |
| person/name/en/full | Full English names | Yes | High | Multi-word pattern matching |
| person/name/en/last | English last names | Yes | High | Pattern matching against English surname databases |
| person/name/es | Spanish names | Yes | High | Language-specific name databases |
| person/name/fr | French names | Yes | High | Language-specific name databases |
| person/race | Race/Ethnicity identifiers | Yes | High | Column name patterns, predefined race lists |
Technical Information
| Label | Description | PII Status | Risk Level | Detection Method |
|---|---|---|---|---|
| tech/email | Email addresses | Yes | Medium | Pattern matching using email regex validation |
| tech/guid | Globally Unique Identifiers | Conditionally | Low | Pattern matching using GUID format regex |
| tech/hex_color | Hexadecimal color codes | No | Low | Pattern matching using hex color format regex |
| tech/ipv4 | IPv4 addresses | Conditionally | Medium | Pattern matching using IPv4 format validation |
| tech/ipv6 | IPv6 addresses | Conditionally | Medium | Pattern matching using IPv6 format validation |
| tech/locale | Locale/Regional settings | No | Low | Pattern matching, predefined locale lists |
| tech/mac | MAC addresses | Conditionally | Low | Pattern matching using MAC address format regex |
| tech/md5 | MD5 hash values | Conditionally | Low | Pattern matching using MD5 format regex |
| tech/mime_type | MIME type identifiers | No | Low | Pattern matching, predefined MIME type lists |
| tech/sha1 | SHA1 hash values | Conditionally | Low | Pattern matching using SHA1 format regex |
| tech/sha256 | SHA256 hash values | Conditionally | Low | Pattern matching using SHA256 format regex |
| tech/url | Web URLs | Conditionally | Low | Pattern matching using URL regex validation |
| tech/user_agent | Browser user agent strings | Conditionally | Low | Pattern matching, predefined user agent patterns |
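
Several of the technical labels differ only in the shape of the value, so a small format classifier gives a feel for how they can be told apart. The sketch below is an assumption-laden illustration: the regexes and the `classify_technical` function are hypothetical, and IP parsing is delegated to Python's standard `ipaddress` module instead of a regex.

```python
import ipaddress
import re
from typing import Optional

# Illustrative patterns only; not the regexes that Gigantics ships.
FORMAT_PATTERNS = {
    "tech/md5": re.compile(r"[0-9a-fA-F]{32}"),
    "tech/sha1": re.compile(r"[0-9a-fA-F]{40}"),
    "tech/sha256": re.compile(r"[0-9a-fA-F]{64}"),
    "tech/hex_color": re.compile(r"#(?:[0-9a-fA-F]{3}|[0-9a-fA-F]{6})"),
    "tech/mac": re.compile(r"(?:[0-9a-fA-F]{2}[:-]){5}[0-9a-fA-F]{2}"),
}


def classify_technical(value: str) -> Optional[str]:
    """Return the first technical label whose format matches `value`, or None."""
    text = value.strip()
    try:
        address = ipaddress.ip_address(text)
    except ValueError:
        pass  # not an IP address; fall through to the regex patterns
    else:
        return "tech/ipv4" if address.version == 4 else "tech/ipv6"
    for label, pattern in FORMAT_PATTERNS.items():
        if pattern.fullmatch(text):
            return label
    return None


print(classify_technical("192.168.0.1"))
print(classify_technical("d41d8cd98f00b204e9800998ecf8427e"))
```

Hash labels here are distinguished purely by length (32, 40, or 64 hexadecimal characters), which is why a format match alone cannot prove that a value really is a hash.
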
Miscellaneous
| Label | Description | PII Status | Risk Level | Detection Method |
|---|---|---|---|---|
| misc/ar | Arabic words and phrases | No | Low | Language-specific pattern matching |
| misc/common | Common words | No | Low | Pattern matching against common word lists |
| misc/en | English words | No | Low | Pattern matching against English word lists |
| misc/es | Spanish words | No | Low | Pattern matching against Spanish word lists |
| misc/fr | French words | No | Low | Pattern matching against French word lists |
| misc/numbers | Numeric patterns | No | Low | Pattern matching, data type analysis |
Label Properties
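Each label has two key properties, which appear in the tables above as PII Status and Risk Level:
PII Field
Indicates whether the field contains Personally Identifiable Information:
- True: Field contains sensitive personal data (shown as Yes in the tables)
- False: Field does not contain sensitive personal data (shown as No in the tables)
Labels marked Conditionally in the tables may or may not be PII depending on context, for example whether the value can be linked to an individual.
Severity
Represents the risk level if the data were exposed (the Risk Level column in the tables):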
- Low: Minimal risk (e.g., Gender)
- Medium: Moderate risk (e.g., Email addresses)
- High: Significant risk (e.g., Names, Addresses)
- Very High: Critical risk (e.g., SSN, Credit Cards)
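
One way to picture the two properties is as fields on a small record, with severity modelled as an ordered value so that labels can be compared. The `Label` and `Severity` types and the `needs_priority_review` helper below are hypothetical names used for illustration only; the PII and severity values are taken from the tables above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Ordered risk levels, so that comparisons such as >= work."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


@dataclass(frozen=True)
class Label:
    name: str
    pii: bool
    severity: Severity


LABELS = [
    Label("person/gender", True, Severity.LOW),
    Label("tech/email", True, Severity.MEDIUM),
    Label("location/address", True, Severity.HIGH),
    Label("identifier/ssn", True, Severity.VERY_HIGH),
    Label("finance/currency_code", False, Severity.LOW),
]


def needs_priority_review(labels, threshold=Severity.HIGH):
    """Return names of PII labels at or above the given severity threshold."""
    return [label.name for label in labels if label.pii and label.severity >= threshold]


print(needs_priority_review(LABELS))
```
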
Label Assignment Process
During discovery, Gigantics automatically assigns labels using a multi-layered approach:
- Column names: Matching against known patterns (e.g., "email", "phone")
- Data patterns: Analyzing sample values for format matches using regex and validation algorithms
- Dictionary lookup: Comparing against predefined sensitive data dictionaries with thousands of entries
- Machine learning: Using trained neural network models to recognize complex patterns
- Contextual analysis: Examining data in context with related fields for more accurate classification
Confidence levels are displayed as percentages indicating how certain the system is about the label assignment.
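
To make the idea of a confidence percentage concrete, the sketch below combines two of the signals listed above, the column name and the share of sample values that match a pattern, into a single score for one label. The 70/30 weighting, the regexes, and the `email_confidence` function are assumptions made for illustration; how Gigantics actually weighs its signals is not described here.

```python
import re

# Simplified, hypothetical patterns for the tech/email label.
EMAIL_VALUE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
EMAIL_COLUMN = re.compile(r"e[-_]?mail", re.IGNORECASE)


def email_confidence(column_name: str, samples: list) -> int:
    """Return a 0-100 confidence that a column holds email addresses."""
    non_empty = [s.strip() for s in samples if s and s.strip()]
    if non_empty:
        matching = sum(1 for s in non_empty if EMAIL_VALUE.fullmatch(s))
        value_score = matching / len(non_empty)
    else:
        value_score = 0.0
    name_score = 1.0 if EMAIL_COLUMN.search(column_name) else 0.0
    # Assumed weights: 70 points from sample values, 30 points from the column name.
    return round(70 * value_score + 30 * name_score)


print(email_confidence("user_email", ["a@example.com", "b@example.org"]))
print(email_confidence("contact", ["a@example.com", "n/a"]))
```

A column whose name suggests email but whose values rarely match would score low under this scheme, which is the kind of case where a manual review of the assigned label is worthwhile.
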
Managing Labels
After discovery, you can:
- Edit field labels to correct misclassifications
- Create custom labels for organization-specific needs
- Adjust severity levels to match your risk tolerance
- Confirm classifications to lock in final labels
Labels are essential for ensuring accurate anonymization and data synthesis in subsequent steps.