Labels

Labels in Gigantics are classifications assigned to database fields during the discovery process. These labels determine how data will be handled during anonymization and synthesis operations.

System Labels

Gigantics comes with a comprehensive set of predefined system labels for automatically detecting various types of sensitive data. These labels are organized into categories based on the type of information they identify:

Business Information

Label	Description	PII Status	Risk Level	Detection Method
`business/company`	Company names and organizations	No	Low	Column name patterns, contextual data analysis
`business/department`	Department names within organizations	No	Low	Column name patterns, contextual data analysis
`business/job_title`	Professional job titles	No	Low	Column name patterns, predefined lists of job titles

Datetime Information

Label	Description	PII Status	Risk Level	Detection Method
`datetime/date/format1` to `datetime/date/format12`	Various date formats (MM/DD/YYYY, DD/MM/YYYY, etc.)	Conditionally	Low to Medium	Pattern matching against multiple date format regexes
`datetime/time_zone`	Time zone identifiers	No	Low	Column name patterns, predefined time zone lists
`datetime/time`	Time values	Conditionally	Low	Pattern matching, column name analysis
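
As a rough illustration of matching a value against several date formats, the sketch below tries each candidate format in turn. The format names and the three formats shown are assumptions chosen for the example; the actual definitions behind `datetime/date/format1` to `datetime/date/format12` are not listed on this page.

```python
from datetime import datetime

# Hypothetical candidates; the real format1..format12 definitions are not documented here.
CANDIDATE_FORMATS = {
    "MM/DD/YYYY": "%m/%d/%Y",
    "DD/MM/YYYY": "%d/%m/%Y",
    "YYYY-MM-DD": "%Y-%m-%d",
}


def matching_date_formats(value: str) -> list[str]:
    """Return the names of every candidate format that parses the value."""
    matches = []
    for name, fmt in CANDIDATE_FORMATS.items():
        try:
            datetime.strptime(value, fmt)
        except ValueError:
            continue  # value does not fit this format
        matches.append(name)
    return matches
```

A value such as `03/04/2024` fits both slash-separated formats, which is one reason a detector would likely need to look at many sample values in a column rather than a single one.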

Financial Information

Label	Description	PII Status	Risk Level	Detection Method
`finance/bitcoin`	Bitcoin addresses	Conditionally	High	Pattern matching using Bitcoin address format regex
`finance/creditcard_type`	Credit card type identifiers	No	Medium	Column name patterns, predefined credit card type lists
`finance/creditcard`	Credit card numbers	Yes	Very High	Pattern matching plus Luhn checksum validation
`finance/currency_code`	Currency codes (USD, EUR, etc.)	No	Low	Column name patterns, predefined currency code lists
`finance/currency`	Currency names and symbols	No	Low	Column name patterns, predefined currency lists
`finance/ethereum`	Ethereum addresses	Conditionally	High	Pattern matching using Ethereum address format regex
`finance/iban`	International Bank Account Numbers	Yes	Very High	Pattern matching using IBAN format validation
`finance/money`	Monetary values	No	Medium	Pattern matching, column name analysis
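
The Luhn check mentioned for `finance/creditcard` can be sketched as follows. This is a minimal standalone version of the public Luhn algorithm, not Gigantics code; the function name and the 13 to 19 digit length limit are assumptions made for the example.

```python
def luhn_valid(number: str) -> bool:
    """Check a candidate card number with the Luhn checksum."""
    cleaned = number.replace(" ", "").replace("-", "")
    # Assumed length bounds for card numbers; adjust as needed.
    if not (cleaned.isascii() and cleaned.isdigit()) or not 13 <= len(cleaned) <= 19:
        return False
    total = 0
    # Walk right to left, doubling every second digit.
    for index, char in enumerate(reversed(cleaned)):
        digit = int(char)
        if index % 2 == 1:
            digit *= 2
            if digit > 9:
                digit -= 9  # same as summing the two digits of the product
        total += digit
    return total % 10 == 0
```

Running the checksum after a format match helps separate real card numbers from arbitrary 16-digit values such as order IDs.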

Health Information

Label	Description	PII Status	Risk Level	Detection Method
`health/drug`	Drug names and medications	No	Medium	Column name patterns, predefined drug name databases

Identifiers

Label	Description	PII Status	Risk Level	Detection Method
`identifier/dea`	DEA (Drug Enforcement Administration) numbers	Yes	High	Pattern matching using DEA format validation
`identifier/dni`	National identity document (DNI) numbers	Yes	Very High	Pattern matching using DNI format validation
`identifier/isbn`	International Standard Book Numbers	No	Low	Pattern matching using ISBN format validation
`identifier/nhs`	National Health Service numbers	Yes	High	Pattern matching using NHS number format validation
`identifier/nino`	National Insurance Numbers	Yes	High	Pattern matching using NINO format validation
`identifier/ssn`	Social Security Numbers	Yes	Very High	Pattern matching using SSN format validation
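
Format validation for identifiers often combines a shape check with a check digit. As one example, the sketch below validates an NHS number using its published modulus 11 check digit. Whether Gigantics applies this exact check is an assumption, and the function name is hypothetical.

```python
def nhs_number_valid(value: str) -> bool:
    """Validate a 10-digit NHS number using the modulus 11 check digit."""
    cleaned = value.replace(" ", "").replace("-", "")
    if len(cleaned) != 10 or not (cleaned.isascii() and cleaned.isdigit()):
        return False
    digits = [int(char) for char in cleaned]
    # Weight the first nine digits by 10, 9, ..., 2.
    total = sum(d * w for d, w in zip(digits[:9], range(10, 1, -1)))
    check = 11 - (total % 11)
    if check == 11:
        check = 0
    # A computed check digit of 10 means the number is invalid.
    return check != 10 and check == digits[9]
```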

Location Information

Label	Description	PII Status	Risk Level	Detection Method
`location/address`	Physical street addresses	Yes	High	Pattern matching, column name analysis
`location/city`	City names	Conditionally	Medium	Column name patterns, predefined city name databases
`location/city/de`	German city names	Conditionally	Medium	Language-specific city databases
`location/city/es`	Spanish city names	Conditionally	Medium	Language-specific city databases
`location/country_code`	Country codes (US, UK, DE, etc.)	No	Low	Column name patterns, predefined country code lists
`location/country/ar`	Arabic country names	No	Low	Language-specific country databases
`location/country/en`	English country names	No	Low	Language-specific country databases
`location/country/es`	Spanish country names	No	Low	Language-specific country databases
`location/latitude`	Geographic latitude coordinates	Conditionally	Low	Pattern matching, column name analysis
`location/longitude`	Geographic longitude coordinates	Conditionally	Low	Pattern matching, column name analysis
`location/phone`	Phone numbers (general)	Yes	High	Pattern matching, column name analysis
`location/phone/format1` to `location/phone/format4`	Different phone number formats	Yes	High	Format-specific pattern matching
`location/state/US/abbr`	US state abbreviations	Conditionally	Low	Pattern matching, predefined state lists
`location/state/US/full`	Full US state names	Conditionally	Low	Pattern matching, predefined state lists
`location/zip_code`	ZIP/postal codes	Conditionally	Medium	Pattern matching, column name analysis

Personal Information

Label	Description	PII Status	Risk Level	Detection Method
`person/gender`	Gender identifiers	Yes	Low	Column name patterns, predefined gender lists
`person/name/en/first`	English first names	Yes	High	Pattern matching against English name databases
`person/name/en/full`	Full English names	Yes	High	Multi-word pattern matching
`person/name/en/last`	English last names	Yes	High	Pattern matching against English surname databases
`person/name/es`	Spanish names	Yes	High	Language-specific name databases
`person/name/fr`	French names	Yes	High	Language-specific name databases
`person/race`	Race/Ethnicity identifiers	Yes	High	Column name patterns, predefined race lists

Technical Information

Label	Description	PII Status	Risk Level	Detection Method
`tech/email`	Email addresses	Yes	Medium	Pattern matching using email regex validation
`tech/guid`	Globally Unique Identifiers	Conditionally	Low	Pattern matching using GUID format regex
`tech/hex_color`	Hexadecimal color codes	No	Low	Pattern matching using hex color format regex
`tech/ipv4`	IPv4 addresses	Conditionally	Medium	Pattern matching using IPv4 format validation
`tech/ipv6`	IPv6 addresses	Conditionally	Medium	Pattern matching using IPv6 format validation
`tech/locale`	Locale/Regional settings	No	Low	Pattern matching, predefined locale lists
`tech/mac`	MAC addresses	Conditionally	Low	Pattern matching using MAC address format regex
`tech/md5`	MD5 hash values	Conditionally	Low	Pattern matching using MD5 format regex
`tech/mime_type`	MIME type identifiers	No	Low	Pattern matching, predefined MIME type lists
`tech/sha1`	SHA1 hash values	Conditionally	Low	Pattern matching using SHA1 format regex
`tech/sha256`	SHA256 hash values	Conditionally	Low	Pattern matching using SHA256 format regex
`tech/url`	Web URLs	Conditionally	Low	Pattern matching using URL regex validation
`tech/user_agent`	Browser user agent strings	Conditionally	Low	Pattern matching, predefined user agent patterns
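
Several of the technical labels can be recognized from the value format alone. The sketch below uses Python's standard `ipaddress` module for IP addresses and the length of a hexadecimal string for hash digests. It is an illustration only: the function name and the length-to-label mapping are assumptions, not the detection rules Gigantics actually uses.

```python
import ipaddress
import re
from typing import Optional

# Hex digest lengths mapped to labels (assumed mapping for this example).
HEX_DIGEST_LABELS = {32: "tech/md5", 40: "tech/sha1", 64: "tech/sha256"}
HEX_RE = re.compile(r"[0-9a-fA-F]+")


def classify_technical(value: str) -> Optional[str]:
    """Return a tech/* label for the value, or None if nothing matches."""
    try:
        address = ipaddress.ip_address(value)
    except ValueError:
        pass  # not an IP address; fall through to the hash check
    else:
        return "tech/ipv4" if address.version == 4 else "tech/ipv6"
    if HEX_RE.fullmatch(value):
        return HEX_DIGEST_LABELS.get(len(value))
    return None
```

Length alone cannot prove that a hex string is a hash (a GUID without hyphens is also 32 hex characters), so a match like this is better treated as a candidate label than as a certainty.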

Miscellaneous

Label	Description	PII Status	Risk Level	Detection Method
`misc/ar`	Arabic words and phrases	No	Low	Language-specific pattern matching
`misc/common`	Common words	No	Low	Pattern matching against common word lists
`misc/en`	English words	No	Low	Pattern matching against English word lists
`misc/es`	Spanish words	No	Low	Pattern matching against Spanish word lists
`misc/fr`	French words	No	Low	Pattern matching against French word lists
`misc/numbers`	Numeric patterns	No	Low	Pattern matching, data type analysis

Label Properties

Each label has two key properties:

PII Field

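Indicates whether the field contains Personally Identifiable Information. The tables above show this as PII Status (Yes or No); labels marked Conditionally there may or may not be PII depending on context: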

True: Field contains sensitive personal data
False: Field does not contain sensitive personal data

Severity

Represents the risk level if the data were exposed (shown as Risk Level in the tables above):

Low: Minimal risk (e.g., Gender)
Medium: Moderate risk (e.g., Email addresses)
High: Significant risk (e.g., Names, Addresses)
Very High: Critical risk (e.g., SSN, Credit Cards)
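
To make the two properties concrete, here is a small sketch that models a label as a record with a PII flag and an ordered severity. The `Label` and `Severity` types and the `needs_review` helper are hypothetical names for this example and are not part of any Gigantics API; the sample values are taken from the tables above.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    """Ordered risk levels so they can be compared."""
    LOW = 1
    MEDIUM = 2
    HIGH = 3
    VERY_HIGH = 4


@dataclass(frozen=True)
class Label:
    name: str
    pii: bool
    severity: Severity


LABELS = [
    Label("person/gender", True, Severity.LOW),
    Label("tech/email", True, Severity.MEDIUM),
    Label("location/address", True, Severity.HIGH),
    Label("identifier/ssn", True, Severity.VERY_HIGH),
    Label("finance/currency_code", False, Severity.LOW),
]


def needs_review(labels, threshold=Severity.HIGH):
    """Return names of PII labels at or above the given severity."""
    return [label.name for label in labels if label.pii and label.severity >= threshold]
```

With the sample data, `needs_review(LABELS)` selects `location/address` and `identifier/ssn`, the two PII labels rated High or above.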

Label Assignment Process

During discovery, Gigantics automatically assigns labels using a multi-layered approach:

Column names: Matching against known patterns (e.g., "email", "phone")
Data patterns: Analyzing sample values for format matches using regex and validation algorithms
Dictionary lookup: Comparing against predefined sensitive data dictionaries with thousands of entries
Machine learning: Using trained neural network models to recognize complex patterns
Contextual analysis: Examining data in context with related fields for more accurate classification

Confidence levels are displayed as percentages indicating how certain the system is about the label assignment.
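
As a simplified illustration of how two of these layers might combine into a confidence percentage, the sketch below scores a column for an email label using its name and its sample values. The weights, the name hints, and the simplified email regex are all assumptions made for the example; the actual scoring used by Gigantics is not documented here.

```python
import re

# Deliberately simple email pattern for illustration only.
EMAIL_RE = re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+")
NAME_HINTS = ("email", "e_mail", "mail")

# Hypothetical weights for each layer.
NAME_WEIGHT = 0.3
DATA_WEIGHT = 0.7


def email_confidence(column_name: str, samples: list[str]) -> int:
    """Combine a column-name match and a data-pattern match into a 0-100 score."""
    lowered = column_name.lower()
    name_score = 1.0 if any(hint in lowered for hint in NAME_HINTS) else 0.0
    non_empty = [s for s in samples if s]
    if non_empty:
        hits = sum(1 for s in non_empty if EMAIL_RE.fullmatch(s))
        data_score = hits / len(non_empty)
    else:
        data_score = 0.0
    return round(100 * (NAME_WEIGHT * name_score + DATA_WEIGHT * data_score))
```

Weighting the data pattern above the column name reflects the idea that actual values are stronger evidence than a name, but a real system would likely tune such weights or replace them with a trained model.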

Managing Labels

After discovery, you can:

Edit field labels to correct misclassifications
Create custom labels for organization-specific needs
Adjust sensitivity levels for your risk tolerance
Confirm classifications to lock in final labels

Labels are essential for ensuring accurate anonymization and data synthesis in subsequent steps.

