Jobs
The Jobs page is your operational hub for managing all data processing activity in your model. Here you can see everything that has run, is running, or is scheduled to run in the future.
Page overview
┌───────────────────────────────────────────────────────────────┐
│ Jobs Page │
│ ┌─────────────┬────────────┐ ┌────────────────────────────┐ │
│ │ Last Jobs ▣ │ Scheduled │ │ New Job ⊕ • Sort ▾ │ │
│ └─────────────┴────────────┘ └────────────────────────────┘ │
│ ┌───────────────────────────────────────────────────────────┐ │
│ │ Status | Name | Type | Rule | Created By | Info | Actions │ │
│ │ ● | tap→sink dump │ │
│ │ ○ | nightly discover │ │
│ └───────────────────────────────────────────────────────────┘ │
└───────────────────────────────────────────────────────────────┘

| UI area | What it shows | How to interact |
|---|---|---|
| Tabs (Last Jobs, Scheduled Jobs) | Filter your view between recent activity and future scheduled work. | Click tabs to switch between views. |
| Toolbar (New Job, Sort) | Primary actions to create new jobs or sort existing ones. | Use the New Job button to create a new data processing job. |
| Jobs table | Table showing all your data processing activities with their status and details. | Click on any column header to filter results. Click on a job name to see details. |
Jobs table columns
| Column | Sample value | What it shows |
|---|---|---|
| Status | Running indicator or Completed checkmark | Current state of the job. |
| Name | tap-to-s3 (2024-05-06 01:00) | Descriptive name of the job. Click to see detailed logs. |
| Model (project-wide view only) | Customer Data | Which data model this job belongs to. |
| Type | dump, load, pump, discover, scan | What kind of operation the job performs. |
| Rule | Anonymize PII | If applicable, which rule was applied to your data. |
| Created By | jane.doe | Who initiated the job. |
| Info | Started: 12:21 • Duration: 00:03:18 or Next: 07/24/2024 22:00 | When the job started/finished, or when it's scheduled to run. |
| Actions | ⋮ menu with available actions | Context-sensitive actions based on the job's current status. |
Status indicators
| Status | What it means | What you can do |
|---|---|---|
| queued | Job accepted and waiting to start. | Cancel if needed, otherwise wait for it to begin. |
| running | Job is currently processing data. | Monitor progress through the job details page; cancel if needed. |
| completed | Job finished successfully. | View results or rerun if needed. |
| failed | Job encountered an error. | Restart to retry the failed parts or rerun completely. |
| scheduled | Job is set to run at a future time. | Edit or cancel the scheduled time. |
Working with jobs
Starting new jobs
The New Job button (+) in the toolbar opens the New Job modal, the primary interface for creating new data processing jobs. The modal is organized into the following configuration panels:
- From panel: Choose your source (tap or dataset) with environment and driver information
- To panel: Route data to sinks, taps, or create new datasets
- Rule panel: Apply optional transformation or anonymization rules
- Load options: Fine-tune batch sizes, write modes, and performance settings
- Schedule panel: Choose between immediate execution, one-time scheduling, or recurring pipelines
How to use:
- Click the New Job (+) button in the top-right toolbar
- Follow the step-by-step panels to configure your data flow
- Choose to run immediately (Run Now), schedule for later (Run Later), or save as a reusable pipeline
- Review the collapsed panel summaries to ensure all required fields are configured
For detailed information about all available options and configuration settings, see the complete New Job modal documentation.
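If it helps to picture the result, the panels roughly map onto a single job definition. The sketch below is purely illustrative (the field names are hypothetical, not an actual export format): it shows how a source, a destination, an optional rule, load options, and a schedule come together in one job.

```yaml
# Hypothetical sketch only; field names are illustrative, not a real export format.
job:
  name: tap-to-s3
  from:                     # From panel: source tap or dataset, plus environment/driver
    tap: production-postgres
    environment: staging
  to:                       # To panel: sink, tap, or new dataset
    sink: s3-archive
  rule: anonymize-pii       # Rule panel: optional transformation or anonymization rule
  load:                     # Load options: batch size, write mode, performance settings
    batch_size: 1000
    write_mode: append
  schedule:                 # Schedule panel: run now, run later, or recurring pipeline
    type: run_now
```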
Managing existing jobs
- Click any job name to view its details and logs
- Use the Actions menu (⋮) to perform the context-appropriate actions described below
Cancel jobs that are queued or running
Sometimes you may need to stop a job that's currently running or waiting in the queue. This is useful when:
- You've started a job by mistake
- You realize you need to make changes to the configuration before proceeding
- The job is taking longer than expected and blocking other operations
- You've identified an issue that makes the job unnecessary
Important: Canceling a job may leave it in an undesired state, so use this action with care. Any data processing that was already completed will remain, but partial operations may need to be cleaned up manually.
Restart jobs that failed
When a job fails, you can restart it from the point of failure rather than starting over completely. This action:
- Skips entities that were already processed successfully
- Retries only the entities that failed or hadn't started yet
- Continues processing where it left off
This is particularly useful for jobs that process large volumes of data where most entities were successful, and you only need to retry the failed ones.
Rerun jobs that completed
This action starts a job over completely from the beginning, reusing all the original job settings. Why would you want to do this?
- Repeat the same operation: Run the exact same scan, discovery, or data processing again with the same parameters
- Refresh data: Get updated results based on the current state of your data sources
- Test consistency: Verify that the job produces the same results when run multiple times
- Apply to new data: If your data source has been updated, rerun the job to process the new information
This is an efficient way to repeat operations without having to reconfigure all the settings.
Delete job history
Remove completed jobs from your job history list. This action:
- Cleans up your job list for better organization
- Removes old jobs that are no longer relevant
- Helps maintain a focused view of recent and active operations
Deleting job history only removes the record from this list - it doesn't affect any data that was processed or created by the job itself.
Download rule configuration as YAML
Export the rule configuration used in a job as a YAML file. This is helpful when:
- You want to share job configurations with team members
- You need to send configuration details to Gigantics support for debugging
- You want to replicate an issue or specific job configuration
- You need to audit or document the rules applied to your data
- You're migrating configurations between environments
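The exact structure of the exported file depends on the rule, but conceptually it records which transformations were applied to which fields. A purely hypothetical example (the keys and actions below are illustrative, not the actual export schema):

```yaml
# Illustrative only; the real export structure may differ.
rule:
  name: Anonymize PII
  entities:
    - table: customers
      fields:
        email: { action: mask }
        phone: { action: fake }
        ssn: { action: hash }
```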
Scheduled jobs
Jobs can be scheduled to run automatically in several ways:
| Schedule method | What it does |
|---|---|
| Run now | Execute the job immediately. |
| One-time schedule | Set a specific date and time for the job to run. |
| Manual pipeline | Save job configuration as a reusable pipeline template. |
| Recurring pipeline | Create an automatically repeating job (daily, weekly, etc.). |
Scheduled jobs appear in the Scheduled Jobs tab until they run, making it easy to see what's coming up.
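For recurring pipelines, the repetition boils down to a frequency such as daily or weekly. If you think of it in cron terms, a job that should run every night at 22:00 corresponds to the hypothetical sketch below (the UI exposes this as form fields; you don't write YAML or cron yourself):

```yaml
# Hypothetical sketch; scheduling is configured in the UI, not in a file.
schedule:
  type: recurring
  cron: "0 22 * * *"   # every day at 22:00
```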
Where jobs originate
Jobs come from various actions in the platform:
- Discover — Scanning your data sources for sensitive information
- Rules — Data anonymization or transformation operations
- Datasets — Data export, copy, or merge operations
- Sinks — Loading processed data to destinations
- Pipelines — Automated sequences of jobs
Whenever you configure one of these operations, if it's scheduled to run in the future, it will appear in your Scheduled Jobs tab.