Salesforce Driver Documentation
This documentation explains the Salesforce driver implementation and how connection parameters map to UI fields in the Gigantics application.
Connection Parameters
| UI Field | Technical Implementation | Description | Required | Default Value |
|---|---|---|---|---|
| Host | host in connection config | Salesforce login URL | Yes | https://login.salesforce.com |
| Username | user in connection config | Salesforce username (email format) | Yes | None (must be specified) |
| Consumer Key | consumerKey in connection config | Connected App Consumer Key from Salesforce | Yes | None (must be specified) |
| Private Key | privateKey in connection config | RSA private key for JWT authentication (PEM format or file path) | Yes | None (must be specified) |
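The Private Key field accepts either the PEM text itself or a path to the key file. The following is a minimal sketch of how a client could normalize that value. It assumes that any value beginning with a `-----BEGIN` line is inline PEM and that anything else is a file path. `resolve_private_key` is a hypothetical helper, not the driver's code.

```python
from pathlib import Path

PEM_PREFIX = "-----BEGIN"


def _check_pem(pem: str, source: str) -> str:
    """Accept only PEM blocks whose header line names a private key."""
    lines = pem.splitlines()
    first_line = lines[0] if lines else ""
    if not (first_line.startswith(PEM_PREFIX) and "PRIVATE KEY" in first_line):
        raise ValueError(f"{source} does not look like a PEM private key")
    return pem + "\n"


def resolve_private_key(value: str) -> str:
    """Hypothetical helper: return PEM text for the Private Key field.

    Assumption: a value starting with "-----BEGIN" is inline PEM;
    anything else is treated as a path to a key file.
    """
    text = value.strip()
    if text.startswith(PEM_PREFIX):
        return _check_pem(text, "inline value")
    path = Path(text).expanduser()
    if not path.is_file():
        raise FileNotFoundError(f"private key file not found: {path}")
    return _check_pem(path.read_text(encoding="ascii").strip(), str(path))
```

Requiring "PRIVATE KEY" in the header also catches a common mistake: pasting the certificate where the private key belongs.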
Authentication: JWT OAuth 2.0
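The Salesforce driver uses the OAuth 2.0 JWT Bearer Flow for server-to-server authentication. The driver signs a short-lived assertion with the private key and exchanges it for an access token. No password is stored, and no interactive login is required.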
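If you are wiring up a client or debugging token errors, you can sketch the flow in Python using only the standard library. This is a minimal sketch, not the driver's implementation, and it rests on three assumptions:

- The helper names (`build_claims`, `build_assertion`, `token_request`) are hypothetical.
- The 180-second lifetime is based on Salesforce's requirement that the assertion expire within about three minutes.
- RS256 signing is delegated to a caller-supplied function. For example, that function could wrap `private_key.sign(data, padding.PKCS1v15(), hashes.SHA256())` from the `cryptography` package.

```python
import base64
import json
import time
from typing import Callable, Dict, Optional, Tuple
from urllib.parse import urlencode

JWT_BEARER_GRANT = "urn:ietf:params:oauth:grant-type:jwt-bearer"


def b64url(data: bytes) -> str:
    """Base64url-encode without '=' padding, as compact JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _compact_json(obj: Dict[str, object]) -> bytes:
    return json.dumps(obj, separators=(",", ":")).encode("utf-8")


def build_claims(consumer_key: str, username: str, host: str,
                 now: Optional[int] = None, lifetime_s: int = 180) -> Dict[str, object]:
    """Hypothetical helper: claims for the Salesforce JWT bearer flow."""
    issued = int(time.time()) if now is None else now
    return {
        "iss": consumer_key,         # Consumer Key of the Connected App
        "sub": username,             # Username from the connection config
        "aud": host.rstrip("/"),     # Host: login.salesforce.com or test.salesforce.com
        "exp": issued + lifetime_s,  # Short-lived; 3 minutes assumed here
    }


def build_assertion(claims: Dict[str, object],
                    sign_rs256: Callable[[bytes], bytes]) -> str:
    """Assemble header.payload.signature; RS256 signing is supplied by the caller."""
    signing_input = b64url(_compact_json({"alg": "RS256"})) + "." + b64url(_compact_json(claims))
    signature = sign_rs256(signing_input.encode("ascii"))
    return signing_input + "." + b64url(signature)


def token_request(host: str, assertion: str) -> Tuple[str, str]:
    """Token endpoint URL and form-encoded body for the JWT bearer grant."""
    url = host.rstrip("/") + "/services/oauth2/token"
    body = urlencode({"grant_type": JWT_BEARER_GRANT, "assertion": assertion})
    return url, body
```

To use it, POST the body to the returned URL with `Content-Type: application/x-www-form-urlencoded`. A successful response contains an `access_token` and the org's `instance_url`.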
Setting Up JWT Authentication
Step 1: Create a Connected App in Salesforce
- Log in to your Salesforce org
- Navigate to: Setup → Apps → App Manager
- Click New Connected App
- Fill in basic information:
  - Connected App Name: Gigantics Data Migration
  - Contact Email: your email address
- Enable OAuth Settings:
  - Check "Enable OAuth Settings"
  - Callback URL: https://login.salesforce.com (not used by the JWT flow, but the field is required)
  - Check "Use digital signatures"
  - Upload your certificate (server.crt, the public certificate from the key pair generated in Step 2)
  - Select OAuth Scopes:
    - Full access (full)
    - Perform requests at any time (refresh_token, offline_access)
- Save and note the Consumer Key
Step 2: Generate RSA Key Pair
Generate a private key and certificate:
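```shell
# Generate private key
openssl genrsa -out server.key 2048
# Generate a self-signed certificate (public key), valid for 365 days
openssl req -new -x509 -key server.key -out server.crt -days 365
```
Upload server.crt to the Connected App (Step 1).
Use server.key as the Private Key in the Gigantics connection configuration. The certificate expires after 365 days, so generate and upload a new one before then.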
Step 3: Pre-Authorize User
- Go to: Setup → Apps → Connected Apps → Manage Connected Apps
- Click your app name
- Click Edit Policies
- Under OAuth Policies, set Permitted Users to "Admin approved users are pre-authorized"
- Click Save
- Click Manage Profiles or Manage Permission Sets
- Add your user's profile or permission set
Technical Details
The Salesforce driver uses Bulk API 2.0 for high-performance data operations:
- Queries: Uses Bulk API 2.0 for result sets over 10,000 records and the REST API for smaller queries (the threshold is configurable with bulkQueryThreshold)
- Writes: Uses Bulk API 2.0 for all insert, update, and upsert operations
- Batch Sizes: Supports up to 150,000 records per batch
- Concurrent Processing: Supports up to 100 parallel jobs
- Automatic Ordering: Parent objects are processed before children
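Processing parents before children amounts to a topological sort over lookup relationships. The following is a minimal sketch using Python's standard `graphlib`. It assumes the relationships have already been discovered as a mapping from each object to the objects its lookup fields reference. `load_order` is a hypothetical helper, and the object names are only examples. Self-lookups, such as an Account's parent account, are dropped here because they cannot be ordered at the object level. This document does not say how the driver handles them.

```python
from graphlib import TopologicalSorter
from typing import Dict, Iterable, List


def load_order(parents: Dict[str, Iterable[str]]) -> List[str]:
    """Hypothetical helper: order objects so every parent precedes its children.

    `parents` maps an object to the objects its lookup fields point to.
    Self-lookups (e.g. Account -> Account) are ignored in this sketch.
    Raises graphlib.CycleError if two objects look each other up.
    """
    graph = {obj: {p for p in refs if p != obj} for obj, refs in parents.items()}
    return list(TopologicalSorter(graph).static_order())
```

If two objects reference each other, `static_order()` raises `graphlib.CycleError`. A loader would have to break such a cycle, for instance by filling one lookup in a second pass. That remedy is an illustration, not documented driver behavior.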
Load Modes
The Salesforce driver supports three load modes:
| Mode | Description | Use Case | External ID Required |
|---|---|---|---|
| Insert | Deletes all existing records then creates new ones (truncate operation) | Initial data load, full refresh | No |
| Update | Updates existing records by Salesforce ID | Updating records in same org | No |
| Merge | Upserts using external ID field | Cross-org migrations, re-runnable loads | Yes |
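Each of the three modes maps naturally onto Bulk API 2.0 ingest operations. The following is a minimal sketch of the job-creation bodies each mode could produce, based on these assumptions:

- The driver uses CSV uploads.
- Insert mode implements its truncate step with a `delete` job. Bulk API 2.0 also offers `hardDelete`, which bypasses the Recycle Bin and requires an extra user permission.
- `ingest_jobs` is a hypothetical helper, not the driver's code.

```python
from typing import Dict, List

EXTERNAL_ID_FIELD = "GIGExternalId__c"


def ingest_jobs(load_mode: str, sobject: str) -> List[Dict[str, str]]:
    """Hypothetical helper: Bulk API 2.0 ingest job bodies for one object."""
    base = {"object": sobject, "contentType": "CSV", "lineEnding": "LF"}
    if load_mode == "insert":
        # Truncate first (delete job over the existing record IDs), then insert.
        return [{**base, "operation": "delete"}, {**base, "operation": "insert"}]
    if load_mode == "update":
        # Rows carry the Salesforce Id column, which the update matches on.
        return [{**base, "operation": "update"}]
    if load_mode == "merge":
        # Upsert matched on the external ID field created by the driver.
        return [{**base, "operation": "upsert", "externalIdFieldName": EXTERNAL_ID_FIELD}]
    raise ValueError(f"unknown load mode: {load_mode!r}")
```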
External ID for Merge Operations
When using merge mode, the driver automatically creates an external ID field called GIGExternalId__c on each object:
- Field Type: Text (255 characters)
- Purpose: Maps source record IDs to target org
- Unique: Marked as external ID for upsert matching
- Automatic: Created during the start phase if it doesn't already exist
How it works:
- The source record ID is stored in GIGExternalId__c
- On upsert, Salesforce matches records by GIGExternalId__c
- If a match is found, the record is updated; if not, a new record is inserted
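To see which rows an upsert would update and which it would insert, you can model the matching rule on the client side. The following is a minimal sketch that assumes source and target rows are plain dictionaries. `plan_upsert` is hypothetical. In practice, Salesforce performs this matching server-side during the upsert job.

```python
from typing import Dict, List, Tuple

EXTERNAL_ID_FIELD = "GIGExternalId__c"
Record = Dict[str, str]


def plan_upsert(source: List[Record], target: List[Record]) -> Tuple[List[Record], List[Record]]:
    """Hypothetical helper: split source rows into (updates, inserts) by external ID."""
    by_external_id = {r[EXTERNAL_ID_FIELD]: r for r in target if r.get(EXTERNAL_ID_FIELD)}
    updates: List[Record] = []
    inserts: List[Record] = []
    for rec in source:
        row = {k: v for k, v in rec.items() if k != "Id"}
        row[EXTERNAL_ID_FIELD] = rec["Id"]  # source record ID becomes the external ID
        match = by_external_id.get(row[EXTERNAL_ID_FIELD])
        if match is not None:
            # Match found: update. The target Id is shown for illustration only.
            updates.append({**row, "Id": match["Id"]})
        else:
            inserts.append(row)  # no match: insert a new record
    return updates, inserts
```

If you run the same load a second time, every row takes the update path. This is why merge mode is safe to re-run.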
Example:
```yaml
rules:
  migration:
    loadMode: merge  # Enables external ID usage
    include:
      - Account
      - Contact
```
Best Practices
✅ Recommended: Sandbox Migrations
Always use Gigantics with Salesforce sandboxes, not production:
- Why: Data migrations can affect automation, validation rules, and business logic
- Sandbox Types: Full Copy, Partial Copy, or Developer sandboxes
- Testing: Validate the migration in a sandbox before considering production
- Sandbox Login: Use https://test.salesforce.com as the host
✅ Recommended: Update Mode on Same Org
For the best performance and simplest setup, use update mode within the same org:
Benefits:
- ⚡ Fast: No ID mapping required, true concurrent processing
- 🎯 Simple: No user mapping, no external ID creation
- ✅ Reliable: Direct ID references, no cross-org complexity
Example Use Case:
```yaml
# Anonymizing data in the same sandbox
source:
  driver: salesforce
  host: https://test.salesforce.com
  # ... auth details
sink:
  driver: salesforce
  host: https://test.salesforce.com  # Same org!
  # ... same auth details
rules:
  anonymize:
    loadMode: update  # Update records in place
    transform:
      Contact:
        - Email:
            action: fake
            label: tech/email
```
Update vs Merge: When to Use Each
| Scenario | Recommended Mode | Why |
|---|---|---|
| Same org anonymization | Update | Fastest, no ID mapping |
| Same org data refresh | Update | Direct ID references |
| Cross-org migration | Merge | External ID handles different IDs |
| Re-runnable loads | Merge | Idempotent, safe to re-run |
| Full data replacement | Insert | Truncates all data and reloads |
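The table can also be read as a small decision function. The following sketch uses a hypothetical `recommend_load_mode` helper. Its precedence is an assumption, because the table lists each scenario on its own: full replacement is checked first, then cross-org or re-runnable loads.

```python
def recommend_load_mode(same_org: bool, full_replacement: bool = False,
                        rerunnable: bool = False) -> str:
    """Hypothetical helper mirroring the Update vs Merge table."""
    if full_replacement:
        return "insert"  # truncate all data and reload
    if not same_org or rerunnable:
        return "merge"   # external ID bridges differing IDs; idempotent
    return "update"      # same org: direct ID references, no mapping
```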
Automation Management
⚠️ Important: Salesforce automations can slow down or block data migrations.
Before running large migrations, consider disabling automations:
- Validation Rules: May block records
- Workflow Rules: Can slow down operations
- Process Builder: Executes on every record
- Flows: Auto-launched flows run during loads
- Duplicate Rules: Can block upserts (see below)
- Apex Triggers: Consume processing time
How to disable:
- Navigate to: Setup → Process Automation
- Deactivate relevant workflows, processes, and flows
- Navigate to: Setup → Object Manager → [Object] → Validation Rules
- Deactivate validation rules temporarily
- Run migration
- Re-enable after completion
Duplicate Rules
Salesforce duplicate rules can block merge operations even when using external IDs:
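Problem: Duplicate rules match on fields such as Email, Name, and Phone, not on your external ID. During an upsert, the external ID may have no match while those fields match an existing record. Salesforce then treats the row as an insert, and any duplicate rule whose action is set to Block rejects it.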
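A small simulation shows why the external ID does not protect against this failure. The sketch rests on these assumptions:

- Records are plain dictionaries.
- The hypothetical duplicate rule matches on Email only and blocks on create.
- `simulate_upsert` and `DuplicatesDetected` are illustrative names, not Salesforce APIs.
- Only the rule's create-time action is modeled, although real duplicate rules can also run when records are edited.

```python
from typing import Dict, List, Tuple

Record = Dict[str, str]


class DuplicatesDetected(Exception):
    """Illustrative stand-in for the DUPLICATES_DETECTED error."""


def simulate_upsert(row: Record, existing: List[Record],
                    ext_field: str = "GIGExternalId__c",
                    duplicate_fields: Tuple[str, ...] = ("Email",),
                    block_duplicates: bool = True) -> str:
    """Hypothetical model of an upsert against a blocking duplicate rule."""
    match = next((r for r in existing if r.get(ext_field) == row[ext_field]), None)
    if match is not None:
        match.update(row)
        return "updated"
    # No external ID match: the upsert becomes an insert, so the create-time
    # duplicate rule runs and compares the configured fields.
    if block_duplicates:
        for r in existing:
            if all(r.get(f) and r.get(f) == row.get(f) for f in duplicate_fields):
                raise DuplicatesDetected(f"matches existing record {r.get('Id')}")
    existing.append(dict(row))
    return "inserted"
```

Deactivating the rule, which the model represents as `block_duplicates=False`, lets the same row through as a new record.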
Solution: Disable duplicate rules during the migration:
Setup → Duplicate Management → Duplicate Rules → Deactivate
SSH Tunnel Support
The Salesforce driver does not require SSH tunneling as it connects directly to Salesforce cloud services via HTTPS. All connections are encrypted by default.
API Endpoints Used
Salesforce connections are primarily used in:
- Tap creation (schema discovery and data extraction)
- Sink creation (data loading destination)
- Pipeline execution (data anonymization and migration)
Custom Params
The Salesforce driver supports additional custom parameters for advanced configurations.
Example:
```yaml
bulkPollInterval: 5000       # Polling interval in milliseconds (default: 5000)
bulkPollTimeout: 1800000     # Maximum timeout in milliseconds (default: 1800000 = 30 minutes)
bulkQueryThreshold: 10000    # Record count threshold for using Bulk API 2.0 (default: 10000)
```
Parameter Descriptions:
- bulkPollInterval: How often to check job status, in milliseconds. Lower values detect job completion sooner but consume more API calls. Default is 5000 ms (5 seconds).
- bulkPollTimeout: Maximum time to wait for a job to complete, in milliseconds. Jobs exceeding this timeout are cancelled. Default is 1800000 ms (30 minutes).
- bulkQueryThreshold: Minimum number of records that triggers Bulk API 2.0 for queries. Queries below this threshold use the REST API, which avoids the overhead of creating and polling a bulk job. Default is 10000 records.
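The following sketch shows how these three parameters interact. It is an illustration, not the driver's code, and it rests on these assumptions:

- `use_bulk_api` and `wait_for_job` are hypothetical helpers.
- `get_state` returns the `state` field of a Bulk API 2.0 job. JobComplete, Failed, and Aborted are terminal states.
- A query of exactly `bulkQueryThreshold` records goes to Bulk API 2.0.
- The clock and sleep functions are injectable, so the loop can be tested without waiting.

```python
import time
from typing import Callable

TERMINAL_STATES = {"JobComplete", "Failed", "Aborted"}


def use_bulk_api(record_count: int, bulk_query_threshold: int = 10000) -> bool:
    """Hypothetical: assumes counts at or above the threshold use Bulk API 2.0."""
    return record_count >= bulk_query_threshold


def wait_for_job(get_state: Callable[[], str], abort: Callable[[], None],
                 bulk_poll_interval: int = 5000, bulk_poll_timeout: int = 1800000,
                 sleep: Callable[[float], None] = time.sleep,
                 clock: Callable[[], float] = time.monotonic) -> str:
    """Hypothetical: poll until a terminal state, or abort once the timeout elapses.

    Durations are in milliseconds, matching the custom params.
    """
    deadline = clock() + bulk_poll_timeout / 1000
    while True:
        state = get_state()  # each status check is one API call
        if state in TERMINAL_STATES:
            return state
        if clock() >= deadline:
            abort()  # cancel the job, as bulkPollTimeout describes
            raise TimeoutError(f"job still {state} after {bulk_poll_timeout} ms")
        sleep(bulk_poll_interval / 1000)
```

With the defaults, a job that runs until the timeout is polled roughly 360 times (1,800,000 ms / 5,000 ms). Raising bulkPollInterval to 30,000 ms cuts that to about 60 polls.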
Troubleshooting
Authentication Errors
Issue: "invalid_grant: user hasn't approved this consumer"
Solution:
- Verify the Connected App is configured correctly
- Pre-authorize the user (Manage Profiles/Permission Sets)
- Check that the certificate matches the private key
- Ensure username is correct
Issue: "invalid_grant: invalid client credentials"
Solution:
- Verify Consumer Key is correct
- Check that private key file exists and is readable
- Ensure certificate was uploaded to Connected App
Duplicate Detection Errors
Issue: DUPLICATES_DETECTED: Use one of these records
Solution: Disable duplicate rules before migration (see Automation Management section above)
Performance Issues
Issue: Migration is slow
Solutions:
- Disable automations (biggest impact), and verify they are actually deactivated before running
- Use update mode instead of merge (if same org)
Connection Limits
Issue: API limit errors
Solution:
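- Bulk API 2.0 batches data internally, so a large load needs far fewer API calls than record-by-record REST requests. The calls that create, upload, poll, and close jobs still count against the org's daily API request limit.
- Check the org's daily API request allocation before large runs, and increase bulkPollInterval if polling consumes too many calls.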
Summary
The Salesforce driver is optimized for high-performance data migrations using Bulk API 2.0:
✅ Use sandboxes for testing and migrations
✅ Update mode for same-org operations (fastest)
✅ Merge mode for cross-org migrations
✅ Disable automations before large migrations
✅ JWT OAuth for secure authentication
Future Features
The following features are planned for future releases of the Salesforce driver:
| Feature | Description | Status |
|---|---|---|
| Sandbox Auto-Detection | Automatically detect sandbox vs production from the login URL | Planned |
| Field-Level Security Awareness | Respect FLS and surface permission warnings during schema discovery | Planned |
| Multi-Org Support | Connect to multiple Salesforce orgs within a single project | Planned |
| Password Manager Integration | Retrieve Salesforce credentials from external password managers | Planned |