
Salesforce Driver Documentation

This documentation explains the Salesforce driver implementation and how connection parameters map to UI fields in the Gigantics application.

Connection Parameters

| UI Field | Technical Implementation | Description | Required | Default Value |
| --- | --- | --- | --- | --- |
| Host | host in connection config | Salesforce login URL | Yes | https://login.salesforce.com |
| Username | user in connection config | Salesforce username (email format) | Yes | None (must be specified) |
| Consumer Key | consumerKey in connection config | Connected App Consumer Key from Salesforce | Yes | None (must be specified) |
| Private Key | privateKey in connection config | RSA private key for JWT authentication (PEM format or file path) | Yes | None (must be specified) |
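
To make the required/default rules in the table concrete, here is a minimal sketch of how a connection config could be checked before use. The key names (host, user, consumerKey, privateKey) and the default host come from the table; the function itself is hypothetical and is not part of Gigantics.

```python
# Hypothetical helper: key names are taken from the parameter table,
# but this validation logic is illustrative only.
REQUIRED_FIELDS = ("user", "consumerKey", "privateKey")
DEFAULT_HOST = "https://login.salesforce.com"


def normalize_connection(config: dict) -> dict:
    """Return a copy of config with the default host applied.

    Raises ValueError naming every required field that is missing or empty.
    """
    missing = [name for name in REQUIRED_FIELDS if not config.get(name)]
    if missing:
        raise ValueError("missing required connection fields: " + ", ".join(missing))
    normalized = dict(config)
    if not normalized.get("host"):
        # Host has a documented default, so it may be omitted.
        normalized["host"] = DEFAULT_HOST
    return normalized
```

For a sandbox connection, host would be set explicitly to https://test.salesforce.com rather than relying on the default.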

Authentication: JWT OAuth 2.0

The Salesforce driver uses the OAuth 2.0 JWT Bearer Flow for server-to-server authentication. The driver signs a short-lived JWT with the configured private key and exchanges it for an access token, so no interactive login is required.
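
As a rough illustration of what this flow exchanges, the sketch below builds the unsigned part of the JWT assertion (header and claims) using only the Python standard library. The function names are hypothetical and are not the driver's code. The claim layout (iss = Consumer Key, sub = username, aud = login URL, exp a few minutes ahead) follows Salesforce's documented JWT bearer flow; using the configured host as aud is an assumption. The RS256 signature made with the private key is left out because it needs a third-party crypto library.

```python
import base64
import json
import time


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def build_signing_input(consumer_key, username, host, now=None, lifetime_s=180):
    """Return 'header.claims', the string that must then be signed with RS256."""
    issued = int(time.time()) if now is None else int(now)
    header = {"alg": "RS256"}
    claims = {
        "iss": consumer_key,         # Connected App Consumer Key
        "sub": username,             # Salesforce username
        "aud": host,                 # assumed: the configured login URL
        "exp": issued + lifetime_s,  # short-lived assertion
    }
    encode = lambda obj: b64url(json.dumps(obj, separators=(",", ":")).encode("utf-8"))
    return encode(header) + "." + encode(claims)


def token_request_form(signed_assertion: str) -> dict:
    """Form fields for the token request that carries the signed JWT."""
    return {
        "grant_type": "urn:ietf:params:oauth:grant-type:jwt-bearer",
        "assertion": signed_assertion,
    }
```

Once signed, the assertion is posted as form data to the org's /services/oauth2/token endpoint, which returns the access token.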

Setting Up JWT Authentication

Step 1: Create a Connected App in Salesforce

  1. Log in to your Salesforce org
  2. Navigate to: Setup → Apps → App Manager
  3. Click New Connected App
  4. Fill in basic information:
    • Connected App Name: Gigantics Data Migration
    • Contact Email: Your email
  5. Enable OAuth Settings:
    • Check "Enable OAuth Settings"
    • Callback URL: https://login.salesforce.com (not used for JWT but required)
    • Check "Use digital signatures"
    • Upload your certificate (public key from the key pair you'll generate)
    • Select OAuth Scopes:
      • Full access (full)
      • Perform requests at any time (refresh_token, offline_access)
  6. Save and note the Consumer Key

Step 2: Generate RSA Key Pair

Generate a private key and certificate:

# Generate private key
openssl genrsa -out server.key 2048

# Generate certificate (public key)
openssl req -new -x509 -key server.key -out server.crt -days 365

Upload server.crt to the Connected App (Step 1). Use server.key as the Private Key in the Gigantics connection configuration.

Step 3: Pre-Authorize User

  1. Go to: Setup → Apps → Connected Apps → Manage Connected Apps
  2. Click your app name
  3. Click Edit Policies
  4. Under OAuth Policies:
    • Permitted Users: Admin approved users are pre-authorized
  5. Click Save
  6. Click Manage Profiles or Manage Permission Sets
  7. Add your user's profile or permission set

Technical Details

The Salesforce driver uses Bulk API 2.0 for high-performance data operations:

  • Queries: Automatically uses Bulk API 2.0 for datasets over 10,000 records, REST API for smaller queries
  • Writes: Uses Bulk API 2.0 for all insert, update, and upsert operations
  • Batch Sizes: Supports up to 150,000 records per batch
  • Concurrent Processing: Supports up to 100 parallel jobs
  • Automatic Ordering: Parent objects are processed before children
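
The "parents before children" ordering can be pictured as a topological sort over object references. The source does not say how the driver computes this order, so the following is only a sketch of one standard approach, using Python's graphlib; the function name and the sample relationships are illustrative.

```python
from graphlib import TopologicalSorter


def load_order(parents_of: dict) -> list:
    """Order objects so every parent is loaded before its children.

    parents_of maps each object name to the set of objects it references.
    Raises graphlib.CycleError if the references form a cycle.
    """
    return list(TopologicalSorter(parents_of).static_order())


# Illustrative input: Contact references Account; Case references both.
order = load_order({
    "Account": set(),
    "Contact": {"Account"},
    "Case": {"Account", "Contact"},
})
```

With this input, Account always comes before Contact, and Contact before Case.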

Load Modes

The Salesforce driver supports three load modes:

| Mode | Description | Use Case | External ID Required |
| --- | --- | --- | --- |
| Insert | Deletes all existing records then creates new ones (truncate operation) | Initial data load, full refresh | No |
| Update | Updates existing records by Salesforce ID | Updating records in same org | No |
| Merge | Upserts using external ID field | Cross-org migrations, re-runnable loads | Yes |

External ID for Merge Operations

When using merge mode, the driver automatically creates an external ID field called GIGExternalId__c on each object:

  • Field Type: Text (255 characters)
  • Purpose: Maps source record IDs to target org
  • Unique: Marked as external ID for upsert matching
  • Automatic: Created during the start phase if it doesn't exist

How it works:

  1. Source record ID is stored in GIGExternalId__c
  2. On upsert, Salesforce matches by GIGExternalId__c
  3. If match found → Update; If not found → Insert
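
The matching rule above can be sketched with an in-memory stand-in for the target org. This is not how Salesforce or the driver implements upsert; it only models "match on GIGExternalId__c, then update, otherwise insert". The function name and the sample IDs are made up.

```python
def upsert(target: dict, records: list, key: str = "GIGExternalId__c") -> dict:
    """Upsert records into target, a dict keyed by external ID value.

    Returns how many records were inserted and how many were updated.
    """
    counts = {"inserted": 0, "updated": 0}
    for record in records:
        external_id = record[key]
        if external_id in target:
            target[external_id].update(record)   # match found: update
            counts["updated"] += 1
        else:
            target[external_id] = dict(record)   # no match: insert
            counts["inserted"] += 1
    return counts


# The source record ID travels in GIGExternalId__c (values here are invented).
batch = [
    {"GIGExternalId__c": "SRC-0001", "Name": "Acme"},
    {"GIGExternalId__c": "SRC-0002", "Name": "Globex"},
]
org = {}
first_run = upsert(org, batch)
second_run = upsert(org, batch)
```

Running the same batch twice inserts on the first pass and only updates on the second, which is why merge mode is safe to re-run.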

Example:

rules:
  migration:
    loadMode: merge  # Enables external ID usage
    include:
      - Account
      - Contact

Best Practices

Always use Gigantics with Salesforce sandboxes, not production:

  • Why: Data migrations can affect automation, validation rules, and business logic
  • Sandbox Types: Full Copy, Partial Copy, or Developer sandboxes
  • Testing: Validate migration in sandbox before considering production
  • Sandbox Login: Use https://test.salesforce.com as host

For the best performance and simplest setup, use update mode within the same org:

Benefits:

  • Fast: No ID mapping required, true concurrent processing
  • Simple: No user mapping, no external ID creation
  • Reliable: Direct ID references, no cross-org complexity

Example Use Case:

# Anonymizing data in the same sandbox
source:
  driver: salesforce
  host: https://test.salesforce.com
  # ... auth details

sink:
  driver: salesforce
  host: https://test.salesforce.com  # Same org!
  # ... same auth details

rules:
  anonymize:
    loadMode: update  # Update records in place
    transform:
      Contact:
        - Email:
            action: fake
            label: tech/email

Update vs Merge: When to Use Each

| Scenario | Recommended Mode | Why |
| --- | --- | --- |
| Same org anonymization | Update | Fastest, no ID mapping |
| Same org data refresh | Update | Direct ID references |
| Cross-org migration | Merge | External ID handles different IDs |
| Re-runnable loads | Merge | Idempotent, safe to re-run |
| Full data replacement | Insert | Truncates all data and reloads |

Automation Management

⚠️ Important: Salesforce automations can slow down or block data migrations.

Before running large migrations, consider disabling automations:

  • Validation Rules: May block records
  • Workflow Rules: Can slow down operations
  • Process Builder: Executes on every record
  • Flows: Auto-launched flows run during loads
  • Duplicate Rules: Can block upserts (see below)
  • Apex Triggers: Consume processing time

How to disable:

  1. Navigate to: Setup → Process Automation
  2. Deactivate relevant workflows, processes, and flows
  3. Navigate to: Setup → Object Manager → [Object] → Validation Rules
  4. Deactivate validation rules temporarily
  5. Run migration
  6. Re-enable after completion

Duplicate Rules

Salesforce duplicate rules can block merge operations even when using external IDs:

Problem: Duplicate rules check fields such as Email, Name, and Phone, not your external ID. During an upsert, if the external ID has no match but those other fields do, Salesforce attempts an insert and the duplicate rule rejects it.

Solution: Disable duplicate rules during migration

Setup → Duplicate Management → Duplicate Rules → Deactivate

SSH Tunnel Support

The Salesforce driver does not require SSH tunneling as it connects directly to Salesforce cloud services via HTTPS. All connections are encrypted by default.

API Endpoints Used

Salesforce connections are primarily used in:

  • Tap creation (schema discovery and data extraction)
  • Sink creation (data loading destination)
  • Pipeline execution (data anonymization and migration)

Custom Params

The Salesforce driver supports additional custom parameters for advanced configurations.

Example:

bulkPollInterval: 5000        # Polling interval in milliseconds (default: 5000)
bulkPollTimeout: 1800000      # Maximum timeout in milliseconds (default: 1800000 - 30 minutes)
bulkQueryThreshold: 10000     # Record count threshold to use Bulk API 2.0 (default: 10000)

Parameter Descriptions:

  • bulkPollInterval: How often to check job status in milliseconds. Lower values provide faster updates but consume more API calls. Default is 5000ms (5 seconds).

  • bulkPollTimeout: Maximum time to wait for a job to complete in milliseconds. Jobs exceeding this timeout will be cancelled. Default is 1800000ms (30 minutes).

  • bulkQueryThreshold: Minimum number of records to trigger Bulk API 2.0 for queries. Queries below this threshold use REST API for better performance. Default is 10000 records.
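
To show how these three parameters interact, here is a minimal sketch of a threshold check and a polling loop. The function names are hypothetical and this is not the driver's code. The terminal state names (JobComplete, Failed, Aborted) are the Bulk API 2.0 job states. Treating a count exactly equal to the threshold as "bulk" is an assumption, and the sketch raises TimeoutError instead of cancelling the job.

```python
import time


def choose_query_api(record_count: int, bulk_query_threshold: int = 10000) -> str:
    """Pick 'bulk' at or above the threshold, 'rest' below it (boundary assumed)."""
    return "bulk" if record_count >= bulk_query_threshold else "rest"


def wait_for_job(get_state, poll_interval_ms=5000, poll_timeout_ms=1800000,
                 sleep=time.sleep, clock=time.monotonic):
    """Poll get_state() until the job reaches a terminal state or times out.

    get_state is any callable returning the current job state string.
    sleep and clock are injectable so the loop can be tested without waiting.
    """
    deadline = clock() + poll_timeout_ms / 1000
    while True:
        state = get_state()
        if state in ("JobComplete", "Failed", "Aborted"):
            return state
        if clock() >= deadline:
            # The caller would cancel the job here.
            raise TimeoutError("job still '%s' after %d ms" % (state, poll_timeout_ms))
        sleep(poll_interval_ms / 1000)
```

With the defaults, a job that runs for the full 30 minutes is polled roughly 1800000 / 5000 = 360 times, which is the API-call cost that a lower bulkPollInterval increases.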

Troubleshooting

Authentication Errors

Issue: "invalid_grant: user hasn't approved this consumer"

Solution:

  1. Verify the Connected App is configured correctly
  2. Pre-authorize the user (Manage Profiles/Permission Sets)
  3. Check that the certificate matches the private key
  4. Ensure username is correct

Issue: "invalid_grant: invalid client credentials"

Solution:

  • Verify Consumer Key is correct
  • Check that private key file exists and is readable
  • Ensure certificate was uploaded to Connected App

Duplicate Detection Errors

Issue: DUPLICATES_DETECTED: Use one of these records

Solution: Disable duplicate rules before migration (see Duplicate Rules under Automation Management above)

Performance Issues

Issue: Migration is slow

Solutions:

  1. Disable automations (biggest impact) and confirm they are actually deactivated on the target objects
  2. Use update mode instead of merge (if same org)

Connection Limits

Issue: API limit errors

Solution:

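  • Bulk API 2.0 counts only job creation against API limits (not each batch)
  • Job status polling also consumes API calls; raise bulkPollInterval to reduce them
  • Check your org's daily API request allocation before large migrations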

Summary

The Salesforce driver is optimized for high-performance data migrations using Bulk API 2.0:

  • Use sandboxes for testing and migrations
  • Update mode for same-org operations (fastest)
  • Merge mode for cross-org migrations
  • Disable automations before large migrations
  • JWT OAuth for secure authentication

Future Features

The following features are planned for future releases of the Salesforce driver:

| Feature | Description | Status |
| --- | --- | --- |
| Sandbox Auto-Detection | Automatically detect sandbox vs production from the login URL | Planned |
| Field-Level Security Awareness | Respect FLS and surface permission warnings during schema discovery | Planned |
| Multi-Org Support | Connect to multiple Salesforce orgs within a single project | Planned |
| Password Manager Integration | Retrieve Salesforce credentials from external password managers | Planned |