
CSV Driver Documentation

This documentation explains the CSV driver implementation and how connection parameters map to UI fields in the Gigantic application.

Connection Parameters

| UI Field        | Technical Implementation | Description                                    | Required | Default Value             |
|-----------------|--------------------------|------------------------------------------------|----------|---------------------------|
| Directory Path  | dir in csv config        | The directory path where CSV files are stored  | Yes      | Current working directory |
| File Extension  | extension in csv config  | The file extension for CSV files               | No       | csv                       |
| Headers         | headers in csv config    | Whether the CSV files have headers             | No       | true                      |
| Delimiter       | delimiter in csv config  | Character used to separate fields in CSV files | No       | , (comma)                 |
| Quoted          | quoted in csv config     | Whether fields are enclosed in quotes          | No       | true                      |
| Quote Character | quote in csv config      | Character used for quoting fields              | No       | " (double quote)          |

Technical Details

The CSV driver implementation uses Node.js built-in file system operations together with the fast-csv library for parsing and formatting.

Key technical aspects:

  • Treats each CSV file in a directory as a separate collection/table
  • Uses streams for efficient reading and writing of large CSV files
  • Automatically creates the directory if it doesn't exist
  • Converts CSV rows to JSON objects using header names as keys
  • Handles object serialization by converting objects to JSON strings (see the sketch after this list)
  • No authentication required for CSV files (file system access)
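
The header-mapping and serialization behavior can be illustrated with a short sketch. This is a hypothetical illustration of the behavior described above, not the driver's actual source:

// Hypothetical sketch of the row conversion described above.
const headers = ['id', 'name', 'profile'];
const row = ['1', 'Ada', '{"role":"admin"}'];

// CSV row -> JSON object, using header names as keys
const record = Object.fromEntries(headers.map((h, i) => [h, row[i]]));
// => { id: '1', name: 'Ada', profile: '{"role":"admin"}' }

// Object values -> JSON strings, so nested data survives the trip back to CSV
const toCsvValue = value =>
  value !== null && typeof value === 'object' ? JSON.stringify(value) : value;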

Connection process:

  1. The driver takes a directory path as input
  2. It scans the directory for files with the specified extension
  3. Each file becomes a collection that can be read from or written to
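
A minimal sketch of these steps, assuming Node's standard fs and path modules (listCollections is a hypothetical helper name, not part of the driver's API):

const fs = require('fs');
const path = require('path');

// Hypothetical helper: map a directory of CSV files to collection names.
function listCollections(dir, extension = 'csv') {
  fs.mkdirSync(dir, { recursive: true }); // the driver creates the directory if it doesn't exist
  return fs
    .readdirSync(dir)
    .filter(file => path.extname(file) === `.${extension}`)
    .map(file => path.basename(file, `.${extension}`)); // file name becomes the collection name
}

console.log(listCollections('/data/csv_files')); // e.g. ['users', 'orders']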

Reading process:

  1. Creates a readable stream from the CSV file
  2. Uses fast-csv parser to convert CSV data to arrays
  3. Transforms arrays to JSON objects using headers as keys
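
A minimal sketch of the read path using fast-csv's documented streaming API (the file path is illustrative). With headers: true, each row is emitted as an object keyed by header name:

const fs = require('fs');
const csv = require('fast-csv');

fs.createReadStream('/data/csv_files/users.csv')
  .pipe(csv.parse({ headers: true, delimiter: ',', trim: true }))
  .on('error', error => console.error(error))
  .on('data', row => console.log(row)) // e.g. { id: '1', name: 'Ada' }
  .on('end', rowCount => console.log(`Parsed ${rowCount} rows`));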

Writing process:

  1. Creates a writable stream to a CSV file
  2. Uses fast-csv writer to convert JSON objects back to CSV format
  3. Handles proper quoting and escaping of fields
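
A corresponding sketch of the write path, again using fast-csv's public API. quoteColumns is fast-csv's option for quoting every field; the driver's quoted param presumably maps onto options like this:

const fs = require('fs');
const csv = require('fast-csv');

const stream = csv.format({ headers: true, quoteColumns: true });
stream.pipe(fs.createWriteStream('/data/output/users.csv'));

stream.write({ id: 1, name: 'Ada Lovelace' });
stream.write({ id: 2, name: 'Grace "Amazing" Hopper' }); // embedded quotes are escaped by fast-csv
stream.end();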

Authentication Options

| Auth Type | UI Mapping            | Description                            |
|-----------|-----------------------|----------------------------------------|
| None      | Only option available | No authentication needed for CSV files |

Note: The CSV driver does not require authentication as it works directly with files in the file system.

API Endpoints Used

CSV connections are primarily used in:

  • Tap creation (data source discovery from CSV files)
  • Sink creation (data destination for anonymized data as CSV files)
  • Pipeline execution (data extraction from and loading to CSV files)

Node.js Driver Dependency

The CSV driver depends on the fast-csv Node.js library:

  • Library: fast-csv
  • Purpose: Fast CSV parsing and formatting
  • Features: Streaming API, header handling, quoting, escaping

Custom Params

The CSV driver supports additional custom parameters that can be specified in YAML format to fine-tune CSV parsing, formatting, and file handling behavior.

File and Directory Parameters

# Basic parameters
dir: "/path/to/csv/files"
extension: "csv"

# File Creation Options
createDir: true  # Automatically create directory if it doesn't exist
fileMode: "0644"  # File permissions for created CSV files

Parsing Parameters

# Header Options
headers: true        # Whether CSV files have headers
renameHeaders: false  # Replace the file's header row with the provided headers
strictColumnHandling: false  # Strictly enforce consistent column count

# Delimiter and Formatting
delimiter: ","       # Field delimiter character
quote: "\""         # Quote character
escape: "\""        # Escape character for quotes

# Data Handling
trim: true          # Trim whitespace from fields
rtrim: false        # Right trim whitespace from fields
ltrim: false        # Left trim whitespace from fields
ignoreEmpty: true   # Ignore empty rows
discardUnmappedColumns: false  # Discard columns not in provided headers

# Comment Handling
comment: "#"        # Comment character to ignore lines

# Validation
validate: false     # Enable row validation

Formatting Parameters

# Header Options
writeHeaders: true   # Include headers in output files

# Delimiter and Formatting
delimiter: ","       # Field delimiter character
quote: "\""         # Quote character
escape: "\""        # Escape character for quotes
rowDelimiter: "\n"  # Row delimiter character

# Quoting Behavior
quoted: true        # Quote all fields
quotedEmpty: true   # Quote empty fields
quotedString: false  # Quote string fields

# Data Handling
includeEndRowDelimiter: false  # Include delimiter at end of file
writeBOM: false     # Write Byte Order Mark (BOM) for UTF-8
transform: {}       # Transformation function for data

Streaming Parameters

# Stream Options
objectMode: true     # Enable object mode for streams
highWaterMark: 128  # Stream buffer size (counted in objects when objectMode is true)

# Batch Processing
batchSize: 1000     # Number of rows to process in batches
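
How these params reach the parser is an implementation detail of the driver; the sketch below shows one plausible mapping from a parsed YAML config object onto fast-csv's real parse options. The buildParseOptions helper and its fallback defaults are assumptions, not the driver's actual code:

// Hypothetical mapping from driver config keys to fast-csv parse options.
function buildParseOptions(config) {
  return {
    headers: config.headers ?? true,
    delimiter: config.delimiter ?? ',',
    quote: config.quote ?? '"',
    escape: config.escape ?? '"',
    trim: config.trim ?? false,
    ignoreEmpty: config.ignoreEmpty ?? false,
    comment: config.comment,
    discardUnmappedColumns: config.discardUnmappedColumns ?? false,
    strictColumnHandling: config.strictColumnHandling ?? false,
  };
}

// usage: fs.createReadStream(file).pipe(csv.parse(buildParseOptions(config)))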

Example Configurations

Basic Configuration

dir: "/data/csv_files"
extension: "csv"
headers: true
delimiter: ","
quote: "\""

Advanced Parsing Configuration

dir: "/data/input"
extension: "txt"
headers: ["id", "name", "email", "age"]
delimiter: "|"
quote: "\""
escape: "\\"
trim: true
ignoreEmpty: true
discardUnmappedColumns: true
comment: "#"
strictColumnHandling: true
highWaterMark: 256
batchSize: 2000

Formatting Configuration

dir: "/data/output"
extension: "csv"
writeHeaders: true
delimiter: ","
quote: "\""
quoted: true
quotedEmpty: true
includeEndRowDelimiter: true
writeBOM: true
rowDelimiter: "\n"
objectMode: true
fileMode: "0644"

Tab-Delimited Configuration

dir: "/data/tsv_files"
extension: "tsv"
headers: true
delimiter: "\t"  # Tab character
quote: "\""
trim: true
ignoreEmpty: true