# CSV Driver Documentation
This documentation explains the CSV driver implementation and how connection parameters map to UI fields in the Gigantic application.
## Connection Parameters
| UI Field | Technical Implementation | Description | Required | Default Value |
|---|---|---|---|---|
| Directory Path | `dir` in csv config | The directory path where CSV files are stored | Yes | Current working directory |
| File Extension | `extension` in csv config | The file extension for CSV files | No | `csv` |
| Headers | `headers` in csv config | Whether the CSV files have a header row | No | `true` |
| Delimiter | `delimiter` in csv config | Character used to separate fields in CSV files | No | `,` (comma) |
| Quoted | `quoted` in csv config | Whether fields are enclosed in quotes | No | `true` |
| Quote Character | `quote` in csv config | Character used for quoting fields | No | `"` (double quote) |
## Technical Details
The CSV driver implementation uses Node.js built-in file system operations together with the `fast-csv` library for parsing and formatting.
Key technical aspects:
- Treats each CSV file in a directory as a separate collection/table
- Uses streams for efficient reading and writing of large CSV files
- Automatically creates the directory if it doesn't exist
- Converts CSV rows to JSON objects using header names as keys
- Handles object serialization by converting objects to JSON strings
- No authentication required for CSV files (file system access)
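As one concrete illustration of the object-serialization point above, nested values can be converted to JSON strings before they are written out as CSV cells. This is a minimal sketch in plain Node.js; `flattenRow` is a hypothetical helper name used for illustration, not part of the driver's actual API.

```javascript
// Sketch: serialize nested object values to JSON strings so each cell
// holds a flat scalar. `flattenRow` is an illustrative helper, not the
// driver's real implementation.
function flattenRow(row) {
  const out = {};
  for (const [key, value] of Object.entries(row)) {
    // Non-null objects (including arrays) become JSON strings;
    // primitives pass through unchanged.
    out[key] =
      value !== null && typeof value === "object"
        ? JSON.stringify(value)
        : value;
  }
  return out;
}

const flat = flattenRow({ id: 1, tags: ["a", "b"], meta: { ok: true } });
console.log(flat);
// e.g. { id: 1, tags: '["a","b"]', meta: '{"ok":true}' }
```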
Connection process:
- The driver takes a directory path as input
- It scans the directory for files with the specified extension
- Each file becomes a collection that can be read from or written to
Reading process:
- Creates a readable stream from the CSV file
- Uses fast-csv parser to convert CSV data to arrays
- Transforms arrays to JSON objects using headers as keys
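A simplified sketch of that transformation follows. The actual driver delegates parsing to fast-csv's streaming parser; the hand-rolled `parseLine` below only illustrates the quote handling and the header-to-key mapping for a single line.

```javascript
// Sketch: split one CSV line into fields (honoring quotes and
// doubled-quote escapes), then zip the values with header names to build
// a JSON object. Illustrative only; the driver uses fast-csv for this.
function parseLine(line, delimiter = ",", quote = '"') {
  const fields = [];
  let field = "";
  let inQuotes = false;
  for (let i = 0; i < line.length; i++) {
    const ch = line[i];
    if (inQuotes) {
      if (ch === quote && line[i + 1] === quote) {
        field += quote; // doubled quote -> literal quote character
        i++;
      } else if (ch === quote) {
        inQuotes = false; // closing quote
      } else {
        field += ch;
      }
    } else if (ch === quote) {
      inQuotes = true; // opening quote
    } else if (ch === delimiter) {
      fields.push(field);
      field = "";
    } else {
      field += ch;
    }
  }
  fields.push(field);
  return fields;
}

// Build an object from a header row and a value row.
function rowToObject(headers, values) {
  return Object.fromEntries(headers.map((h, i) => [h, values[i]]));
}

const headers = parseLine("id,name");
const row = rowToObject(headers, parseLine('1,"Doe, Jane"'));
console.log(row); // → { id: '1', name: 'Doe, Jane' }
```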
Writing process:
- Creates a writable stream to a CSV file
- Uses fast-csv writer to convert JSON objects back to CSV format
- Handles proper quoting and escaping of fields
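The write path can be sketched the same way. `toCsv` below is an illustrative stand-in for fast-csv's formatter, showing the quote-all-fields behavior (`quoted: true`) with doubled-quote escaping:

```javascript
// Sketch: turn JSON objects back into CSV text, quoting every field and
// doubling embedded quote characters. The real driver uses fast-csv's
// formatter on a writable stream instead; `toCsv` is illustrative.
function toCsv(rows, { delimiter = ",", quote = '"' } = {}) {
  if (rows.length === 0) return "";
  const headers = Object.keys(rows[0]);
  const escapeField = (value) => {
    const s = String(value);
    // Wrap in quotes and double any embedded quote characters.
    return quote + s.split(quote).join(quote + quote) + quote;
  };
  const lines = [headers, ...rows.map((r) => headers.map((h) => r[h]))];
  return lines.map((l) => l.map(escapeField).join(delimiter)).join("\n") + "\n";
}

const csv = toCsv([{ id: 1, name: 'Jane "JD" Doe' }]);
console.log(csv);
// "id","name"
// "1","Jane ""JD"" Doe"
```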
## Authentication Options
| Auth Type | UI Mapping | Description |
|---|---|---|
| None | Only option available | No authentication needed for CSV files |
Note: The CSV driver does not require authentication as it works directly with files in the file system.
## API Endpoints Used
CSV connections are primarily used in:
- Tap creation (data source discovery from CSV files)
- Sink creation (data destination for anonymized data as CSV files)
- Pipeline execution (data extraction from and loading to CSV files)
## Node.js Driver Dependency
The CSV driver depends on the `fast-csv` Node.js library:
- Library: `fast-csv`
- Purpose: fast CSV parsing and formatting
- Features: streaming API, header handling, quoting, escaping
## Custom Params
The CSV driver supports additional custom parameters that can be specified in YAML format to fine-tune CSV parsing, formatting, and file handling behavior.
### File and Directory Parameters
```yaml
# Basic parameters
dir: "/path/to/csv/files"
extension: "csv"

# File Creation Options
createDir: true   # Automatically create the directory if it doesn't exist
fileMode: "0644"  # File permissions for created CSV files
```

### Parsing Parameters
```yaml
# Header Options
headers: true                 # Whether CSV files have headers
renameHeaders: false          # Replace the first row with user-provided headers
strictColumnHandling: false   # Strictly enforce a consistent column count

# Delimiter and Formatting
delimiter: ","                # Field delimiter character
quote: "\""                   # Quote character
escape: "\""                  # Escape character for quotes

# Data Handling
trim: true                    # Trim whitespace from both ends of fields
rtrim: false                  # Trim whitespace from the right of fields
ltrim: false                  # Trim whitespace from the left of fields
ignoreEmpty: true             # Ignore empty rows
discardUnmappedColumns: false # Discard columns not in the provided headers

# Comment Handling
comment: "#"                  # Lines starting with this character are ignored

# Validation
validate: false               # Enable row validation
```

### Formatting Parameters
```yaml
# Header Options
writeHeaders: true            # Include headers in output files

# Delimiter and Formatting
delimiter: ","                # Field delimiter character
quote: "\""                   # Quote character
escape: "\""                  # Escape character for quotes
rowDelimiter: "\n"            # Row delimiter character

# Quoting Behavior
quoted: true                  # Quote all fields
quotedEmpty: true             # Quote empty fields
quotedString: false           # Quote string fields

# Data Handling
includeEndRowDelimiter: false # Include a row delimiter at the end of the file
writeBOM: false               # Write a UTF-8 Byte Order Mark (BOM)
transform: {}                 # Transformation applied to each row
```

### Streaming Parameters
```yaml
# Stream Options
objectMode: true              # Enable object mode for streams
highWaterMark: 128            # Buffer size for streaming

# Batch Processing
batchSize: 1000               # Number of rows to process per batch
```

### Example Configurations
#### Basic Configuration

```yaml
dir: "/data/csv_files"
extension: "csv"
headers: true
delimiter: ","
quote: "\""
```

#### Advanced Parsing Configuration
```yaml
dir: "/data/input"
extension: "txt"
headers: ["id", "name", "email", "age"]
delimiter: "|"
quote: "\""
escape: "\\"
trim: true
ignoreEmpty: true
discardUnmappedColumns: true
comment: "#"
strictColumnHandling: true
highWaterMark: 256
batchSize: 2000
```

#### Formatting Configuration
```yaml
dir: "/data/output"
extension: "csv"
writeHeaders: true
delimiter: ","
quote: "\""
quoted: true
quotedEmpty: true
includeEndRowDelimiter: true
writeBOM: true
rowDelimiter: "\n"
objectMode: true
fileMode: "0644"
```

#### Tab-Delimited Configuration
```yaml
dir: "/data/tsv_files"
extension: "tsv"
headers: true
delimiter: "\t"   # Tab character
quote: "\""
trim: true
ignoreEmpty: true
```