Operations

Max

The Max operation allows you to set the maximum number of records that will be written to the sink or inserted into the dataset. This operation helps control the size of your output data.

Overview

The Max operation provides flexible ways to cap dataset size:

  • Limit by a fixed maximum number of rows
  • Limit by a maximum percentage of available records
  • Apply limits to all entities together or to each entity individually
  • Choose which records to keep when applying the maximum (first, last, or random)

Configuration Options

Scope

The scope determines whether the maximum is applied to all entities collectively or to each entity individually:

All entities: Applies the maximum to the entire dataset, regardless of entity types. For example, if you have 10000 customer records and 10000 order records (20000 total), a maximum of 5000 would return 5000 records total from any combination of entities.

By entity: Applies the maximum separately to each entity type. For example, if you have customer and order entities, a maximum of 5000 would return up to 5000 customer records AND up to 5000 order records (10000 total records maximum).
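The difference between the two scopes can be sketched in a few lines of Python. This is an illustrative model only, not the product's implementation: the function name `apply_max` and the `(entity, payload)` record shape are assumptions, and the sketch always keeps the first matching records, whereas the real selection depends on the Row Position setting.

```python
from collections import defaultdict

def apply_max(records, limit, by_entity=False):
    """Keep at most `limit` records overall, or per entity when by_entity is True.

    `records` is a list of (entity_name, payload) tuples (hypothetical shape).
    """
    if not by_entity:
        # "All entities": one shared cap across every entity type.
        return records[:limit]
    kept = []
    counts = defaultdict(int)
    for entity, payload in records:
        # "By entity": each entity type gets its own independent cap.
        if counts[entity] < limit:
            counts[entity] += 1
            kept.append((entity, payload))
    return kept

records = [("customer", i) for i in range(10000)] + [("order", i) for i in range(10000)]
all_scope = apply_max(records, 5000)
entity_scope = apply_max(records, 5000, by_entity=True)
print(len(all_scope))     # 5000
print(len(entity_scope))  # 10000
```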

Maximum Type

You can specify how the maximum should be applied:

By number of rows: Specify a fixed maximum number of records. For example, a maximum of 1000 keeps at most 1000 records.

By percentage: Specify the maximum as a percentage of the total available records, for example 20% of all records. With percentages, you can also set minimum and maximum row constraints so that the result stays within a reasonable range when the percentage alone would yield too few records (small datasets) or too many (large datasets).

Row Position

Determines which records are selected when applying the maximum:

First records: Selects records from the beginning of the dataset (useful for getting the most recent records when data is sorted newest first).

Last records: Selects records from the end of the dataset (useful for getting the oldest records when data is sorted newest first).

Random records: Selects records randomly from the dataset (useful for sampling data).
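As a rough illustration of the three positions, the following Python sketch selects rows from an in-memory list. The function `select_rows` and its `seed` parameter are hypothetical names introduced here; how the product samples random records internally is not documented in this page.

```python
import random

def select_rows(rows, limit, position="first", seed=None):
    """Return at most `limit` rows from the start, the end, or at random."""
    if limit <= 0:
        return []
    if position == "first":
        return rows[:limit]
    if position == "last":
        return rows[-limit:]
    if position == "random":
        rng = random.Random(seed)  # a seed makes the sample reproducible
        return rng.sample(rows, min(limit, len(rows)))
    raise ValueError(f"unknown position: {position!r}")

rows = list(range(10))
print(select_rows(rows, 3, "first"))  # [0, 1, 2]
print(select_rows(rows, 3, "last"))   # [7, 8, 9]
print(len(select_rows(rows, 3, "random", seed=1)))  # 3
```

Note that a random selection in this sketch does not preserve the original row order.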

Percentage Constraints

When using percentage-based maximums, you can set additional constraints:

Min rows: Ensures that even if the percentage of total records is small, you'll get at least this many rows. For example, if you set 5% but want at least 1000 rows, this setting ensures you'll get 1000 rows even if 5% of your dataset is less than that.

Max rows: Ensures that even if the percentage of total records is large, you won't get more than this many rows. For example, if you set 50% but only want a maximum of 50000 rows, this setting caps your output at 50000 rows even if 50% of your dataset would be more.
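The interaction between the percentage and the two constraints amounts to clamping a computed value. The sketch below works through that arithmetic in Python. The function name `effective_limit` is hypothetical, and three behaviors are assumptions rather than documented facts: the percentage is rounded down, Max rows wins if it conflicts with Min rows, and the result never exceeds the number of available rows.

```python
import math

def effective_limit(total_rows, percent, min_rows=None, max_rows=None):
    """Compute how many rows a percentage-based maximum would keep."""
    limit = math.floor(total_rows * percent / 100)  # assumed: round down
    if min_rows is not None:
        limit = max(limit, min_rows)   # raise to the floor
    if max_rows is not None:
        limit = min(limit, max_rows)   # lower to the ceiling
    return min(limit, total_rows)      # cannot keep more rows than exist

# 5% of 10000 is 500, so Min rows lifts the result to 1000.
print(effective_limit(10000, 5, min_rows=1000))      # 1000
# 50% of 200000 is 100000, so Max rows caps the result at 50000.
print(effective_limit(200000, 50, max_rows=50000))   # 50000
```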

Examples

Maximum by Number of Rows

To cap your dataset at 1000 records:

  1. Set Scope to "All entities"
  2. Set Maximum Type to "By number of rows"
  3. Enter "1000" in the value field
  4. Choose which records to keep (First, Last, or Random)

Maximum by Percentage

To limit your dataset to 20% of available records:

  1. Set Scope to "All entities"
  2. Set Maximum Type to "By percentage"
  3. Enter "20" in the value field
  4. Choose which records to keep (First, Last, or Random)

Per-Entity Maximum

To cap each entity at 5000 records:

  1. Set Scope to "By entity"
  2. Configure each entity with:
    • Maximum Type: "By number of rows"
    • Value: 5000
    • Position: "First" (or your preferred selection)

This approach is particularly useful when working with related entities and you want balanced representation across all types: each entity contributes at most 5000 records, so the overall output stays bounded without one entity crowding out the others.
