Include/Exclude

The Include/Exclude operation allows you to select which entities to include in or exclude from your new dataset. This operation is applied at the beginning of the pipeline to determine the initial set of entities to process.

Overview

The Include/Exclude operation provides two ways to filter entities:

Include mode: Only the selected entities are processed
Exclude mode: All entities except the selected ones are processed

In either mode, you can also select entities with a regular expression that matches their names, which helps when entity names follow a pattern.

Configuration

Selection Mode

Choose between two modes for entity filtering:

Include: Only the entities you select will be included in your dataset. This is useful when you only need a few specific entities from a large dataset.

Exclude: All entities except those you select will be included in your dataset. This is useful when you want to omit just a few entities from a dataset where most entities are relevant.
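The two modes can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the product's implementation: the function name `filter_entities`, the mode strings "include" and "exclude", and the entity names are all assumptions made for the example.

```python
from typing import Iterable, List


def filter_entities(available: Iterable[str], selected: Iterable[str], mode: str) -> List[str]:
    """Return the entities that remain after applying the selection mode.

    Hypothetical helper: the mode strings are illustrative, not a real API.
    """
    chosen = set(selected)
    if mode == "include":
        # Include: keep only the selected entities.
        return [name for name in available if name in chosen]
    if mode == "exclude":
        # Exclude: keep everything except the selected entities.
        return [name for name in available if name not in chosen]
    raise ValueError(f"unknown mode: {mode!r}")


entities = ["customer", "order", "log", "audit"]
included = filter_entities(entities, ["customer", "order"], "include")
excluded = filter_entities(entities, ["log", "audit"], "exclude")
print(included)
print(excluded)
```

Both calls leave the same two entities, which shows that the modes are complementary: pick whichever one requires selecting fewer entities.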

Entity Selection

Two methods are available for selecting entities:

Selector: Choose entities from a list of available entities in your dataset. You can select multiple entities by checking them in the list.

Regex: Define a regular expression pattern to match entity names. This is particularly useful when your entities follow naming conventions. For example, given entities named "customer_2023", "customer_2024", "order_2023", and "order_2024", the regex "customer_.*" matches the two customer entities. In Include mode only those two are processed; in Exclude mode only the two order entities are.
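As a rough sketch of regex-based selection, the snippet below applies the pattern from the example to the four entity names using Python's `re` module. The helper `select_by_regex` is hypothetical, and the choice of `re.fullmatch` (the pattern must match the whole name) is an assumption; this page does not state whether the operation anchors patterns or which regex dialect it uses.

```python
import re
from typing import Iterable, List


def select_by_regex(available: Iterable[str], pattern: str) -> List[str]:
    """Return the entity names that the pattern matches in full.

    Hypothetical helper; full-match semantics are an assumption.
    """
    compiled = re.compile(pattern)
    return [name for name in available if compiled.fullmatch(name)]


entities = ["customer_2023", "customer_2024", "order_2023", "order_2024"]
matched = select_by_regex(entities, r"customer_.*")
print(matched)
```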

Examples

Include Specific Entities

To include only customer and order entities in your dataset:

Select "Include" mode
Choose "Selector" method
Check "customer" and "order" entities from the list

Exclude Specific Entities

To exclude log and audit entities from your dataset:

Select "Exclude" mode
Choose "Selector" method
Check "log" and "audit" entities from the list

Include by Pattern

To include all entities that start with "sales_":

Select "Include" mode
Choose "Regex" method
Enter regex pattern: "sales_.*"
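Whether "sales_.*" selects only names that start with "sales_" depends on how the pattern is applied. The sketch below uses Python's `re.search`, which matches anywhere in the name, to show the difference an explicit `^` anchor makes. The entity names are made up for illustration, and the matching semantics of the operation itself are not specified on this page, so it is worth testing a pattern against your own entity list.

```python
import re

# Hypothetical entity names, chosen to expose the anchoring difference.
names = ["sales_2023", "sales_eu", "presales_2023", "orders"]

# Unanchored search: the pattern may match in the middle of a name.
unanchored = [n for n in names if re.search(r"sales_.*", n)]

# Anchored search: the name must begin with "sales_".
anchored = [n for n in names if re.search(r"^sales_.*", n)]

print(unanchored)
print(anchored)
```

If the unanchored form picks up names such as "presales_2023" in your dataset, prefixing the pattern with `^` restricts it to names that begin with "sales_".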

This operation allows you to efficiently control the scope of your data processing pipeline by including only relevant entities or excluding irrelevant ones.
