Include/Exclude
The Include/Exclude operation allows you to select which entities to include in or exclude from your new dataset. This operation is applied at the beginning of the pipeline to determine the initial set of entities to process.
Overview
The Include/Exclude operation provides two ways to filter entities:
- Include mode: Only the selected entities are processed
- Exclude mode: All entities except the selected ones are processed
Entities can be selected individually or, when their names follow a pattern, matched with a regular expression.
Configuration
Selection Mode
Choose between two modes for entity filtering:
Include: Only the entities you select will be included in your dataset. This is useful when you only need a few specific entities from a large dataset.
Exclude: All entities except those you select will be included in your dataset. This is useful when you want to omit just a few entities from a dataset where most entities are relevant.
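The difference between the two modes can be sketched in a few lines of Python. This is only an illustration of the semantics described above, not the product's implementation: the function name `apply_selection`, the mode strings, and the entity names are all assumptions made for the example.

```python
def apply_selection(available, selected, mode):
    """Return the entities that stay in the dataset, preserving input order.

    Hypothetical helper: `mode` is "include" or "exclude".
    """
    chosen = set(selected)
    if mode == "include":
        # Keep only the entities that were selected.
        return [name for name in available if name in chosen]
    if mode == "exclude":
        # Keep everything except the entities that were selected.
        return [name for name in available if name not in chosen]
    raise ValueError(f"unknown mode: {mode!r}")


# Illustrative entity names, not taken from a real dataset.
entities = ["customer", "order", "log", "audit"]

kept_include = apply_selection(entities, ["customer", "order"], "include")
kept_exclude = apply_selection(entities, ["log", "audit"], "exclude")

print(kept_include)
print(kept_exclude)
```

With these inputs both calls keep the same two entities, which shows that the modes are two ways of describing the same result. Which one is shorter to configure depends on whether the wanted or the unwanted entities are fewer.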
Entity Selection
Two methods are available for selecting entities:
Selector: Choose entities from a list of available entities in your dataset. You can select multiple entities by checking them in the list.
Regex: Define a regular expression pattern to match entity names. This is particularly useful when your entities follow a naming convention. For example, given entities named "customer_2023", "customer_2024", "order_2023", and "order_2024", the regex "customer_.*" in Include mode selects "customer_2023" and "customer_2024" and leaves out both order entities.
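To check a pattern before entering it, you can try it against a list of entity names with Python's `re` module. The sketch below assumes the pattern must match the whole entity name (`re.fullmatch`); this page does not state whether the operation matches the full name or any part of it, so treat that as an assumption. The helper name `match_entities` is hypothetical.

```python
import re


def match_entities(available, pattern):
    """Return the entity names that the pattern matches in full."""
    regex = re.compile(pattern)
    # Assumption: the whole name must match, not just a substring.
    return [name for name in available if regex.fullmatch(name)]


entities = ["customer_2023", "customer_2024", "order_2023", "order_2024"]
customers = match_entities(entities, r"customer_.*")

print(customers)
```

Under full-name matching, "customer_.*" would not select a name such as "new_customer_2023". If the operation instead matches anywhere in the name, that entity would be selected as well, so it is worth testing a pattern against names that should not match.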
Examples
Include Specific Entities
To include only customer and order entities in your dataset:
- Select "Include" mode
- Choose "Selector" method
- Check the "customer" and "order" entities in the list
Exclude Specific Entities
To exclude log and audit entities from your dataset:
- Select "Exclude" mode
- Choose "Selector" method
- Check the "log" and "audit" entities in the list
Include by Pattern
To include all entities that start with "sales_":
- Select "Include" mode
- Choose "Regex" method
- Enter regex pattern: "sales_.*"
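The steps above combine a mode with a pattern. As a rough sketch of the expected outcome, the following Python applies the pattern "sales_.*" in both modes. The function `filter_by_pattern`, the mode strings, the entity names, and the use of full-name matching are assumptions for illustration only.

```python
import re


def filter_by_pattern(available, pattern, mode):
    """Keep or drop the entities whose full name matches the pattern."""
    regex = re.compile(pattern)
    matched = {name for name in available if regex.fullmatch(name)}
    if mode == "include":
        return [name for name in available if name in matched]
    if mode == "exclude":
        return [name for name in available if name not in matched]
    raise ValueError(f"unknown mode: {mode!r}")


# Hypothetical entity names chosen to show one near miss ("presales_notes").
entities = ["sales_2023", "sales_eu", "presales_notes", "inventory"]

included = filter_by_pattern(entities, r"sales_.*", "include")
excluded = filter_by_pattern(entities, r"sales_.*", "exclude")

print(included)
print(excluded)
```

In this sketch "presales_notes" is not treated as a sales entity, because the pattern has to match from the first character of the name.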
This operation allows you to efficiently control the scope of your data processing pipeline by including only relevant entities or excluding irrelevant ones.