Output Dataset
1. Overview
1.1. Function Description
An output dataset is the result of data flow processing and can be used for subsequent business analysis and reporting.
Multiple output dataset operators can be configured at any node in the ETL data flow, and different storage locations can be specified for ETL output datasets.
Additionally, users can set acceleration fields for an output dataset. The system shards the dataset by these fields, which speeds up card queries that use the dataset.
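To illustrate why sharding by an acceleration field can speed up card queries, here is a minimal sketch. It is not the system's actual implementation: the hash function, the shard count, and the names `shard_of`, `build_shards`, and `query_equals` are all assumptions made for illustration. The idea is that a query filtering on the acceleration field only needs to scan the one shard that can hold the requested value.

```python
import zlib
from collections import defaultdict

NUM_SHARDS = 4  # assumed shard count, for illustration only


def shard_of(value, num_shards=NUM_SHARDS):
    """Map a field value to a shard index using a stable hash (assumed scheme)."""
    return zlib.crc32(str(value).encode("utf-8")) % num_shards


def build_shards(rows, accel_field, num_shards=NUM_SHARDS):
    """Group rows into shards keyed by the hash of the acceleration field."""
    shards = defaultdict(list)
    for row in rows:
        shards[shard_of(row[accel_field], num_shards)].append(row)
    return shards


def query_equals(shards, accel_field, value, num_shards=NUM_SHARDS):
    """Scan only the shard that can contain `value`, not the whole dataset."""
    candidates = shards.get(shard_of(value, num_shards), [])
    return [row for row in candidates if row[accel_field] == value]


# Hypothetical sample data with "region" chosen as the acceleration field.
rows = [
    {"region": "east", "sales": 10},
    {"region": "west", "sales": 20},
    {"region": "east", "sales": 5},
]
shards = build_shards(rows, "region")
east_rows = query_equals(shards, "region", "east")
```

In this sketch, fields that cards filter on most often are the natural candidates for acceleration fields, because only filters on those fields let a query skip shards.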
1.2. Prerequisites
The data flow must contain at least one "Input Dataset" or "Database Input" operator before an "Output Dataset" operator can be configured.
2. Operation Steps
- Drag the Output Dataset operator from the data flow operator area onto the canvas editing area on the right, and connect it using a connection line.
- Select full update or incremental update:
  - Full update: create a new dataset or select an existing dataset. When an existing dataset is selected and the dataset fields have changed, check "Auto-update data structure" to automatically overwrite the data structure of the existing dataset.
  - Incremental update: select an existing dataset and configure the field mapping relationships. If the dataset fields have changed, check "Auto-update data structure" to automatically add the missing fields to the existing dataset. Predecessor cleaning rules can also be configured to clean data in the target dataset.
- After the offline development task runs successfully, the system automatically outputs the offline development dataset to "Data Center > Datasets".
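The difference between the two update modes in the steps above can be sketched as follows. This is an illustrative model only, not the product's code: the dataset representation, the function names `full_update` and `incremental_update`, the `pre_clean` predicate (standing in for a predecessor cleaning rule), and the choice to raise an error when fields change without "Auto-update data structure" are all assumptions.

```python
def full_update(existing, new_rows, new_fields, auto_update_structure=False):
    """Replace all data; with auto-update, the structure is overwritten too."""
    if list(new_fields) != existing["fields"] and not auto_update_structure:
        # Assumed behavior: the document does not say what happens here.
        raise ValueError("fields changed; enable auto-update data structure")
    return {"fields": list(new_fields), "rows": [dict(r) for r in new_rows]}


def incremental_update(existing, new_rows, field_mapping,
                       auto_update_structure=False, pre_clean=None):
    """Append mapped rows; with auto-update, missing fields are added."""
    fields = list(existing["fields"])
    rows = [dict(r) for r in existing["rows"]]

    # Assumed meaning of a predecessor cleaning rule: delete matching rows
    # from the target dataset before the new rows are written.
    if pre_clean is not None:
        rows = [r for r in rows if not pre_clean(r)]

    # Rename source fields to target fields according to the mapping.
    mapped = [{field_mapping.get(k, k): v for k, v in r.items()}
              for r in new_rows]

    # Collect fields that appear in the incoming rows but not in the target.
    missing = []
    for r in mapped:
        for name in r:
            if name not in fields and name not in missing:
                missing.append(name)
    if missing:
        if not auto_update_structure:
            raise ValueError(f"unmapped new fields: {missing}")
        fields.extend(missing)

    # Rows that lack a field get None for it.
    merged = [{f: r.get(f) for f in fields} for r in rows + mapped]
    return {"fields": fields, "rows": merged}


# Hypothetical target dataset and incoming batch.
target = {
    "fields": ["date", "amount"],
    "rows": [{"date": "2024-01-01", "amount": 1},
             {"date": "2024-01-02", "amount": 2}],
}
batch = [{"dt": "2024-01-02", "amount": 5, "channel": "web"}]

incremental = incremental_update(
    target, batch, {"dt": "date"},
    auto_update_structure=True,
    pre_clean=lambda r: r["date"] == "2024-01-02",  # re-load one day
)
full = full_update(target, [{"id": 1}], ["id"], auto_update_structure=True)
```

In this model, a full update discards the old rows and may replace the structure entirely, while an incremental update keeps the existing rows (minus anything the cleaning rule removes) and only ever adds fields.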