Dataset Operators Overview

Overview

Feature Description

In a complete ETL task, input and output nodes ensure that data flows and is processed correctly from source to destination. They are indispensable parts of the ETL lifecycle.

In Guandata Smart ETL, input and output operators are collectively referred to as Dataset Operators, including the Input Dataset operator and the Output Dataset operator, which represent source datasets and result datasets respectively.

Multiple Input Dataset operators let a single flow integrate heterogeneous data from several sources, and multiple Output Dataset operators let it write results from any node in the data flow.

Usage Limits

  1. Smart ETL requires one or more Input Dataset operators, and at least one Input Dataset operator must exist before an Output Dataset operator can be configured.
  2. Input datasets can come from three kinds of sources: file data, database datasets (excluding direct-connection databases and View Datasets), and output datasets produced by other Smart ETL flows.
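
The two limits above can be read as a small validation check. The following is only a sketch in Python: the `Flow` class, the `validate_flow` function, and the source-kind labels are hypothetical names invented for illustration and are not part of the Guandata product or any API it exposes.

```python
from dataclasses import dataclass, field

# Hypothetical labels for the source kinds the limits allow.
# Direct-connection databases and View Datasets are deliberately absent.
ALLOWED_SOURCES = {"file", "database", "etl_output"}


@dataclass
class Flow:
    """Illustrative stand-in for a Smart ETL flow (not a real API)."""
    input_sources: list = field(default_factory=list)
    output_names: list = field(default_factory=list)


def validate_flow(flow):
    """Return a list of rule violations; an empty list means the flow passes."""
    problems = []
    # Limit 1: at least one Input Dataset operator must exist.
    if not flow.input_sources:
        problems.append("at least one Input Dataset operator is required")
    # Limit 2: every input must come from an allowed kind of source.
    for kind in flow.input_sources:
        if kind not in ALLOWED_SOURCES:
            problems.append(f"unsupported input source: {kind}")
    return problems
```

For example, `validate_flow(Flow(["file", "etl_output"], ["result"]))` returns an empty list, while a flow whose only input is labeled `"view_dataset"` reports one violation.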

Instructions

  1. Drag the Input Dataset operator from the ETL operator panel into the canvas editor on the right, then click the operator to upload the source data.

  2. Drag other operators onto the canvas for data processing and connect them with lines.

  3. After the data processing logic is complete, drag the Output Dataset operator into the canvas editor on the right.

  4. Click the Output Dataset operator, define its name, and choose the storage location.

  5. Click Preview to verify the output result, then save or run the task from the upper-right corner as needed.

    If you select "Save, Run, and Exit", the output dataset is generated automatically after the ETL run succeeds.
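
The shape of the flow built in the steps above (several inputs, processing nodes connected in sequence, and outputs taken from more than one node) can be sketched in plain Python. This is a conceptual illustration only: the sample rows, the function names, and the `outputs` dictionary are assumptions made for the example, and Smart ETL itself is configured on the canvas rather than in code.

```python
# Two hypothetical input datasets (rows as dicts), standing in for
# two Input Dataset operators.
orders = [
    {"order_id": 1, "customer_id": 10, "amount": 120.0},
    {"order_id": 2, "customer_id": 11, "amount": 35.5},
    {"order_id": 3, "customer_id": 10, "amount": 60.0},
]
customers = [
    {"customer_id": 10, "name": "Ada"},
    {"customer_id": 11, "name": "Lin"},
]


def join_customers(order_rows, customer_rows):
    """Processing node: attach the customer name to each order row."""
    names = {c["customer_id"]: c["name"] for c in customer_rows}
    return [{**o, "name": names.get(o["customer_id"])} for o in order_rows]


def total_by_customer(rows):
    """Processing node: sum the order amount per customer name."""
    totals = {}
    for row in rows:
        totals[row["name"]] = totals.get(row["name"], 0.0) + row["amount"]
    return totals


joined = join_customers(orders, customers)  # intermediate node
totals = total_by_customer(joined)          # final node

# Two outputs, one taken from the intermediate node and one from the
# final node, mirroring "output from any node".
outputs = {"orders_with_names": joined, "customer_totals": totals}
```

Here `outputs["orders_with_names"]` plays the role of an Output Dataset attached midway through the flow, and `outputs["customer_totals"]` the role of one attached at the end.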

Note

Make sure the user has owner permission for both the dataset folder and the ETL folder under the ETL save path. Otherwise, the system reports Invalid Save Path.

Learning Path

You can continue learning from the following pages:

| Operator Name | Description |
| --- | --- |
| Input Dataset | Provides the data foundation for the first stage of ETL, extraction, and prepares for downstream processing. Supports rapid integration of heterogeneous data from multiple sources. |
| Output Dataset | Represents the result data after ETL processing. It can be used for downstream business analysis and reporting, and supports output from any node. |