Smart ETL Overview
1. Overview
1.1. Introduction
This section introduces what Guandata Smart ETL is, describes its application scenarios and functional modules, and provides a beginner-friendly quick-start tutorial.
For Smart ETL-related video courses, see:
1.2. Smart ETL Overview
Smart ETL is a zero-code, fully drag-and-drop self-service data preparation and data warehouse construction tool provided by Guandata for business users. Smart ETL enables users to efficiently process datasets in a user-friendly, low-threshold, and intelligent way before data analysis and visualization. Based on powerful data processing operators and nodes, users can clean, transform, and load data through drag-and-drop and configuration, preview and correct in real time at any node, and output results to build a data processing workflow. This helps enterprises/departments build lightweight data warehouses, allowing business users without SQL knowledge to achieve professional-level data processing results.

Compared with traditional ETL tools, Guandata Smart ETL offers greater automation, intelligence, visualization, and security when handling complex data, improving processing efficiency, data quality, and data consistency, and effectively meeting enterprise data processing needs.
· Zero-code, visual configuration: every step of the business logic is presented in detail, and what you see is what you get.
· 15+ commonly used operators across 5 categories: input/output, column editing, data editing, data combination, and advanced calculation.
· Real-time preview and save during data processing, so users can confirm results at any time, correct issues online, and avoid rework.
· 35+ data access types for complex data sources, and any node in the data flow supports output at any time, fully utilizing data value.
· Built on the Spark big data architecture, Smart ETL easily handles massive data analysis scenarios, providing enterprise-grade speed for billion-row data processing.
1.3. Application Scenarios
· Data Warehouse Construction:
For large and medium-sized groups with complex information systems and no unified group-wide IT system, storing multi-source heterogeneous data on one platform achieves group-level data "unification" for efficient centralized management and decision support. For example, financial data can be extracted from general ledgers, reports, or even individual vouchers to build a data warehouse. Data is pre-processed into the warehouse before deep mining, which does not affect the operation of the business database and meets the needs of offline data warehouse services.
· Mining Data Value:
Faced with tens of billions of rows of accumulated historical data, many enterprises face the dilemma that "keeping it is a burden, discarding it is a worry." The most common solution is to save a full snapshot of the data every day, add a date primary key, and expose it for user queries. But this stores large amounts of unchanged information, wasting storage; moreover, a poor design can seriously degrade query efficiency and drag down the database.
For example, a chain pharmacy enterprise with 3,000 stores and 1,000 SKUs would generate 3 million records per day, or roughly 1.1 billion per year. If 5 years of historical data are required, over 5 billion historical snapshot rows need to be saved. Smart ETL can compress and query massive historical data, reflecting historical data states while maximizing storage savings and improving query efficiency.
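The storage arithmetic above can be sketched in plain Python. The comparison below of full daily snapshots against storing only changed rows is illustrative: the 2% daily change rate is an assumption for demonstration, not a product figure or a description of Smart ETL internals.

```python
# Illustrative storage arithmetic for the chain-pharmacy example.
# The change rate is a hypothetical assumption, not a Smart ETL figure.

STORES = 3_000
SKUS = 1_000
DAYS_PER_YEAR = 365
YEARS = 5

rows_per_day = STORES * SKUS                               # 3,000,000 rows/day
full_snapshot_rows = rows_per_day * DAYS_PER_YEAR * YEARS  # full daily snapshots

# Suppose only ~2% of store-SKU records actually change on a given day,
# so a change-only (delta) strategy stores far fewer rows.
CHANGE_RATE = 0.02
delta_rows = int(rows_per_day * CHANGE_RATE) * DAYS_PER_YEAR * YEARS

print(f"full snapshots: {full_snapshot_rows:,} rows")  # 5,475,000,000
print(f"change-only:    {delta_rows:,} rows")          # 109,500,000
print(f"savings:        {1 - delta_rows / full_snapshot_rows:.0%}")  # 98%
```

Storing only changed rows (plus a way to reconstruct any day's state) is the classic trade-off behind snapshot compression: a large reduction in storage at the cost of slightly more complex queries.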
· Data Cleaning and Transformation:
In actual data analysis and decision-making, data often has inconsistencies, duplicates, and missing values, requiring ETL for cleaning and transformation. ETL processing produces high-quality, consistent data, providing a reliable foundation for subsequent analysis and decision-making.
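The cleaning steps described above (deduplication, filling missing values, normalizing inconsistent values) can be illustrated with a minimal stdlib-only Python sketch. Smart ETL itself is zero-code; this is only a conceptual illustration, and the field names and default values are hypothetical.

```python
# Conceptual illustration of common cleaning steps: deduplicate, normalize
# inconsistent values, and fill missing values. Field names are hypothetical.
raw = [
    {"order_id": 1, "region": "North", "amount": 120.0},
    {"order_id": 1, "region": "North", "amount": 120.0},  # duplicate record
    {"order_id": 2, "region": "north", "amount": None},   # inconsistent case, missing value
    {"order_id": 3, "region": "South", "amount": 80.0},
]

seen, cleaned = set(), []
for row in raw:
    if row["order_id"] in seen:            # drop duplicate records
        continue
    seen.add(row["order_id"])
    row["region"] = row["region"].title()  # normalize inconsistent casing
    if row["amount"] is None:              # fill missing values with a default
        row["amount"] = 0.0
    cleaned.append(row)

print(cleaned)  # 3 rows: duplicates dropped, values normalized and filled
```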
1.4. Function Introduction
The Smart ETL editing interface is divided into 5 main areas: the ETL Operator Area, Canvas Editing Area, Data Preview Area, Update Settings Area, and Undo/Redo Area.

Descriptions of different operation areas are as follows:
| Operation Area | Description |
| --- | --- |
| ETL Operator Area | This area contains a series of predefined ETL operators, including datasets, column editing, data editing, dataset combination, etc., covering all aspects of data cleaning, transformation, and loading. By selecting appropriate operators in the ETL Operator Area and dragging them to the Canvas Editing Area, users can build a complete ETL process and define each step of data processing. |
| Canvas Editing Area | This area is where users design and configure the actual ETL process. Users can drag and connect different ETL operators on the canvas, set connections and parameters, and intuitively define the data processing flow. |
| Data Preview Area | This area is used to preview data. After configuring the ETL process, users can preview the data effect at each node in real time to confirm the correctness of the process. |
| Update Settings Area | This area is used to further configure ETL job scheduling strategies, supporting update methods (scheduled update, update after dataset update), task priority, timeout limits, etc. |
| Undo/Redo Area | Users can use the undo and redo buttons to go back or forward to a specific operation state (only the last 30 steps are recorded), making it easy to modify and adjust during design and improving fault tolerance. |
2. Getting Started
To help you systematically master data processing skills, we have laid out the following learning path. You can refer to the entry-level practical case below to complete your first ETL task. The specific learning path is as follows:
| Core Path | Operation Guide | Description |
| --- | --- | --- |
| Create ETL Task | Create an ETL task | An entry-level practical case |
| Use ETL Operators | Input/Output Dataset | Includes input and output dataset operators, representing raw and result datasets respectively. Supports rapid integration of multi-source heterogeneous data (multi-input) and output at any node in the data flow (multi-output). |
| | Column Editing | Expand or merge calculations on existing columns in the dataset, such as multi-column merge calculation. |
| | Data Editing | Remove dirty data from the source or replace certain data values, etc. |
| | Data Combination | Join common columns from two datasets to merge them into a more comprehensive dataset. |
| | Advanced Calculation | Query, extract, and merge data; output dataset statistics at once, etc. The SQL input in this function is a free operator; the others are paid modules. Contact sales for a trial. |
| Canvas Editing Operators | Annotation | Helps collaborators understand complex ETL steps, reducing maintenance and handover costs. |
| | Node Relationship View | Helps users quickly clarify upstream and downstream node relationships when troubleshooting ETL issues. |
| | Undo/Redo | Allows users to go back or forward to a specific operation state using the undo and redo buttons. |
| Management & Maintenance | View Task Details | View ETL task details, including but not limited to run records, last modified time, last run time, run duration, etc., for later review. |
| | Edit Task | Edit ETL tasks as needed, such as adding/removing operator nodes or changing processing logic. |
| | Schedule Task | Control ETL task start and run (start time, run cycle, trigger conditions) through ETL task scheduling. |
| | Delete Task | Clean up ETL tasks that are no longer needed. |
| | Set Permissions | Set resource permissions for ETL tasks, including task owners and visitors. |
| | Operation Parameters | Fine-grained operation parameters for a single ETL task ensure normal and efficient operation. For example, enabling the ETL intermediate-result cache can significantly improve task efficiency. |
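The data-combination step in the learning path above (joining two datasets on a shared column to form a more comprehensive dataset) can be illustrated conceptually with a small stdlib-only Python sketch. The dataset contents and column names are invented for the example; Smart ETL performs this through drag-and-drop configuration rather than code.

```python
# Hypothetical inner join of two small datasets on a shared "store_id" column,
# mirroring conceptually what a data-combination operator does.
sales = [
    {"store_id": "S01", "amount": 500},
    {"store_id": "S02", "amount": 300},
]
stores = [
    {"store_id": "S01", "city": "Shanghai"},
    {"store_id": "S02", "city": "Beijing"},
]

# Index the lookup side by the join key, then enrich each sales row.
city_by_store = {s["store_id"]: s["city"] for s in stores}
joined = [
    {**row, "city": city_by_store[row["store_id"]]}
    for row in sales
    if row["store_id"] in city_by_store  # inner join: keep only matched keys
]

print(joined)
```

The result is one row per matched sale, carrying columns from both datasets, which is what the "more comprehensive dataset" in the table refers to.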
3. FAQ
If you encounter problems when using Smart ETL, it is recommended to check [ETL FAQ](../../11-FAQ/1-Data Processing/2-ETL FAQ.md) and ETL Common Errors.
For more help on using Smart ETL, visit the Guandata Video Tutorial Website.