ETL FAQ
Overview
This article introduces common Smart ETL issues in the Guandata BI data preparation module.
Smart ETL
Why is the model structure of an ETL output dataset empty?
Issue Description:
Other possible symptoms:
The Model Structure and Association Creation sections on the dataset overview page are empty.
The ETL that the dataset belongs to cannot be found, and the dataset cannot be updated.
Cause:
After the ETL was created and run successfully, the owner deleted the output dataset node while modifying the ETL, then added another output dataset node and saved and ran it. Even if the new output dataset has the same name as the original dataset, which can only be saved in a different folder, it is not the same dataset.
Solution:
(1) Avoid deleting output nodes in ETL. If you need to add a new node before the output node, click the connection line before the last node and delete the connection line instead of deleting the output node.
(2) If the output node has already been deleted and the new output node is set to save in the same folder with the exact same name, the system prompts that a dataset with the same name already exists in this directory and asks you to rename it. At this point, do not force-save the ETL. Cancel saving and exit, then re-enter the editing page and follow suggestion 1.
(3) If the node has already been deleted and many changes have been made that you do not want to discard, you can only save it with a different name or path. The original output dataset is then disconnected from the ETL, and its model structure is empty. In this case, batch switch cards that depend on the old dataset to the new dataset, then delete the old dataset.
Use of Double Quotation Marks in ETL SQL Nodes
Cause:
If fields in an SQL node query statement use double quotation marks, they are treated as strings. Remove the double quotation marks or change them to backticks.
Solution:
Spark syntax is consistent with MySQL. Backticks are used to prevent custom identifiers such as field names and table names from conflicting with database SQL keywords such as FROM and CREATE. Generally, table names and field names use backticks, but they can also be omitted.
When does an ETL input dataset preview show no data?
Cause:
In ETL, row and column permissions for input datasets are judged based on the ETL owner. Even when an administrator runs the ETL, permissions are judged according to the ETL owner. In most cases, the owner has permission for the input dataset, but row and column permissions may have changed after the input dataset was added, or the ETL owner may have been transferred.
Why are some null values not replaced after using null value replacement in ETL?
Cause 1:
The field value is an empty string, but null value replacement was used by mistake.
Solution:
Use the Value Replacement node to replace empty strings with null.
Cause 2:
The replacement target value does not match the field type. For example, replacing null values in a date field with 0 does not take effect.
Solution:
Ensure that the null replacement target value is consistent with the field type.
Explanation of ETL Filter Rows Trigger Conditions
- When all filter rules select Equals:
a. Meet all rules: Filters data that meets two or more conditions. For example, selecting province = Zhejiang Province and date = 2017-10-05 filters the only row that meets both conditions.
b. Meet any rule: Filters data that meets one of two or more conditions. For example, selecting province = Zhejiang Province and date = 2017-10-05 filters data where province is Zhejiang Province or date is 2017-10-05.
- When all filter rules select Not Equal:
a. Meet all rules: Filters data that meets Not Equal for two or more conditions. For example, selecting province not equal to Zhejiang Province and date not equal to 2017-10-05 filters data that has no Zhejiang Province and no 2017-10-05 date.
Selecting province = Zhejiang Province alone returns no data.
Selecting date = 2017-10-05 alone returns no data.
b. Meet any rule: Filters data that meets one of two or more conditions. For example, selecting province not equal to Zhejiang Province and date not equal to 2017-10-05 filters data where province is not Zhejiang Province and date is not 2017-10-05. Selecting province = Zhejiang Province alone returns data.
Selecting date = 2017-10-05 alone returns data.
Selecting province = Zhejiang Province and date = 2017-10-05 returns no data.
- Summary
a. When the filter rule is Equals, the logic is easy to understand. Meet all rules means two or more conditions must all be met. Meet any rule means one of the conditions must be met.
b. When the filter rule is Not Equal, if the intended effect is to filter out all data that meets the conditions, such as filtering out data from Zhejiang Province on October 5, 2018 in this case, select Meet any rule as the trigger rule.
Do ETL and datasets have a backup mechanism?
There is no separate backup mechanism. BI overall data has a backup mechanism and is automatically backed up every day at 02:30.
BI data backup includes ETL and dataset content.
Can SQL operators use custom time macros?
Global parameters and dynamic time macros are not supported in ETL. Related dynamic date functions such as now() and current_date() can be used.
Why does an error occur when using sum to aggregate a new field in ETL?
Cause:
If only sum is written, ETL does not know which dimension to aggregate by. In a card, after dragging in a dimension field and then the corresponding numeric field, aggregation is performed automatically based on the dimension field.
Solution:
Use the Group Aggregate node to sum, or use a window function in a new field, such as sum() over(partition by).
Can datasets generated by ETL processing be written back to our database?
BI platform version 5.7 and later supports data writeback. Earlier versions can only implement this through API integration. If the data development platform product is used, data can be written back directly to a specified database on the Universe platform.
Why can I not view ETL details?
Cause:
Normal users who are not ETL owners cannot view ETL details. Only ETL owners or users with administrator permissions can view ETL details.
What is the impact of restarting jobserver and restarting the server on BI task status?
In Service Management, restarting the ETL jobserver can kill all running tasks.
Restarting BI on the server can kill all running and queued tasks.
Which input datasets are supported by ETL?
Supported input datasets include file datasets, database datasets excluding direct-connection databases and view datasets, and output datasets from other Smart ETLs.
In ETL operators, after using the merge columns operator, how can the merged field be split again?
In the merge columns operator, you can choose whether to delete the original fields. It is recommended to keep the original fields, so both the original fields and the merged field can be used. If the original fields were not kept, use the Add Calculated Field operator and use substr and instr to split the merged field by substring extraction.
Can random numbers be fixed in ETL?
If the ETL is not updated, the output dataset can keep random numbers unchanged. To retain a specific result, export the output dataset.
When trying to replace the input dataset of one ETL by clicking from the ETL to the dataset and replacing data, why are input datasets of other ETLs also replaced?
Cause:
Replacing data on the dataset overview page replaces the dataset itself, not only the input dataset of a specific ETL.
Solution:
If you only want to replace the input dataset of a specific ETL, click Edit and replace it inside the ETL.
The ETL update method is set to Trigger only after all selected datasets are updated. Why is the ETL triggered before all datasets are updated?
Cause:
The later option All datasets must be updated on the same day (00:00:00-23:59:59) is not selected. In this case, if the selected datasets are updated once across different days, the trigger condition is also met.
Solution: If you want the ETL to run only after datasets are updated on the same day, select the same-day option.

In SQL node preview, why do some field types change?
The SQL node re-determines field types in the backend and does not directly inherit the types displayed by previous nodes. If field types change, check whether previous node operations are standardized.
Cause 1:
A previous Append Rows node concatenated two fields with the same name but different types, causing the backend to be unable to accurately determine the type. It is displayed as string by default.
Solution:
Before Append Rows, check the field type with the issue and use functions to convert the types to be consistent before appending datasets.
Cause 2:
The field with the issue was generated by a previous node or an upper-level ETL Add Calculated Column operation, and functions were not used in a standardized way, causing the field type itself to differ from the manually selected type in the upper-right corner. In this case, the field type corrected by the SQL node may be the correct one.
Solution:
Check the previous field generation node and use functions to convert field types, or make no change and use the field type output by SQL.
ETL Output Type Does Not Match Expectations
Issue Description:
In SQL node preview, some field types are inconsistent with previous nodes, or after ETL runs, some field types differ from the original types.
Cause:
(1) The ETL contains an Append Rows node that combines fields with the same name but different types from different data sources. If fields with the same name include string and other types, they are displayed as string.
(2) New fields generated by Add Calculated Column do not use functions in a standardized way, causing the field type itself to differ from the manually selected type in the upper-right corner. The SQL node re-determines field types and does not directly inherit the previous displayed type, resulting in some type inconsistencies.
Solution:
(1) Before Append Rows, check the field types with issues as much as possible and use functions to convert the types to be consistent before appending datasets.
(2) When using Add Calculated Column, use functions to convert field types and ensure the numeric type is consistent with the manually selected type.
All scheduled tasks do not run automatically after updating the license
Issue Description:
After the old license expires and the license is updated, all scheduled tasks, such as datasets, ETL, and subscriptions, are not automatically triggered. After environment migration and activation of the new environment license, scheduled tasks become invalid.
Cause:
After the license expires and becomes invalid, BI services stop and all scheduled tasks stop. Even if the license is reactivated, scheduled tasks are not automatically triggered. For migration of BI 5.3.0 and later versions, after activating the new BI environment, guandata-server must be restarted; otherwise scheduled tasks become invalid.
Solution:
After reactivating the license, contact Guandata service personnel to restart the entire BI service.
ETL Updates Twice Consecutively
Issue Description:
For the same ETL, the option to update ETL only after all datasets are updated is selected. Two input datasets are each updated only once, but the ETL is triggered twice and both runs succeed.
Cause:
The two input datasets finish updating almost at the same time with an extremely small time difference, causing the system to determine twice that the ETL run trigger condition is met, so it runs twice.
Solution:
This generally does not affect usage. You can wait for Guandata product optimization, or slightly stagger the update times of input datasets.
ETL Does Not Update After Dataset Update
Issue Description:
The ETL update method is configured as after selected datasets are updated, but the dataset updates while the ETL does not. For example, Trigger after any selected dataset is updated is configured, five datasets are selected, and all five datasets update within one day, but the ETL runs only twice.
Cause:
Check the update times of input datasets and the queue time when ETL updates. If the five input datasets update too close together, tasks queue up. Before the first run ends, new tasks are already triggered and queued. Once multiple queued tasks appear for the same ETL at the same time, later duplicate queued tasks are automatically deduplicated and discarded by the system.
Solution:
We recommend using Trigger only after all selected datasets are updated. If data timeliness is required and Trigger after any selected dataset is updated must be used, ensure that the update times of multiple input datasets are staggered and not clustered. The interval must be greater than the ETL run duration and leave room for queueing.