Practical Approaches to ETL Governance
Overview
This article introduces ETL governance practices and is intended to help users diagnose and optimize ETL queue delays.
It covers the importance of ETL scheduling in data processing workflows, as well as how to effectively manage scheduling through monitoring, alerts, and issue localization to ensure the timeliness and accuracy of business data.
Use Cases
This article is useful in the following scenarios:
- ETL output timing becomes abnormal, such as when ETL performance is noticeably slower than usual on a given day.
- ETL queuing becomes severe, such as when there was previously no ETL queuing and now there are many queued tasks.
- Resource control and capacity optimization scenarios.
Practical Example
In large-scale data processing workflows, the stability and efficiency of ETL scheduling are critical to ensuring the timeliness and accuracy of business data. Recently, significant delays in ETL output time and a sharp increase in the number of queued tasks were observed. These issues may seriously affect data analysis and decision support capabilities.
This article explains how to use existing Guandata BI features such as Smart Cloud Inspection, Task Management, O&M Management, and global controls in Admin Center to effectively monitor ETL scheduling, generate alerts, and locate issues.
Example Steps
Determine Whether There Is a Service Exception
Approach: When ETL queuing is severe, there may be multiple causes, but you can first check whether the Spark service is running normally.
Entry: Go to Admin Center > O&M Management > Service Management and check whether there are restart records.
If Spark restart records exist, the queuing may be caused by a service exception. In that case, contact Guandata O&M or technical support promptly to investigate why Spark restarted, so that similar restarts do not block ETL scheduling again.

Determine Whether Business Growth Is the Cause
- Check the Cloud Inspection Report: First, check the cloud inspection report and see whether the number of ETL schedules has increased significantly compared with before.

- Check Whether Scheduling Is Reasonable: If the number of scheduled ETL tasks has increased greatly, first evaluate whether the schedules are reasonable. In the cloud inspection report, locate Top 20 ETLs by Number of Runs in the Last 31 Days and check whether any ETLs are scheduled far more often than the business logic requires. Adjust those schedules accordingly. Excessive scheduling not only consumes ETL scheduled-refresh concurrency, but may also occupy too much CPU. Reasonable scheduling is therefore critical to BI stability.

- Handle ETL Scheduling Growth Caused by Normal Business Growth: If the growth in ETL scheduling is caused by normal business growth and CPU usage remains at a high level, such as above 80%, this may indicate that available resources are insufficient to support the growth. In this case, contact Guandata O&M or technical support to evaluate capacity expansion and ensure sufficient resources are available.
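The checks above follow a fixed order, which can be written down as a small decision function. The sketch below is illustrative only: the `QueueSnapshot` fields are hypothetical values that you would fill in by hand from Service Management and the cloud inspection report, and the 1.5x growth factor is an assumed stand-in for "increased significantly". Only the 80% CPU figure comes from this article.

```python
from dataclasses import dataclass


@dataclass
class QueueSnapshot:
    """Hand-collected observations (hypothetical structure, not a Guandata API)."""
    spark_restarted: bool        # restart records found in Service Management
    runs_before: int             # ETL runs in an earlier comparison period
    runs_now: int                # ETL runs in the current period
    schedules_reasonable: bool   # outcome of reviewing the most frequently run ETLs
    cpu_usage_pct: float         # sustained CPU usage, in percent


GROWTH_FACTOR = 1.5   # assumption: what counts as "increased significantly"
HIGH_CPU_PCT = 80.0   # example threshold mentioned in this article


def triage(s: QueueSnapshot) -> str:
    """Return the next action, following the order of checks described above."""
    if s.spark_restarted:
        return "service exception: contact O&M or technical support"
    grew = s.runs_before > 0 and s.runs_now / s.runs_before >= GROWTH_FACTOR
    if not grew:
        return "no significant schedule growth: review the optimization directions"
    if not s.schedules_reasonable:
        return "adjust schedules that do not match business logic"
    if s.cpu_usage_pct > HIGH_CPU_PCT:
        return "evaluate capacity expansion with O&M or technical support"
    return "growth is within current capacity"


print(triage(QueueSnapshot(False, 100, 300, True, 90.0)))
```

The service check comes first because a Spark restart can cause queuing regardless of how many ETLs are scheduled.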

Optimization Directions
Optimize ETL Tasks with High CPU Consumption
Use the cloud inspection report to view Top 20 ETLs by CPU Usage Duration in the Last 31 Days and check whether any task has consumed CPU for more than one hour.

For ETL tasks that consume large amounts of CPU, refer to ETL Optimization Recommendations to reduce unnecessary resource consumption and avoid long-term resource occupation that blocks ETL scheduling.
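If the Top 20 list is copied out of the report, the one-hour check can be scripted. This is a minimal sketch that assumes you have transcribed the report into a dictionary of CPU seconds per ETL; the ETL names and the dictionary shape are hypothetical, not an export format provided by the product.

```python
ONE_HOUR_SECONDS = 3600


def long_cpu_tasks(cpu_seconds_by_etl: dict[str, int]) -> list[tuple[str, int]]:
    """Return (ETL name, CPU seconds) pairs above one hour, heaviest first."""
    over = {name: secs for name, secs in cpu_seconds_by_etl.items()
            if secs > ONE_HOUR_SECONDS}
    return sorted(over.items(), key=lambda item: item[1], reverse=True)


# Hypothetical values transcribed from the report.
report = {"sales_daily": 5400, "inventory_sync": 900, "member_wide_table": 12600}
for name, secs in long_cpu_tasks(report):
    print(f"{name}: {secs / 3600:.1f} h of CPU time")
```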
Distribute ETL Scheduling Time More Evenly
Method 1: Use ETL Runtime Distribution in the Last 31 Days in the cloud inspection report to check whether too many ETLs run during a specific time period. If business impact allows, try to distribute runs across more time periods or move them to idle nighttime windows to avoid congestion caused by too many tasks running at the same time.
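As a rough illustration of Method 1, the sketch below flags hours that hold a disproportionate share of ETL runs. It assumes you have a list of start hours (0 to 23) for recent runs, gathered by hand from the report; the 25% share threshold is an arbitrary assumption and should be tuned to your workload.

```python
from collections import Counter

MAX_SHARE = 0.25  # assumption: one hour holding over 25% of runs counts as congested


def congested_hours(start_hours: list[int], max_share: float = MAX_SHARE) -> list[int]:
    """Return the hours of day whose share of all runs exceeds max_share."""
    if not start_hours:
        return []
    counts = Counter(start_hours)
    total = len(start_hours)
    return sorted(hour for hour, n in counts.items() if n / total > max_share)


# Hypothetical sample: most runs are scheduled at 02:00.
runs = [2] * 10 + [3] * 2 + [14]
print(congested_hours(runs))
```

Hours returned by `congested_hours` are candidates for moving some schedules to quieter windows, subject to business impact.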

Method 2: Some customers use Spark resource isolation, where Cards and ETLs are deployed with separate job engines. If ETL queues grow because ETL schedules are concentrated overnight and there is no Dashboard viewing demand during that window, it is recommended to work with Guandata technical support or O&M to configure dynamic job engine switching. This allows the Card job engine to be temporarily used by ETL at night, giving ETL more compute resources, improving resource utilization, and reducing ETL queuing.
To check whether a dual-job deployment is in use, go to Admin Center > O&M Management > Service Management and review General Compute Engine Service and ETL Compute Engine Service.

Clean Up Invalid ETL Resources
In the ETL section of the cloud inspection report, open Datasets Without Any Downstream Consumption and review the datasets whose type is Data Flow. For each one, check whether the ETL that produces it has other Output Datasets and whether those Output Datasets are used downstream. If none of the ETL's Output Datasets are used, it is recommended to clean up the related ETL schedule to avoid wasting resources.
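The rule here is that an ETL is a cleanup candidate only when every one of its Output Datasets is unused. A minimal sketch of that check, assuming a hand-built mapping from each ETL to its Output Datasets and a set of dataset names taken from the Datasets Without Any Downstream Consumption list (all names below are hypothetical):

```python
def cleanup_candidates(etl_outputs: dict[str, list[str]],
                       unused_datasets: set[str]) -> list[str]:
    """Return ETLs whose Output Datasets are all without downstream consumption."""
    return sorted(
        etl for etl, outputs in etl_outputs.items()
        if outputs and all(ds in unused_datasets for ds in outputs)
    )


# Hypothetical inventory.
etl_outputs = {
    "etl_orders": ["orders_clean", "orders_summary"],
    "etl_legacy": ["legacy_wide"],
    "etl_members": ["members_a", "members_b"],
}
unused = {"orders_summary", "legacy_wide", "members_a", "members_b"}

print(cleanup_candidates(etl_outputs, unused))
```

In this sample, `etl_orders` is kept because `orders_clean` is still consumed downstream, even though `orders_summary` is not.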

Restrict Global ETL Parameters
Entry: Admin Center > System Settings > General Settings > Runtime Parameters
- You can limit the maximum task runtime. For example:
  - Set a maximum runtime for tasks to prevent abnormal ETL scheduling from blocking for too long.
  - Or define runtime limits by time period, such as no more than 30 minutes during daytime and no more than 1 hour during other periods, for global control.

- Adjust the parallel task count. This setting mainly applies to ETL tasks with scheduled refresh and next-level chain refresh. It should generally be configured according to machine specifications and ETL workload. For example, with a 16C128G setup, if many ETL tasks run in under 10 minutes, the concurrency can be set to 4-6. If most ETL tasks take more than 30 minutes, it is recommended to keep concurrency within 4.
  Note: The above information is for reference only. For detailed tuning, it is recommended to contact Guandata technical support or O&M for evaluation before making adjustments. Excessive concurrency may reduce ETL efficiency because simultaneously running tasks compete for CPU resources.
- Configure the remaining parameters based on your own needs, but avoid setting them too high. Complex ETLs can be very resource-intensive, so try to minimize the number of overly complex ETLs. For ETLs with complex logic, consider splitting them into multiple ETLs and combining them with Advanced Scheduling for sequential scheduling.
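The two rules of thumb for runtime limits and concurrency can be expressed as small helper functions for planning purposes. This is a sketch under several assumptions: daytime is taken as 08:00 to 20:00, "many" and "most" are read as more than half of the tasks, and the concurrency figures apply only to the 16C128G example in this article. The functions do not read or write any product setting, and real values should still be confirmed with Guandata technical support or O&M.

```python
from typing import Optional

DAY_START_HOUR = 8    # assumption: start of the "daytime" window
DAY_END_HOUR = 20     # assumption: end of the "daytime" window (exclusive)


def max_runtime_minutes(hour: int) -> int:
    """Runtime cap by time period: 30 minutes in daytime, 60 minutes otherwise."""
    return 30 if DAY_START_HOUR <= hour < DAY_END_HOUR else 60


def suggested_concurrency(durations_min: list[float]) -> Optional[tuple[int, int]]:
    """Suggested (low, high) concurrency range for a 16C128G machine.

    Returns None when the workload is mixed, since this article gives
    no guidance for that case.
    """
    if not durations_min:
        return None
    total = len(durations_min)
    short = sum(1 for d in durations_min if d < 10)
    long_running = sum(1 for d in durations_min if d > 30)
    if short * 2 > total:          # mostly short tasks
        return (4, 6)
    if long_running * 2 > total:   # mostly long tasks: stay within 4
        return (1, 4)
    return None


print(max_runtime_minutes(14), max_runtime_minutes(2))
print(suggested_concurrency([5, 6, 7, 40]))
```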

Improve ETL Scheduling Efficiency with Advanced Scheduling
For ETL tasks where processed data grows by date and historical data does not change, Advanced Scheduling can be used for incremental refresh to improve efficiency and reduce resource consumption.
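The idea behind incremental refresh is to process only the dates that have not been handled yet and leave historical partitions untouched. The sketch below shows that idea in generic Python. It is not how Advanced Scheduling is implemented or configured; `store`, `load_day`, and the date bookkeeping are hypothetical.

```python
from datetime import date, timedelta
from typing import Callable


def pending_dates(last_done: date, today: date) -> list[date]:
    """Dates after last_done up to and including today."""
    days = (today - last_done).days
    return [last_done + timedelta(days=i) for i in range(1, days + 1)]


def incremental_refresh(store: dict[date, list],
                        last_done: date,
                        today: date,
                        load_day: Callable[[date], list]) -> dict[date, list]:
    """Load only the missing days; existing partitions are not recomputed."""
    for day in pending_dates(last_done, today):
        store[day] = load_day(day)
    return store


# Hypothetical usage: one partition already exists, two days are pending.
store = {date(2024, 1, 1): ["existing rows"]}
incremental_refresh(store, date(2024, 1, 1), date(2024, 1, 3),
                    load_day=lambda d: [f"rows for {d.isoformat()}"])
print(sorted(store))
```

Compared with a full refresh, the work done per run is proportional to the number of new days rather than to the full history.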
For more ETL practice content, see ETL Developer Best Practices.