ETL Refresh Policy

Overview

An appropriate scheduling strategy can significantly improve work efficiency, ensure data quality, and provide strong support for data teams.

Through ETL task scheduling, you can precisely control how ETL tasks start and run, including start time, run cycle, and trigger conditions, enabling automated processing of large volumes of enterprise data and making data processing more efficient and reliable.

You can define different scheduling strategies based on actual business needs, including ETL refresh methods, task priorities, and timeout settings, to meet a wide range of data processing scenarios.

Entry Points

Entry 1: ETL Task Details Page > ETL Refresh

Entry 2: ETL Task Editing Page > Refresh Settings

Configure Scheduling Status

You can control how ETL tasks run through the scheduling status switch. When scheduling is enabled, ETL tasks support scheduled runs and can be triggered automatically after dataset updates. When scheduling is disabled, ETL tasks can only be triggered manually or through API calls. This flexible setup is especially suitable for scenarios with smaller data volumes or relatively simple tasks, because you can adjust execution timing and frequency at any time based on actual needs.

Configure Refresh Methods

Guandata BI supports two refresh methods: scheduled refresh and refresh triggered after selected datasets are updated.

Scheduled execution is suitable for data processing tasks that run at fixed intervals. Refresh triggered by dataset updates is more suitable for real-time or near-real-time scenarios. For example, analyzing business performance becomes meaningful only after sales data has been updated.

Compared with scheduled execution, refresh triggered by dataset updates avoids running tasks when no actual data has changed, reducing resource waste. For example, it prevents ETLs from running with incomplete source data and producing Output Datasets with no practical value.

  • Scheduled Refresh: Based on business requirements and data update frequency, you can configure ETL tasks to run automatically at preset times or intervals. The system supports multiple recurring modes, including daily, weekly, or monthly execution. You can also define multiple time points within a cycle, such as 00:00, 06:00, 08:00, 12:00, and 24:00 each day.

  • Triggered After Dataset Updates: When datasets in an ETL flow change, the system can automatically trigger the corresponding ETL task to ensure timeliness and accuracy. You can choose between the following two trigger modes:

    • Trigger after any selected dataset is updated: The ETL task is triggered immediately as soon as any selected dataset is updated.
    • Trigger after all selected datasets are updated: The ETL task is triggered only after all selected datasets have been updated.

    Note that when the ETL is set to trigger after all selected datasets are updated, every selected dataset must be updated within the same day (00:00:00-23:59:59). Select the datasets to monitor as needed.
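The two trigger modes and the same-day rule can be modeled with a short sketch. This is an illustration only, not how Guandata BI implements triggering: the class `UpdateTrigger`, its `on_update` method, and the mode names `"any"` and `"all"` are hypothetical. It also assumes that, in the all-datasets mode, the ETL fires as soon as the last outstanding dataset is updated on the same calendar day as the others.

```python
from datetime import datetime


class UpdateTrigger:
    """Hypothetical model of the two dataset-update trigger modes."""

    def __init__(self, selected, mode):
        if mode not in ("any", "all"):
            raise ValueError(f"unknown mode: {mode!r}")
        self.selected = set(selected)  # datasets chosen as triggers
        self.mode = mode
        self.last_update_day = {}      # dataset name -> date of latest update

    def on_update(self, name, when):
        """Record an update and return True if the ETL should run now."""
        if name not in self.selected:
            return False  # updates to unselected datasets never trigger
        if self.mode == "any":
            return True   # any selected dataset triggers immediately
        # "all" mode: every selected dataset must have been updated
        # on the same calendar day as this update.
        self.last_update_day[name] = when.date()
        day = when.date()
        return all(self.last_update_day.get(d) == day for d in self.selected)


# Usage: two source datasets, ETL runs only after both update on one day.
trigger = UpdateTrigger(["sales", "stock"], mode="all")
print(trigger.on_update("sales", datetime(2024, 1, 1, 23, 0)))  # stock not yet updated
print(trigger.on_update("stock", datetime(2024, 1, 2, 1, 0)))   # sales updated on a different day
print(trigger.on_update("sales", datetime(2024, 1, 2, 2, 0)))   # both updated on 2024-01-02
```

In this model, an update that lands just after midnight does not combine with updates from the previous day, which is the practical consequence of the same-day rule.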

Notes
  • URL-triggered refresh is not supported for ETL refresh methods. If this capability is required, customers can add an unrelated dataset A to the ETL source datasets, configure the ETL to refresh when dataset A is updated, and then configure dataset A to use URL-triggered refresh. Updating dataset A will then trigger the ETL to run.
  • When a Knowledge Feedback dataset is used as an Input Dataset for ETL, it cannot be selected if the trigger condition is "Trigger after any selected dataset is updated"; its checkbox is disabled. Feedback datasets update frequently, so allowing them to trigger ETLs could cause performance issues.

The ETL refresh method is set to "Trigger only after all selected datasets are updated", so why was the ETL triggered automatically before all datasets were updated?

Configure Task Priority

ETL task execution has concurrency limits, meaning only a limited number of tasks can run at the same time. When the limit is exceeded, tasks enter a queue. Based on business urgency and task importance, you can configure the run priority for the current ETL task. The system currently provides five priority levels: Highest, High, Medium, Low, and Lowest.

Tasks set to Highest priority are executed before all other ETL tasks. If multiple tasks are set to Highest, they are executed in the order they were submitted.

The same ETL can have at most one running task and one queued task at any time. Two tasks for the same ETL never run simultaneously, and two never wait in the queue simultaneously. If a task for the same ETL is already queued, a newly submitted task is treated as a duplicate and canceled immediately.
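The queueing rules above can be sketched as a small priority queue. This is an illustrative model, not the product's scheduler: `EtlQueue`, `submit`, and `pop_next` are hypothetical names. The source only states that Highest-priority tasks run in submission order; applying the same tie-break to the other levels is an assumption made here.

```python
import heapq
import itertools

# Lower number = runs earlier. The five levels come from the product UI.
PRIORITIES = {"Highest": 0, "High": 1, "Medium": 2, "Low": 3, "Lowest": 4}


class EtlQueue:
    """Hypothetical model: priority order, FIFO ties, one queued task per ETL."""

    def __init__(self):
        self._heap = []                  # entries: (priority, sequence, etl_id)
        self._queued = set()             # ETL ids that already have a queued task
        self._seq = itertools.count()    # submission order, used to break ties

    def submit(self, etl_id, priority):
        """Queue a task; return False if it is canceled as a duplicate."""
        if etl_id in self._queued:
            return False
        heapq.heappush(self._heap, (PRIORITIES[priority], next(self._seq), etl_id))
        self._queued.add(etl_id)
        return True

    def pop_next(self, running):
        """Return the next ETL to start, skipping ETLs that are still running."""
        skipped, result = [], None
        while self._heap:
            entry = heapq.heappop(self._heap)
            if entry[2] in running:
                skipped.append(entry)    # same ETL may not run twice at once
                continue
            result = entry[2]
            self._queued.discard(result)
            break
        for entry in skipped:            # put the skipped tasks back in the queue
            heapq.heappush(self._heap, entry)
        return result
```

For example, after submitting `"a"` at Low and then `"b"` and `"c"` at Highest, `pop_next(running=set())` returns `"b"`, and a second `submit("b", "Highest")` made while `"b"` is still queued returns `False`.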

Notes

The number of concurrent ETL tasks can be configured by an administrator in Admin Center > System Settings > General Settings > Runtime Parameters. Click Edit in the upper-right corner and adjust Parallel Scheduled Tasks based on the available environment resources.

Configure Timeout Settings

You can limit task execution time by setting a runtime threshold for ETL tasks. If a task does not finish within the specified time, the system automatically terminates it so that it does not tie up the task queue or compute resources and other tasks can continue to run normally. As the ETL owner, you can choose a custom timeout or follow the global system setting.

  • Custom: The ETL owner can set a timeout between 1 and 300 minutes.

  • Follow Global Setting: Administrators can configure a unified Maximum Task Runtime in Admin Center > System Settings > General Settings > Runtime Parameters.
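As a rough sketch of how the two timeout options relate, the function below resolves the timeout that applies to a task. The function name, the mode strings `"custom"` and `"global"`, and the parameter names are hypothetical; only the 1 to 300 minute range for custom timeouts comes from the text above.

```python
def effective_timeout_minutes(mode, custom_minutes=None, global_minutes=None):
    """Return the timeout (in minutes) that applies to an ETL task.

    mode:           "custom" or "global" (hypothetical names for the two options)
    custom_minutes: value set by the ETL owner, required when mode is "custom"
    global_minutes: the administrator's Maximum Task Runtime, required for "global"
    """
    if mode == "custom":
        # The custom range of 1 to 300 minutes is stated in the documentation.
        if custom_minutes is None or not 1 <= custom_minutes <= 300:
            raise ValueError("custom timeout must be between 1 and 300 minutes")
        return custom_minutes
    if mode == "global":
        if global_minutes is None:
            raise ValueError("global timeout is not configured")
        return global_minutes
    raise ValueError(f"unknown mode: {mode!r}")


# Usage: an owner-defined 45-minute limit versus a 120-minute global limit.
print(effective_timeout_minutes("custom", custom_minutes=45))
print(effective_timeout_minutes("global", global_minutes=120))
```

A custom value takes effect only for the ETL it is set on, while the global value applies to every ETL that follows the global setting.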