
Dataset Update

Scenario 1: Dataset Update Tasks Take a Long Time

Cause

Check the metric Top 20 Extracted Datasets by Run Duration in the Last 30 Days (run duration >= 10s). If a single dataset update task runs for more than 60 minutes, it is considered a long-running task. Possible causes include:

(1) Poor network environment

(2) Large dataset

(3) Many queued tasks

(4) Database performance issues

Troubleshooting Approach

We recommend troubleshooting step by step as follows:

(1) Compare with Guandata reference data to determine whether the run duration in your environment is reasonable

Optimization Measures

If the run duration in your environment is well above the reference figures, consider improving network bandwidth and database performance.

Guandata's official reference figures for update efficiency are as follows (for reference only):

Network environment: 50M

Dataset size: 2 million rows / 40 columns; run duration: 2 minutes

Dataset size: 10 million rows / 40 columns; run duration: 10 minutes
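As a rough sanity check, both reference points above work out to about 1 million rows per minute for a 40-column dataset. The sketch below assumes that run duration scales linearly with row count under a comparable network environment, which the reference data does not guarantee. The function names and the tolerance factor are illustrative and are not part of Guandata.

```python
# Reference throughput derived from the figures above:
# 2,000,000 rows in 2 min and 10,000,000 rows in 10 min -> 1,000,000 rows/min.
REFERENCE_ROWS_PER_MINUTE = 1_000_000


def expected_minutes(row_count: int) -> float:
    """Estimate run duration, assuming linear scaling with row count."""
    return row_count / REFERENCE_ROWS_PER_MINUTE


def is_reasonable(row_count: int, actual_minutes: float, tolerance: float = 2.0) -> bool:
    """Return True if the actual duration is within `tolerance` times the estimate.

    The default tolerance of 2.0 is an arbitrary, illustrative choice.
    """
    return actual_minutes <= expected_minutes(row_count) * tolerance
```

Under these assumptions, a 10-million-row dataset that takes 60 minutes to update would fall well outside the reference range and would be worth investigating.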

(2) For large datasets (more than 50 million rows / 40 columns), avoid full updates unless they are necessary. Based on business needs, determine whether incremental updates can be used instead

Optimization Measures

Click the dataset name to jump to the dataset, then choose Incremental Update under the data update feature and configure it.
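The idea behind an incremental update is to fetch only the rows that changed since the previous run rather than reloading the whole table. The sketch below illustrates that idea with a generic watermark query against an in-memory SQLite table. The table and column names (orders, updated_at) and the sample rows are hypothetical, and this is not Guandata's configuration syntax, which is set in the product interface.

```python
import sqlite3

# Hypothetical source table with a last-modified timestamp column.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, amount REAL, updated_at TEXT)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        (1, 10.0, "2024-01-01 08:00:00"),
        (2, 20.0, "2024-01-02 09:30:00"),
        (3, 30.0, "2024-01-03 11:15:00"),
    ],
)


def fetch_changed_rows(connection, last_watermark):
    """Return only the rows modified after the previous successful update."""
    cursor = connection.execute(
        "SELECT id, amount, updated_at FROM orders "
        "WHERE updated_at > ? ORDER BY updated_at",
        (last_watermark,),
    )
    return cursor.fetchall()


# Timestamp recorded at the end of the previous update (hypothetical value).
last_watermark = "2024-01-01 23:59:59"
changed = fetch_changed_rows(conn, last_watermark)

# Advance the watermark to the newest row that was fetched.
if changed:
    last_watermark = changed[-1][2]
```

Only the rows changed after the watermark are transferred, so the cost of each update depends on how much data changed rather than on the total size of the table.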

(3) Refer to the Node CPU Usage Trend by Time Period and Server CPU Load (System Load) Trend metrics to understand CPU usage peaks and CPU load peaks

Optimization Measures

Where this does not affect business, schedule update tasks for large datasets outside these peak periods.
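One way to choose a window is to take hourly CPU usage values from the metrics above and look for the quietest hours. The sketch below assumes the values are available as a mapping from hour of day to average CPU usage in percent. The sample numbers and the 50 percent threshold are hypothetical.

```python
def off_peak_hours(cpu_by_hour: dict[int, float], threshold: float = 50.0) -> list[int]:
    """Return the hours (0-23) whose average CPU usage is below `threshold` percent."""
    return sorted(hour for hour, usage in cpu_by_hour.items() if usage < threshold)


# Hypothetical hourly averages; replace with values read from the metrics.
sample = {0: 12.0, 1: 10.5, 2: 9.8, 9: 85.0, 10: 91.2, 14: 78.4, 22: 35.0}
quiet = off_peak_hours(sample)
```

Hours returned by this check are candidates for scheduling large update tasks, provided the resulting data freshness still meets business needs.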

Additional Recommendations

To monitor dataset update tasks in real time, go to Admin Settings > Operations Management > Notifications and configure alert mechanisms for failed and timed-out tasks.