Dataset Update

Scenario 1: Long Runtime for Dataset Update Tasks

Cause of the Problem

We recommend checking the metric "Top 20 Extracted Datasets by Runtime in the Last 30 Days (Runtime ≥10s)". If a single dataset update task takes more than 60 minutes, the task is running too long. Possible causes include:

(1) Poor network environment

(2) Large dataset

(3) Many queued tasks

(4) Database performance issues
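The 60-minute check described above can be sketched as follows, assuming the task runtimes have been exported as a list of records. The `UpdateTask` class, its field names, and the sample dataset names are hypothetical and not part of any Guandata API. The sketch only mirrors the metric's filter (runtime ≥ 10 s), its top-20 ranking, and the 60-minute threshold.

```python
from dataclasses import dataclass


@dataclass
class UpdateTask:
    """Hypothetical record of one dataset update task; field names are illustrative."""
    dataset: str
    runtime_s: float


MIN_RUNTIME_S = 10       # the metric only lists tasks that ran for at least 10 s
TOO_LONG_S = 60 * 60     # a single update longer than 60 minutes is treated as too long


def top_extracted_by_runtime(tasks, n=20):
    """Mimic the metric: keep tasks with runtime >= 10 s, sort by runtime, take the top n."""
    eligible = [t for t in tasks if t.runtime_s >= MIN_RUNTIME_S]
    return sorted(eligible, key=lambda t: t.runtime_s, reverse=True)[:n]


def flag_long_running(tasks, n=20):
    """Return the datasets in the top-n list whose single update exceeded 60 minutes."""
    return [t.dataset for t in top_extracted_by_runtime(tasks, n) if t.runtime_s > TOO_LONG_S]


if __name__ == "__main__":
    sample = [
        UpdateTask("sales_detail", 75 * 60),
        UpdateTask("orders", 12 * 60),
        UpdateTask("tiny_lookup", 3),
    ]
    print(flag_long_running(sample))  # ['sales_detail']
```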

Troubleshooting Ideas

We recommend troubleshooting step by step as follows:

(1) Compare with Guandata's reference data to see if the runtime in your environment is reasonable

Optimization Measures

If your runtime is well above the reference figures, consider increasing network bandwidth or improving database performance.

Guandata official update performance figures (for reference only):

Network bandwidth: 50M

Dataset with 2 million rows and 40 columns: runtime 2 min

Dataset with 10 million rows and 40 columns: runtime 10 min
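Both reference points work out to roughly 1 million rows per minute (2 million rows in 2 min, 10 million rows in 10 min). The sketch below uses that rate to estimate an expected runtime and compare it with yours. It assumes runtime grows linearly with row count and only applies to datasets of about 40 columns on a comparable 50M network. The 2x `tolerance` is a hypothetical cut-off, not a Guandata recommendation.

```python
# Guandata reference figures (50M network, 40 columns): (rows, runtime in minutes).
REFERENCE_POINTS = [(2_000_000, 2.0), (10_000_000, 10.0)]


def rows_per_minute():
    """Average throughput implied by the reference points (about 1 million rows/min)."""
    rates = [rows / minutes for rows, minutes in REFERENCE_POINTS]
    return sum(rates) / len(rates)


def expected_minutes(rows):
    """Estimate runtime, assuming it grows linearly with row count."""
    return rows / rows_per_minute()


def looks_reasonable(rows, actual_minutes, tolerance=2.0):
    """Hypothetical rule: runtimes up to `tolerance` times the estimate count as reasonable."""
    return actual_minutes <= tolerance * expected_minutes(rows)


if __name__ == "__main__":
    print(expected_minutes(5_000_000))        # 5.0
    print(looks_reasonable(5_000_000, 8.0))   # True
    print(looks_reasonable(5_000_000, 30.0))  # False
```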

(2) For large datasets (more than 50 million rows with 40 columns), avoid full updates unless necessary and switch to incremental updates where appropriate

Optimization Measures

Click the dataset name to open the dataset, then configure incremental update in its data update settings as needed.

(3) Refer to the metrics "Node CPU Usage Trend Chart" and "Server CPU Load (System Load) Trend Chart" to identify when CPU usage and CPU load peak

Optimization Measures

If possible, schedule large dataset update tasks outside these peak periods.
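Once you have read hourly CPU usage off the trend charts, picking an off-peak slot for a large update is a small calculation. In this sketch the 24 hourly averages, the 80% peak threshold, and the function names are all hypothetical inputs chosen for illustration. The window wraps past midnight so late-night slots are also considered.

```python
def peak_hours(hourly_cpu, threshold):
    """Hours (0-23) whose average CPU usage (%) is at or above `threshold`."""
    return [hour for hour, usage in enumerate(hourly_cpu) if usage >= threshold]


def best_start_hour(hourly_cpu, duration_h):
    """Start hour whose `duration_h`-hour window (wrapping past midnight) has the lowest mean CPU."""
    def window_mean(start):
        return sum(hourly_cpu[(start + i) % 24] for i in range(duration_h)) / duration_h
    return min(range(24), key=window_mean)


if __name__ == "__main__":
    # Hypothetical hourly averages (%) read off the "Node CPU Usage Trend Chart".
    cpu = [15, 12, 10, 10, 12, 18, 30, 50, 70, 85, 90, 88,
           80, 85, 90, 88, 82, 70, 55, 45, 35, 28, 22, 18]
    print(peak_hours(cpu, 80))     # [9, 10, 11, 12, 13, 14, 15, 16]
    print(best_start_hour(cpu, 2))  # 2
```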

Other Suggestions

To monitor dataset update tasks in real time, we recommend setting up alerts for failed and timed-out tasks in "Admin Settings - Operation & Maintenance Management - Information Notification".
