Dataset Update
Scenario 1: Long Runtime for Dataset Update Tasks
Cause of the Problem
It is recommended to check the metric "Top 20 Extracted Datasets by Runtime in the Last 30 Days (Runtime ≥10s)". If a single dataset update task takes more than 60 minutes, the task is running too long. Possible causes include:
(1) Poor network environment
(2) Large dataset
(3) Many queued tasks
(4) Database performance issues
Troubleshooting Ideas
We recommend troubleshooting step by step as follows:
(1) Compare with Guandata's reference data to see if the runtime in your environment is reasonable
Optimization Measures
Consider increasing network bandwidth or improving database performance.
Guandata's official update-efficiency reference figures (for reference only):
Network bandwidth: 50M
Dataset: 2 million rows / 40 columns, runtime: 2 min
Dataset: 10 million rows / 40 columns, runtime: 10 min
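The reference figures above can be turned into a quick sanity check. The sketch below assumes runtime scales roughly linearly with row count (consistent with 2 min per 2 million rows and 10 min per 10 million rows); the linear model and the 3x slack factor are illustrative assumptions, not official sizing rules.

```python
# Rough runtime sanity check based on the reference figures above
# (2M rows / 40 cols ~= 2 min, 10M rows / 40 cols ~= 10 min).
# Assumes roughly linear scaling with row count -- an illustrative
# assumption, not an official Guandata sizing formula.

REFERENCE_MIN_PER_MILLION_ROWS = 1.0  # 2 min / 2M rows = 10 min / 10M rows

def expected_runtime_minutes(rows: int, columns: int = 40) -> float:
    """Estimate extract runtime, scaling by rows and (roughly) by columns."""
    millions = rows / 1_000_000
    return millions * REFERENCE_MIN_PER_MILLION_ROWS * (columns / 40)

def runtime_is_reasonable(rows: int, actual_minutes: float,
                          slack: float = 3.0) -> bool:
    """Flag a task whose actual runtime exceeds the estimate by `slack`x."""
    return actual_minutes <= expected_runtime_minutes(rows) * slack

print(expected_runtime_minutes(10_000_000))  # -> 10.0
print(runtime_is_reasonable(2_000_000, 45))  # -> False: worth investigating
```

If a task's actual runtime is several times the estimate, move on to the network, dataset-size, and database checks in the following steps.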
(2) For large datasets (more than 50 million rows with ~40 columns), avoid full updates unless necessary; switch to incremental update where possible
Optimization Measures
Click the dataset name to open it, then configure incremental update under the data update settings as needed.
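Incremental update is configured in the product UI, but the underlying idea is a watermark-based extract: pull only rows changed since the last successful sync instead of re-reading the whole table. The sketch below illustrates that logic; the `sales` table and its columns are hypothetical.

```python
# Illustrative sketch of watermark-based incremental extraction.
# Table and column names are hypothetical; the real configuration
# is done in the dataset's data update settings.
import sqlite3

def incremental_extract(conn, last_watermark: str):
    """Fetch only rows modified after the previous sync watermark."""
    cur = conn.execute(
        "SELECT id, amount, updated_at FROM sales WHERE updated_at > ?",
        (last_watermark,),
    )
    rows = cur.fetchall()
    # Advance the watermark to the newest timestamp seen in this batch.
    new_watermark = max((r[2] for r in rows), default=last_watermark)
    return rows, new_watermark

# Demo with an in-memory table: only the row changed after the
# watermark is extracted, instead of the full table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER, amount REAL, updated_at TEXT)")
conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", [
    (1, 10.0, "2024-01-01"),
    (2, 20.0, "2024-01-03"),
])
rows, wm = incremental_extract(conn, "2024-01-02")
print(len(rows), wm)  # -> 1 2024-01-03
```

This is why incremental update scales so much better for large tables: the work done per run is proportional to the changed rows, not the total row count.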
(3) Refer to the metrics "Node CPU Usage Trend Chart" and "Server CPU Load (System Load) Trend Chart" to identify when CPU usage and CPU load peak
Optimization Measures
For large dataset update tasks, if possible, avoid running them during the above peak periods.
Other Suggestions
If you need to monitor dataset update tasks in real time, it is recommended to set up failure and timeout task alerts under "Admin Settings - Operation & Maintenance Management - Information Notification".
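The alert mechanism itself is configured in the admin UI; the timeout check it performs amounts to flagging tasks whose elapsed time exceeds a threshold. A minimal sketch, with hypothetical task records and a 60-minute threshold matching the "running too long" criterion above:

```python
# Sketch of timeout-alert logic: flag any still-running update task
# whose elapsed time exceeds a threshold. Task records are hypothetical;
# the real alerts are configured in Information Notification settings.
from datetime import datetime, timedelta

def overdue_tasks(tasks, now, threshold=timedelta(minutes=60)):
    """Return names of running tasks that have exceeded the threshold."""
    return [name for name, started in tasks if now - started > threshold]

now = datetime(2024, 1, 1, 12, 0)
tasks = [("daily_sales", datetime(2024, 1, 1, 10, 30)),   # 90 min elapsed
         ("hourly_stock", datetime(2024, 1, 1, 11, 30))]  # 30 min elapsed
print(overdue_tasks(tasks, now))  # -> ['daily_sales']
```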