Card Loading
Scenario 1: Guan-index Cards Have Long Access Time
Cause
Check the metric 90th Percentile Performance of Guan-Index Cards in the Last 30 Days. If Guan-index card access time is generally greater than 5 seconds, it does not meet expectations, and users will noticeably wait for cards on the page to load. Possible causes include:
(1) The card queries a large data volume or has complex calculation logic, causing the card task to take a long time
(2) Cards are accessed during resource peak periods. If resources are insufficient, card running may be affected
Locating Method
There are three ways to find cards whose access time is long (greater than 5 seconds).
(1) Check 90th Percentile Performance of Guan-Index Cards in the Last 30 Days
Find the point where the daily 90th percentile performance exceeds the 5-second warning line and click to drill down. After drilling down, sort by run duration to find card tasks with long run duration.
(2) Check Average Queue and Run Duration Distribution of Guan-Index Cards in the Last 30 Days
Find the dates on which the dark area of the run duration distribution is prominent, accounting for more than 20% of the distribution. Click the bar to drill down, find the time periods on that day where the dark area is prominent, and drill down again. After the second drill-down, sort by run duration to find the card tasks with long run durations.
(3) Check Top 20 Card Tasks by Run Duration in the Last 30 Days (run duration >= 3s)
Prioritize optimizing the card tasks listed in the table to improve the card access experience. In Admin Settings > Operations Management > Task Management, search with the card name as the operation object to view the card's historical run durations and determine whether the run duration is truly abnormal.
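The first locating method can also be reproduced offline if card access records are exported. The following is a minimal sketch, assuming a hypothetical export of (day, access time in seconds) pairs; the record shape and function names are illustrative and are not a Guandata API. The percentile method used by the dashboard is not documented here, so results may differ slightly from the chart.

```python
from collections import defaultdict
from statistics import quantiles

WARNING_LINE_SECONDS = 5.0  # the 5-second warning line described above


def daily_p90(records):
    """Return {day: 90th percentile of access time in seconds}.

    `records` is an iterable of (day, access_seconds) pairs. This shape is a
    hypothetical export format chosen for the example.
    """
    by_day = defaultdict(list)
    for day, seconds in records:
        by_day[day].append(seconds)

    result = {}
    for day, values in by_day.items():
        if len(values) == 1:
            # quantiles() needs at least two points on older Python versions
            result[day] = values[0]
        else:
            # n=10 yields nine cut points; the last one is the 90th percentile
            result[day] = quantiles(values, n=10, method="inclusive")[-1]
    return result


def days_over_warning_line(records, threshold=WARNING_LINE_SECONDS):
    """Return the sorted days whose 90th percentile exceeds the threshold."""
    return sorted(day for day, p90 in daily_p90(records).items() if p90 > threshold)
```

Days returned by `days_over_warning_line` correspond to the points above the warning line that you would click to drill down.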
Troubleshooting Approach
For cards with long access time, we recommend troubleshooting step by step as follows:
(1) Check the data volume queried by the card, for example whether it reaches hundreds of millions of rows. A large query volume increases card run time
Click the operation object identified in the previous step to open it, then view the row and column counts of the dataset behind the card.
(2) Check the card calculation logic. If the calculation involves many conditional judgments or filters, for example more than five rules, card run time increases
(3) Check whether CPU resources are sufficient. You can determine whether a resource bottleneck has been reached by observing CPU load and CPU usage trend charts
Refer to Node CPU Usage Trend by Time Period and Server CPU Load (System Load) Trend to understand CPU usage peaks and CPU load peaks. Focus on task concurrency and high-resource tasks during peak periods. Drill down in Node CPU Usage Trend by Time Period to view task details, sort by run duration, find the tasks that occupy the most resources, and adjust them.
Optimization Measures
For important card tasks, or card tasks that need to run for a long time, consider accessing them off-peak if this does not affect business.
For peak periods, determine whether the tasks with long CPU usage duration have upstream dependencies. If business is not affected, move the tasks without upstream dependencies to off-peak periods. Also minimize manually triggered tasks during peak periods.
If CPU load is high during peak periods (CPU load > 5), system resources are insufficient. To ensure card running efficiency, consider the following options:
a. When there is a single Job-engine, configure resource isolation in Control Tower to ensure important card tasks can run.
b. If budget allows, expand capacity. Contact Guandata to evaluate the specific expansion plan.
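The peak-period checks above can be sketched as a small filter over exported monitoring data. This is an illustrative sketch only: the `Task` fields, the hourly load mapping, and the function names are assumptions made for the example, not part of the product. The CPU load limit of 5 comes from the text above.

```python
from dataclasses import dataclass

CPU_LOAD_LIMIT = 5.0  # "CPU load > 5" from the optimization measures above


@dataclass
class Task:
    """Hypothetical task record; the field names are assumptions."""
    name: str
    hour: int            # hour of day in which the task ran (0-23)
    run_seconds: float   # run duration of the task
    has_upstream: bool   # True if the task depends on an upstream task


def peak_hours(load_by_hour, limit=CPU_LOAD_LIMIT):
    """Return the sorted hours whose CPU load is strictly above the limit."""
    return sorted(hour for hour, load in load_by_hour.items() if load > limit)


def reschedule_candidates(tasks, load_by_hour, limit=CPU_LOAD_LIMIT):
    """Return peak-hour tasks without upstream dependencies, longest first."""
    peaks = set(peak_hours(load_by_hour, limit))
    candidates = [t for t in tasks if t.hour in peaks and not t.has_upstream]
    return sorted(candidates, key=lambda t: t.run_seconds, reverse=True)
```

Whether a candidate can actually be moved still depends on the business impact, which this sketch does not model.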
Additional Recommendations
You can set card run timeout duration in Admin Settings > System Settings > General Settings > Run Parameters to avoid wasting computing resources.
Scenario 2: Guan-index Cards Have Long Queue Time
Cause
Check the metric Average Queue and Run Duration Distribution of Guan-Index Cards in the Last 30 Days. If card queue time is greater than 10 seconds, it does not meet expectations, and users will noticeably wait for cards on the page to load. Possible causes include:
(1) Many scheduled tasks run in the same period as the cards, causing tasks to queue
(2) Cards are accessed during resource peak periods. If resources are insufficient, cards may queue
Troubleshooting Approach
We recommend troubleshooting step by step as follows:
(1) Check Average Queue and Run Duration Distribution of Guan-Index Cards in the Last 30 Days
Find the dates on which the dark area of the queue time distribution is prominent, accounting for more than 20% of the distribution. Click the bar to drill down and find the time periods on that day where the dark area is prominent.
(2) Check whether there are relatively many scheduled tasks during periods with severe queueing
In Yesterday Dataset Run Time Distribution and Yesterday ETL Run Time Distribution, check whether many scheduled tasks fall in the period identified in the previous step. A yellow line that is much higher than in other periods indicates this.
Optimization Measures
If there are many scheduled tasks, stagger their schedules where business is not affected so that they are not concentrated in one period.
(3) Check whether tasks with long run duration (run duration > 60 minutes) are blocking execution
In Node CPU Usage Trend by Time Period, find the period identified in step (1) and drill down. Sort by task run duration and check whether tasks running longer than 60 minutes are blocking other tasks.
Optimization Measures
After finding blocking tasks, optimize them with the following measures:
a. Compare against the task's historical run durations to determine whether the current run duration is abnormal. To view the history, click the task's operation object name to open it and view its run history (Dataset > View Update History / ETL > View Run Records).
b. After ruling out abnormal job factors, determine whether the tasks with long CPU usage duration have upstream dependencies. If business is not affected, move the tasks without upstream dependencies to off-peak periods.
(4) Check whether memory resources are sufficient
Refer to Node Memory Usage Trend by Time Period to understand memory usage peaks. Focus on task queueing during the peak period.
Optimization Measures
If there are many queued Guan-index cards during the peak period (queue count > 10), contact Guandata to evaluate whether card task concurrency can be adjusted.
If memory usage is high during peak periods (memory usage > 95%), system resources are insufficient. To ensure card running efficiency, consider the following options:
a. When there is a single Job-engine, configure resource isolation in Control Tower to ensure important card tasks can run.
b. If budget allows, expand capacity. Contact Guandata to evaluate the specific expansion plan.
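Step (2) above looks for periods in which scheduled tasks are concentrated. The same check can be sketched over a list of scheduled start hours. This is a minimal sketch under stated assumptions: the input list is a hypothetical export, and the factor of 2 over the median hourly count is an arbitrary illustration of "much higher than other periods", not a documented threshold.

```python
from collections import Counter
from statistics import median


def crowded_hours(start_hours, factor=2.0):
    """Return the sorted hours whose scheduled-task count exceeds
    `factor` times the median count.

    `start_hours` is an iterable of hours of day (0-23), one entry per
    scheduled task. The median is taken only over hours that have at
    least one task. `factor` is an assumed cutoff for this example.
    """
    counts = Counter(start_hours)
    if not counts:
        return []
    typical = median(counts.values())
    return sorted(hour for hour, count in counts.items() if count > factor * typical)
```

Hours returned here are candidates for staggering schedules, subject to the business constraints described above.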
Additional Recommendations
You can configure the card queue timeout cancellation duration in Admin Settings > System Settings > General Settings > Run Parameters so that cards waiting in the queue for too long are cancelled instead of occupying resources and blocking other tasks.
Scenario 3: Direct-connection Cards Have Long Loading Time
Cause
Check the metric Average Queue and Run Duration Distribution of Direct-connection Cards in the Last 30 Days. If more than 20% of direct-connection card access times exceed 10 seconds, card access time does not meet expectations, and users will noticeably wait for cards on the page to load. Possible causes include:
(1) Connecting to the database takes too long
(2) Database performance issues cause blocking
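The criterion above (more than 20% of access times exceeding 10 seconds) is simple arithmetic and can be sketched as follows. The function names and the input list are illustrative assumptions; the two thresholds come from the text.

```python
def slow_share(access_seconds, threshold=10.0):
    """Return the fraction of access times strictly above `threshold` seconds."""
    values = list(access_seconds)
    if not values:
        return 0.0
    return sum(1 for v in values if v > threshold) / len(values)


def meets_expectation(access_seconds, threshold=10.0, max_share=0.20):
    """Return True unless more than `max_share` of accesses exceed `threshold`."""
    return slow_share(access_seconds, threshold) <= max_share
```

For example, if 2 of 5 recorded accesses take longer than 10 seconds, the slow share is 40% and the expectation is not met.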
Optimization Measures
Troubleshoot and optimize database performance.
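To tell the two causes apart, it can help to time the connection and the query separately. The sketch below is an assumption-laden illustration: it uses Python's built-in `sqlite3` module with an in-memory database purely as a stand-in, and `time_connect_and_query` is a hypothetical helper. For a real direct-connection source, pass a callable that opens a connection with that database's own DB-API driver.

```python
import sqlite3
import time


def time_connect_and_query(connect, sql):
    """Return (connect_seconds, query_seconds) for one round trip.

    `connect` is any zero-argument callable returning a DB-API 2.0
    connection; `sql` is a query to run on it.
    """
    start = time.perf_counter()
    conn = connect()
    connected = time.perf_counter()
    try:
        cursor = conn.cursor()
        cursor.execute(sql)
        cursor.fetchall()  # include result transfer in the query time
        finished = time.perf_counter()
    finally:
        conn.close()
    return connected - start, finished - connected


# Stand-in usage with an in-memory SQLite database (assumption for the demo).
connect_s, query_s = time_connect_and_query(
    lambda: sqlite3.connect(":memory:"), "SELECT 1"
)
print(f"connect: {connect_s:.4f}s, query: {query_s:.4f}s")
```

A large connect time points toward cause (1), while a large query time points toward cause (2).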