Card Loading
Scenario 1: Long Access Time for Guan-index Cards
Cause of the Problem
It is recommended to check the metric "9th Percentile Performance of Guan-Index Cards in the Last 30 Days". If the overall access time for Guan-index cards is >5 seconds, it indicates that the card access time does not meet expectations, and you will noticeably feel that card loading on the page takes time. Possible causes include:
(1) Large data volume or complex calculation logic in the card query, which may lead to longer task execution time
(2) Accessing cards during resource peak periods; insufficient resources may affect card execution
Troubleshooting Methods
There are three ways to help you find cards with long access times (access time >5s).
(1) Check the "9th Percentile Performance of Guan-Index Cards in the Last 30 Days" chart
Find the date where the 9th percentile performance exceeds the 5-second warning line, and click to drill down. After drilling down, you can sort by runtime to find card tasks with long execution times.
(2) Check the "Average Queue and Runtime Distribution of Guan-Index Cards in the Last 30 Days" chart
Find the date with a significant dark area in the runtime distribution (proportion exceeding 20%), click the bar to drill down, find the time period with a significant dark area on that day, and drill down again. After the second drill-down, you can sort by runtime to find card tasks with long execution times.
(3) Check the "Top 20 Card Tasks by Runtime in the Last 30 Days (Runtime ≥3s)" table
You can prioritize resolving the card tasks reported in the table to improve the card access experience. In "Admin Settings - Operation & Maintenance Management - Task Management", search with the card name as the "Operation Object" to review the card's historical runtime and determine whether the current runtime is genuinely abnormal.
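As mentioned under method (1), the percentile check can also be reproduced offline from an exported task log. The following is a minimal sketch assuming a CSV export with hypothetical columns card_name, finished_at, and runtime_seconds; adjust the file name and column names to match your actual export.

```python
# Minimal sketch: reproduce the daily 90th-percentile runtime and list slow
# card tasks from an exported task log. Column names are assumptions.
import pandas as pd

log = pd.read_csv("card_task_log.csv", parse_dates=["finished_at"])
recent = log[log["finished_at"] >= log["finished_at"].max() - pd.Timedelta(days=30)]

# Daily 90th percentile of card runtime over the last 30 days.
p90_by_day = recent.groupby(recent["finished_at"].dt.date)["runtime_seconds"].quantile(0.9)
print("Days exceeding the 5-second warning line:")
print(p90_by_day[p90_by_day > 5])

# Individual card tasks slower than 5 seconds, sorted by runtime.
slow_cards = recent[recent["runtime_seconds"] > 5].sort_values("runtime_seconds", ascending=False)
print(slow_cards[["card_name", "finished_at", "runtime_seconds"]].head(20))
```

The same export, sorted by runtime and truncated to the first 20 rows, also approximates the Top 20 table described in method (3).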
Troubleshooting Ideas
For cards with long access times, we recommend troubleshooting step by step as follows:
(1) Check the data volume of the card query (e.g., whether it reaches hundreds of millions of rows). A large data volume leads to longer card execution time
Click the operation object identified in the previous step to view the number of rows and columns in the corresponding dataset.
(2) Check the calculation logic of the card itself. If the card calculation involves many condition judgments and filters (more than 5 rules), execution time will be longer
(3) Check whether CPU resources are sufficient. You can observe the CPU load and CPU usage trend charts to determine whether a resource bottleneck has been reached
Refer to the metrics "Node CPU Usage Trend Chart" and "Server CPU Load (System Load) Trend Chart" to understand CPU usage peaks and CPU load peaks. It is recommended to focus on the number of concurrent tasks and high-resource-consuming tasks during peak periods. You can drill down in the "Node CPU Usage Trend Chart" to view task details, sort by runtime, and find the tasks that consume the most resources for adjustment.
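In addition to the trend charts, the CPU picture can be confirmed directly on the node with a short sampling script. The following is a minimal sketch using psutil; the 5.0 load threshold mirrors the guidance in the optimization measures below, while the sampling interval and number of samples are arbitrary illustration values.

```python
# Minimal sketch: sample CPU usage and system load on a node to spot a
# resource bottleneck during a peak period.
import psutil

LOAD_WARNING = 5.0   # system load threshold used in this guide
SAMPLES = 12         # roughly one minute of observation at 5-second intervals

for _ in range(SAMPLES):
    cpu_percent = psutil.cpu_percent(interval=5)      # CPU usage averaged over the interval
    load_1min, load_5min, load_15min = psutil.getloadavg()
    flag = "  <-- high load" if load_1min > LOAD_WARNING else ""
    print(f"cpu={cpu_percent:5.1f}%  load1={load_1min:.2f}  "
          f"load5={load_5min:.2f}  load15={load_15min:.2f}{flag}")
```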
Optimization Measures
For important or long-running card tasks, it is recommended to consider off-peak access if it does not affect business.
During peak periods, if it does not affect business, check whether tasks that occupy the CPU for a long time have upstream dependencies. For tasks without upstream dependencies, it is recommended to schedule them during off-peak periods. At the same time, try to reduce manually triggered tasks during peak periods.
If the CPU load is high during peak periods (CPU load >5), it indicates insufficient system resources. If you want to ensure card execution efficiency, consider the following solutions:
a. For a single Job-engine, it is recommended to set resource isolation in Control Tower to ensure the operation of important card tasks;
b. If budget allows, consider expansion. For specific expansion plans, please contact Guandata for evaluation.
Other Suggestions
You can set the card execution timeout in "Admin Settings - System Settings - General Settings - Runtime Parameters" to avoid wasting computing resources.
Scenario 2: Long Queue Time for Guan-index Cards
Cause of the Problem
It is recommended to check the metric "Average Queue and Runtime Distribution of Guan-Index Cards in the Last 30 Days". If the card queue time is >10s, it indicates that the card queue time does not meet expectations, and you will noticeably feel that card loading on the page takes time. Possible causes include:
(1) Many scheduled tasks during the card execution period, causing a large number of tasks to run at the same time and triggering task queuing
(2) Accessing cards during resource peak periods; insufficient resources may cause card queuing
Troubleshooting Ideas
It is recommended to troubleshoot step by step as follows:
(1) Check the "Average Queue and Runtime Distribution of Guan-Index Cards in the Last 30 Days" chart
Find the date with a significant dark area in the queue time distribution (proportion exceeding 20%), click the bar to drill down, and find the time period with a significant dark area on that day.
(2) Check whether there are relatively more scheduled tasks during periods of severe queuing
Check the "Dataset Runtime Distribution Yesterday" and "ETL Runtime Distribution Chart Yesterday" charts to see whether there are more scheduled tasks (the yellow line is much higher than in other periods) during the period identified in the previous step.
Optimization Measures
If there are many scheduled tasks, it is recommended to stagger their schedules as much as possible, without affecting business, to avoid excessive concentration.
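To quantify how concentrated the schedules are, you can bucket scheduled task start times by hour from an exported task log. The following is a minimal sketch assuming hypothetical columns task_name and started_at; the "more than twice the hourly average" rule is only one possible heuristic.

```python
# Minimal sketch: count scheduled task starts per hour of day to spot
# over-concentrated schedules. Column names and the heuristic are assumptions.
import pandas as pd

tasks = pd.read_csv("scheduled_task_log.csv", parse_dates=["started_at"])

# Number of scheduled task starts per hour of day.
per_hour = tasks.groupby(tasks["started_at"].dt.hour).size()

# Flag hours holding more than twice the average hourly count as over-concentrated.
crowded = per_hour[per_hour > 2 * per_hour.mean()]

print("Task starts per hour of day:")
print(per_hour)
print("Hours that look over-concentrated:")
print(crowded)
```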
(3) Check whether there are long-running tasks (runtime >60min) blocking execution during periods of severe queuing
You can drill down in the "Node CPU Usage Trend Chart" during the period identified in step (1), sort by task runtime, and observe whether there are long-running tasks (runtime >60min) blocking the execution of other tasks.
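If a task log export is available, the same check can be scripted: list long-running tasks whose execution overlaps the queue-heavy window identified in step (1). The file name, column names, and example window below are placeholders for illustration.

```python
# Minimal sketch: find long-running tasks (>60 min) overlapping a queue-heavy
# window. Column names and the example window are assumptions.
import pandas as pd

# Placeholder queue-heavy window taken from step (1); replace with the real one.
WINDOW_START = pd.Timestamp("2024-05-20 09:00")
WINDOW_END = pd.Timestamp("2024-05-20 11:00")

tasks = pd.read_csv("task_log.csv", parse_dates=["started_at"])
tasks["ended_at"] = tasks["started_at"] + pd.to_timedelta(tasks["runtime_minutes"], unit="m")

# Long-running tasks whose execution interval overlaps the window.
long_running = tasks[tasks["runtime_minutes"] > 60]
overlapping = long_running[
    (long_running["started_at"] < WINDOW_END) & (long_running["ended_at"] > WINDOW_START)
].sort_values("runtime_minutes", ascending=False)

print(overlapping[["task_name", "started_at", "runtime_minutes"]])
```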
Optimization Measures
After finding the blocking task, it is recommended to optimize through the following measures:
a. Compare the historical runtime of the task to determine whether the current runtime is abnormal. You can click the task's operation object name to jump to its run history (Dataset - View Update History / ETL - View Run Records).
b. After ruling out abnormal job behavior, and if it does not affect business, check whether tasks that occupy the CPU for a long time have upstream dependencies. For tasks without upstream dependencies, it is recommended to schedule them during off-peak periods.
(4) Check whether memory resources are sufficient
Refer to the metric "Node Memory Usage Trend Chart" to understand memory usage peaks. It is recommended to focus on the queuing situation of tasks during memory usage peaks.
Optimization Measures
If there are many queued Guan-index cards during memory usage peaks (more than 10 in the queue), it is recommended to contact Guandata to evaluate whether the card task concurrency can be adjusted.
If the memory usage is high during peak periods (memory usage >95%), it indicates insufficient system resources. If you want to ensure card execution efficiency, consider the following solutions:
a. For a single Job-engine, it is recommended to set resource isolation in Control Tower to ensure the operation of important card tasks;
b. If budget allows, consider expansion. For specific expansion plans, please contact Guandata for evaluation.
Other Suggestions
You can configure the card queue timeout cancellation duration in "Admin Settings - System Settings - General Settings - Runtime Parameters" to avoid such tasks occupying resources and blocking other tasks.
Scenario 3: Long Loading Time for Direct Connection Cards
Cause of the Problem
It is recommended to check the metric "Average Queue and Runtime Distribution of Direct Connection Cards in the Last 30 Days". If the proportion of direct connection cards with access time over 10s exceeds 20%, it indicates that the card access time does not meet expectations, and you will noticeably feel that card loading on the page takes time. Possible causes include:
(1) Long time to connect to the database
(2) Database performance issues causing blocking
Optimization Measures
It is recommended to troubleshoot and optimize database performance.
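To narrow down whether the time is spent establishing the database connection or executing the query, the two phases can be timed separately. The following is a minimal sketch that assumes a PostgreSQL source accessed through psycopg2 with a placeholder DSN and query; substitute the driver, connection parameters, and SQL actually used by your direct connection card.

```python
# Minimal sketch: time connection setup and query execution separately for a
# direct-connection source. Driver, DSN, and query are placeholder assumptions.
import time
import psycopg2

DSN = "host=db.example.com dbname=analytics user=report password=secret"  # placeholder
QUERY = "SELECT count(*) FROM sales"                                       # placeholder

t0 = time.perf_counter()
conn = psycopg2.connect(DSN)      # phase 1: connection setup
t1 = time.perf_counter()

with conn.cursor() as cur:        # phase 2: query execution
    cur.execute(QUERY)
    cur.fetchall()
t2 = time.perf_counter()
conn.close()

print(f"connection time: {t1 - t0:.2f}s")  # long -> network / authentication issue
print(f"query time:      {t2 - t1:.2f}s")  # long -> database-side performance issue
```

In general, if the connection phase dominates, the network path and authentication between the BI server and the database are worth checking; if the query phase dominates, database-side optimization is the more likely fix.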