Skip to main content

Using Alerts to Monitor Changes in Numeric Fields

Background Requirements

  Want to implement a function that compares the latest value of a certain field in a dataset with historical data from the last update, and sends an alert notification when the change reaches a certain amount (such as each increase exceeding 100). In Guandata BI, you can cleverly use ETL and alert functions to achieve this requirement.

Example: The current amount field value is 600, and you want to create an alert to determine when the difference between the currently updated amount and the last updated amount is greater than or equal to 100, then send a notification.

Implementation Plan

I. Creating Snapshot Tables with ETL

For snapshot table creation, please refer to the help documentation Creating Data Snapshots with ETL. Key points for this scenario are as follows:

1. New data: If the input dataset data changes rapidly and updates multiple times within a day, after necessary data processing in ETL, create a new calculated field using the function now() to timestamp the data; for data that changes slowly or has low update frequency, it's recommended to use current_date() to mark the data update date.

2. Historical data: If the dataset itself has a large amount of data, or updates multiple times a day, the snapshot table data volume will increase rapidly. It's recommended to limit the time range for historical data (for example, only keep the last 7 days of data, function: now()- interval 7 days) to achieve data slimming effects.

3. ETL updates: It's recommended to trigger updates following the input dataset.

ETL structure reference as shown in the figure below:

image.png

II. Creating Datasets for Alert Use

1. Create a new ETL, use the snapshot table obtained in the previous step as the input dataset, use window functions for offset calculations to get the last updated data value, then subtract it from the latest data value to calculate the difference. If the dataset only updates once a day or only keeps 1 record, then calculating year-over-year and period-over-period comparisons through association nodes in ETL is also possible.

Create 2 new calculated fields, numeric type "Difference" and datetime type "Latest Update Time":

Difference: [value]-lag([value],1,0)over(partition by [project_name] order by [update_time])
--lag([value],1,0) represents taking the previous row of data after sorting by update time in ascending order, taking 0 when there's no previous row;
Latest Update Time: max([update_time])over(partition by null)
--Used to mark the latest update time, because alerts generally only need to alert the most recently updated data.

ETL preview effect as shown in the figure below:

image.png

Note: Offset functions include lag(), lead(), for specific usage please refer to Spark Window Functions and Applications.

2. Filter data with difference greater than or equal to 100, and filter the most recently updated records, select the required input columns and output the dataset for alerts. If you need to retain historical alert records, after filtering data with difference greater than or equal to 100, you can output a new result set. This case outputs 2 result sets, the top one for saving alert records and doing other analysis on changing data, the bottom one for latest data alerts. ETL structure reference as shown in the figure below.

image.png

III. Creating Corresponding Alerts for "Difference"

Scenario 1: Push the same content to specified personnel, hoping to push once after data updates, it's recommended to create a card and use card alerts.

image.png

Scenario 2: There are distribution requirements for recipients, hoping to push data details to corresponding responsible persons, it's recommended to use dataset alerts.

image.png