Skip to main content

How to Calculate Consecutive Occurrence Days

Scenario Introduction

     A certain event may occur or not occur, and the system will record the occurrence status of this event by day. For example, when the event occurs, it will record normal, and when it doesn't occur, it will record abnormal.

And we want to know the consecutive occurrence situation of the event. When the event is in normal status, record the consecutive days of normal status. If an abnormality occurs on a certain day, the counting stops, and counting starts again from the next normal status.

Data Structure Example

Need to calculate the consecutive occurrence days based on the status of event A on the corresponding date. (The red column is the data hoped to be obtained through ETL processing):

Event NameDateStatusConsecutive Occurrence Days
Task A2022/1/1Normal1
Task A2022/1/2Normal2
Task A2022/1/3Normal3
Task A2022/1/4Normal4
Task A2022/1/5Normal5
Task A2022/1/6Abnormal0
Task A2022/1/7Normal1
Task A2022/1/8Normal2
Task A2022/1/9Abnormal0
Task A2022/1/10Abnormal0

Implementation Method

  1. In ETL, add a calculated column, add a calculated field, merge dates into an array after grouping by dimension fields "Event Name" and "Status". The formula and preview effect are as follows (dates in the array are automatically converted to unix date values for storage).
collect_list([Date])over(partition by [Event Name],[Status]) 
--You can also use the collect_set function.

image.png

  1. Add another calculated column, add a calculated field, use the Guandata custom function lasting_days_to_date to calculate the consecutive days of each row's "Status".

LASTING_DAYS_TO_DATE

Usage: lasting_days_to_date(collect_list(date field), TO_DATE('2019-01-01'))

Description: Returns the consecutive days of the date field less than or equal to the specified date, used for calculating inventory out-of-stock days, etc.

Example: lasting_days_to_date(collect_list[Sales Date], TO_DATE('2019-01-01')), returns the consecutive out-of-stock days of goods on 2019-01-01.

lasting_days_to_date([date_list],[Date])

image.png

Note:

 1) ETL preview data is randomly displayed. To preview data in a fixed order, you can add an additional SQL node to sort the data before previewing. The sorting doesn't affect the output dataset.

2) The lasting_days_to_date function automatically sorts the dates in the date array in ascending order, then calculates the consecutive days. So there's no need to sort the date array.

3. Add a calculated column, add a calculated field. Because the calculation result of the previous step is to separately judge the consecutive days of 2 statuses, in this case we only need to count the consecutive days of "Normal" status, so we need to set the consecutive days of "Abnormal" status to zero. Add a "Select Column" node, only select the required fields to output the dataset.

case when [Status]='Normal' then [Consecutive Days]
else 0
end
     Sort the preview results in ETL or cards as shown in the figure below.

image.png

Other Applicable Scenarios

     In page cards, directly use the lasting_days_to_date function to calculate consecutive occurrence days, such as product out-of-stock days. The case is shown in the figure below.

image.png