Calculating Effective Working Days in Date Range
Requirement Background
Sometimes order or work order datasets will save "start time" and "end time". To evaluate efficiency, it's necessary to calculate the actual working days within the time range from "start time" to "end time", excluding Saturdays, Sundays, statutory holidays and other non-effective working days. For example, during 2022-09-30~2022-10-08, there are actually only 2 effective working days: 2022-09-30 and 2022-10-08.
Prerequisites
-
ETL processing is required, so the datasets used must be non-direct datasets;
-
Array-related functions are needed. For function usage, please refer to [Spark SQL Array Processing Functions and Applications](../5-Functions/1-Spark Functions/2-Spark SQL Array Processing Functions and Applications.md);
-
Prepare a dataset specifically for counting Saturdays, Sundays, and statutory holidays. Considering scenarios where enterprises have custom non-working days besides national statutory holidays, it's recommended to use manual tables (for specific reference, see [Holiday Data Distinction and Comparison](7-Holiday Data Distinction and Comparison.md)). Upload files to BI and append new year's holidays once at the end of each year. In this scenario, date filtering is needed to only retain holiday dates, as shown in the figure below.

Implementation Steps
1. Processing Order Data Table
a. Ensure that "start time" and "end time" fields are in date format. If not in date format (string, timestamp, etc.), be sure to use functions (such as to_date()) to convert to date format.
b. Create a new text-type calculated field "interval dates", returning an array listing the specific dates within the range from "start time" to "end time"; create a constant field, for example, numeric type "constant": 1.
sequence([start_time],[end_time])
-- Guandata BI automatically converts date type arrays to unixdate integer arrays for storage, more space-efficient and faster, without affecting usage

c. Create a new calculated field "working days", using the array_remove function to remove non-working days from the interval dates array, and then count the remaining days.
size(array_remove([interval_dates], [holiday_dates]))
d. Create a new calculated field "total days", using the size function to count the total number of days in the interval dates array.
size([interval_dates])
e. Create a new calculated field "non-working days", subtracting working days from total days.
[total_days] - [working_days]
2. Processing Holiday Dataset
a. Ensure that the holiday date field is in date format.
b. Create a new calculated field "holiday dates", using the collect_list function to aggregate all holiday dates into an array.
collect_list([holiday_date])
3. Final Result
After the above processing, you can get:
- Working days: The number of effective working days within the date range
- Non-working days: The number of non-working days within the date range
- Total days: The total number of days within the date range
Notes
-
The holiday dataset needs to be updated regularly to include new statutory holidays.
-
The array_remove function will remove all matching elements from the array. If there are duplicate dates in the holiday dataset, it may affect the calculation results.
-
This method is suitable for calculating working days within a relatively short time range. For very long time ranges, the calculation may be slower.
-
If you need to calculate working days for multiple date ranges, you can use this method to create a reusable dataset.