Spark Date Functions and Applications
1. Time or Date Extraction Functions
Original field is date or datetime/timestamp
Return Non-date
Purpose | Function | Example | Result |
Extract year | year() | year('2009-07-30') | 2009 |
Extract quarter number | quarter() | quarter('2021-08-18') | 3 |
Extract month | month() | month('2009-07-30') | 7 |
Extract day by month (1-31) | dayofmonth() | dayofmonth('2009-07-30'); dayofmonth('2020-07-30 13:39:37') | 30 |
Extract day by year (1-365) | dayofyear() | dayofyear('2022-02-01') | 32 |
Extract weekday | dayofweek() Sunday is 1; weekday() Monday is 0 | dayofweek('2021-08-18')-1; weekday('2021-08-18')+1 | 3 |
Extract hour | hour() | hour('2018-12-11 11:12:13') | 11 |
Extract minute | minute() | minute('2018-12-11 11:12:13') | 12 |
Extract second | second() | second('2018-12-11 11:12:13') | 13 |
Extract week number | weekofyear() | weekofyear('2008-02-20') | 8 |
Extract time | date_format(ts,'HH:mm:ss') | date_format('2018-12-11 11:12:13','HH:mm:ss') | 11:12:13 |
Return Date or Time
date_trunc(format, [field]) --Truncates date or datetime according to target format and returns timestamp format.
trunc([field], format) --Truncates date or datetime according to target format and returns date format.
Available format formats:
Formatter | Meaning |
"YEAR","YYYY","YY" | Year |
"QUARTER" | Quarter |
"MONTH","MM","MON" | Month |
"WEEK" | Week |
"DAY","DD" | Day |
"HOUR" | Hour |
"MINUTE" | Minute |
"SECOND" | Second |
"MILLISECOND" | Millisecond |
"MICROSECOND" | Microsecond |
Note:
-
Case insensitive;
-
Quotes must be half-width symbols, single or double quotes can be used;
-
Day ('day'/'dd') and units below (hour, minute, second) cannot be used with trunc().
Purpose | Function | Example | Result |
Get Monday of current week | trunc(date,'week') | trunc('2021-07-30', 'week') | 2021-07-26 |
date_trunc('week',ts) | date_trunc('week', '2021-07-30 15:48:08') | 2021-07-26 00:00:00 | |
Get first day of current month | trunc(date,'MM') | trunc('2021-07-30', 'MM') | 2021-07-01 |
date_trunc('MM',ts) | date_trunc('MM', '2021-07-30 15:48:08') | 2021-07-01 00:00:00 | |
Get last day of current month | last_day(date) | last_day('2021-07-30 15:48:08') | 2021-07-31 |
Get first day of current quarter | trunc(date,'quarter') | trunc('2021-07-30', 'quarter') | 2021-07-01 |
date_trunc('quarter', ts) | date_trunc('quarter', '2021-07-30 15:48:08') | 2021-07-01 00:00:00 | |
Get first day of current year | trunc(date,'year') | trunc('2021-07-30','year') | 2021-01-01 |
date_trunc('yyyy', ts) | date_trunc('yyyy', '2021-07-30 15:48:08') | 2021-01-01 00:00:00 | |
Truncate datetime to day (zero out after) | date_trunc('day', ts) | date_trunc('day', '2021-07-30 15:48:08') | 2021-07-30 00:00:00 |
Truncate datetime to hour (zero out after) | date_trunc('hour', ts) | date_trunc('hour', '2021-07-30 15:48:08') | 2021-07-30 15:00:00 |
Truncate datetime to minute (zero out after) | date_trunc('minute', ts) | date_trunc('minute', '2021-07-30 15:48:08') | 2021-07-30 15:48:00 |
2. Date or DateTime Generation Functions
Purpose | Function | Result |
Generate current time | now() | 2021-08-21 14:43:09 |
current_timestamp() | ||
Generate today's date | current_date() | 2021-08-21 |
Generate current timestamp | unix_timestamp() | 1629528189 |
3. Date Time Calculations
Purpose | Function | Example | Result |
Add/subtract date time | [field] +/- INTERVAL 1 YEAR/MONTH/WEEK/DAY/HOUR/MINUTE (middle value cannot reference other fields) | '2021-07-30 15:48:08' - interval 1 year | 2020-07-30 15:48:08 |
'2021-07-30 15:48:08' + interval 2 hour | 2021-07-30 17:48:08 | ||
Calculate future date (time part not retained) | date_add([field],numeric) | date_add('2021-07-30 17:48:08',3) | 2021-08-02 |
add_months([field],numeric) numeric is positive | add_months('2021-07-30',1) | 2021-08-30 | |
Calculate past date (time part not retained) | date_sub([field],numeric) | date_sub('2021-07-30 17:48:08',3) | 2021-07-27 |
add_months([field],numeric) numeric is negative | add_months('2021-07-30',-1) | 2021-06-30 | |
Calculate date difference | datediff(endDate, startDate) result is integer | datediff('2021-07-30', '2021-07-31') | -1 |
Calculate month difference | months_between(endTime, startTime) result is float | months_between('2021-07-07 14:14:01', '2021-04-22 15:57:59') | 2.5138 |
Calculate minute difference | *Custom function: MINUTEDIFF() | MINUTEDIFF('2021-07-02 14:28:53', '2021-07-02 12:00:16') | 148 |
Calculate second difference | *Custom function: SECONDDIFF() | SECONDDIFF('2021-07-02 14:28:53', '2021-07-02 12:00:16') | 8917 |
Case 1: Duration Calculation
【Requirement】One field in the dataset is event start time, another field is event duration (minutes), different events have different durations, need to calculate event end time. In this case, using [field]+ INTERVAL 1 MINUTE
method cannot calculate the end time.
【Logic】unix_timestamp converts timestamp to a numeric value calculated in seconds, which facilitates mathematical calculations on timestamps. Finally, convert the calculation result back to timestamp format. Conversely, calculating the time difference between two times can also use the same method.
【Implementation】
timestamp(unix_timestamp([start time])+[duration]*60);
to_timestamp(to_unix_timestamp([start time])+[duration]*60)
【Final Effect】

4. Date and DateTime Conversion
Purpose | Function | Example | Result |
Convert character date to date type | to_date(date_str[, fmt]) When already in standard format, [, fmt] can be omitted, then can be used interchangeably with date(date_str) and cast(date_str as date) | to_date('2009-07-30 04:17:52') | 2009-07-30 |
date('2009-07-30 04:17:52') | |||
cast('2009-07-30 04:17:52' as date) | |||
Convert character datetime to datetime type | to_timestamp(ts_str[, fmt]) When already in standard format, [, fmt] can be omitted, then can be used interchangeably with timestamp(ts_str) and cast(ts_str as timestamp) | to_timestamp('2016-12-31', 'yyyy-MM-dd') | 2016-12-31 00:00:00 |
timestamp('2016-12-31') | |||
cast('2016-12-31' as timestamp) | |||
Convert numeric, date types to character type | string(expr) | string('2021-07-30 15:48:08') | 2021-07-30 15:48:08 |
CAST([field] AS string) | cast('2021-07-30 15:48:08' as string) | ||
Date time format conversion, result is generally character type | date_format(timestamp, fmt) | date_format(now(), 'HH:mm:ss') | 22:04:50 |
Convert datetime or character datetime to timestamp | to_unix_timestamp([timeExp[, fmt]]) | to_unix_timestamp('2021/7/30 01:30 PM','yyyy/M/dd hh:mm a') | 1627623000 |
unix_timestamp([timeExp[, fmt]]) When datetime is in standard format, [, fmt] can be omitted | unix_timestamp('2021-07-30 13:30:00') | 1627623000 | |
Convert timestamp to datetime | from_unixtime(unix_time[, fmt]) (automatically converted to current timezone time) | from_unixtime(1) | 1970-01-01 08:00:01 |
Timezone offset conversion | from_utc_timestamp(timestamp, timezone) Use world standard time to calculate other timezone times | from_utc_timestamp('2021-08-08', 'Asia/Shanghai') | 2021-08-08 08:00:00 |
from_utc_timestamp('2021-08-08 00:00:00', 'GMT+8') | |||
to_utc_timestamp(timestamp, timezone) Use given timezone time to calculate world standard time | to_utc_timestamp('2021-08-08 00:00:00', 'Asia/Shanghai'); to_utc_timestamp('2021-08-08 00:00:00', 'GMT+8') | 2021-08-07 16:00:00 | |
Common date time formatting parameters DateFormatter (note the distinction between uppercase and lowercase):
Symbol | Meaning | Example |
yy/yyyy | yy: year without era; yyyy: four-digit year including era. | 21; 2021 |
M/MM | Month number. M: single-digit month without leading zero; MM: single-digit month with leading zero. | 1; 01 |
MMM/MMMM | MMM: abbreviated month name; MMMM: full month name | Jan; January |
d/dd | Day of month. d: single-digit date without leading zero; dd: single-digit date with leading zero. | 01~31 |
D/DD | Day of year. D: single-digit date without leading zero; DD: single-digit date with leading zero. | 01~365 |
h/hh | Hour in 12-hour format. h: single-digit hour without leading zero; hh: single-digit hour with leading zero. | 1~12 |
H/HH | Hour in 24-hour format. H: single-digit hour without leading zero; HH: single-digit hour with leading zero | 0~23 |
m/mm | Minute. m: single-digit minute without leading zero; mm: single-digit minute with leading zero. | 0~59 |
s/ss | Second. s: single-digit second without leading zero; ss: single-digit second with leading zero. | 0~60 |
S | Fractional second decimal, used for finer statistics and display of seconds. | 978 |
E/EEEE | 1 to 3 E's: abbreviated name of day of week; 4 E's: full name of day of week | Tue; Tuesday |
a | AM-PM | AM;PM |
z/zzzz | Current timezone name. z: timezone abbreviation; zzzz: full timezone name | CST;China Standard Time |
Z | Current timezone offset | +0800 |
X | Current timezone offset. X: +08; XX: +0800; XXX: +08:00 | +08:00 |
F | Which natural week of current month. Starting from 1st, every 7 days is one natural week. | date_format('2021-08-22','EEEE') returns Sunday; date_format('2021-08-22','F') returns 4, indicating that 2021-08-22 is the 4th Sunday of August 2021. |
For more parameters and usage, please refer to the official documentation: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html
Case 2: Timezone Conversion
【Requirement】Original time is East 8 timezone, need to convert to East 9 timezone, offset by 1 timezone.
from_utc_timestamp([East 8 time],'UTC+1'); from_utc_timestamp([East 8 time],'GMT+1')
【Logic】Temporarily treat the current East 8 time as world standard time, calculate time offset by 1 timezone
【Final Effect】Before conversion: 2021-08-08 15:16:00 After conversion: 2021-08-08 16:16:00
Case 3: Convert Text Date to Standard Date
Text Date | Standard Format | Function |
07/30/2021 | 2021-07-30 | to_date([text date],'MM/dd/yyyy') |
2021/7/30 13:30:00 | 2021-07-30 13:30:00 | to_timestamp([text date],'yyyy/M/dd HH:mm:ss') |
2021/7/30 01:30 PM | 2021-07-30 13:30:00 | to_timestamp([text date],'yyyy/M/dd hh:mm a') |
2021年7月30日 | 2021-07-30 | to_date([text date],'yyyy年M月dd日') |
2021-07-30T16:00:00.000Z | 2021-07-31 08:00:00 | from_utc_timestamp([text date],'GMT+8') |
2021-07-30T17:25:53+00:00 | 2021-07-31 09:25:53 | from_utc_timestamp([text date],'GMT+8') |
July 30, 2021 | 2021-07-30 | to_date([text date],'MMMM dd, yyyy') |
Aug 8, 2021 | 2021-08-08 | to_date([text date],'MMM d, yyyy') |
20210808121600 | 2021-08-08 12:16:00 | to_timestamp([text date],'yyyyMMddHHmmss') |
Case 4: Convert Standard Date to Text Date
Date | Target Format (Text) | Function |
2021-08-08 15:16:00 | 2021-08 | substr(string([date]),1,7) |
202108 (numeric) | YEAR([date])*100+MONTH([date]) | |
08-08 | substr(string([date]),6,5) | |
15:16 | date_format([date], 'HH:mm') | |
03:16 PM | date_format([date], 'hh:mm a') | |
Aug 8, 2021 | date_format([date],'MMM d, yyyy') | |
Sunday | date_format([date],'EEEE') | |
2021年8月8日 | date_format([date],'yyyy年M月d日') | |
2021-08-08 15:16:00 (+08:00) | date_format([date],'yyyy-MM-dd HH:mm:ss (XXX)') |
Case 5: Get Second Tuesday and Wednesday of Each Month
【Requirement】From a date table, find the second Tuesday and Wednesday of each month (Guandata Academy monthly product training days)
【Logic】
1. Create calculated field "Weekday" to get which day of week each day is: dayofweek([date]) -1
2. Create calculated field "Week number" to get which natural week of the month the date is in: date_format([date],'F')
3. Filter "Weekday" range to: greater than or equal to 2 and less than or equal to 3, or directly filter 2 and 3;
4. Filter "Week number" to 2, to get all second Tuesdays and Wednesdays of each month in a year.
【Final Effect】
