Skip to main content

Spark Date Functions and Applications

1. Time or Date Extraction Functions

Original field is date or datetime/timestamp

Return Non-date

PurposeFunctionExampleResult
Extract yearyear()year('2009-07-30')2009
Extract quarter numberquarter()quarter('2021-08-18')3
Extract monthmonth()month('2009-07-30')7
Extract day by month (1-31)dayofmonth()dayofmonth('2009-07-30'); dayofmonth('2020-07-30 13:39:37')30
Extract day by year (1-365)dayofyear()dayofyear('2022-02-01')32
Extract weekdaydayofweek() Sunday is 1; weekday() Monday is 0dayofweek('2021-08-18')-1; weekday('2021-08-18')+13
Extract hourhour()hour('2018-12-11 11:12:13')11
Extract minuteminute()minute('2018-12-11 11:12:13')12
Extract secondsecond()second('2018-12-11 11:12:13')13
Extract week numberweekofyear()weekofyear('2008-02-20')8
Extract timedate_format(ts,'HH:mm:ss')date_format('2018-12-11 11:12:13','HH:mm:ss')11:12:13

Return Date or Time

date_trunc(format, [field])  --Truncates date or datetime according to target format and returns timestamp format.
trunc([field], format) --Truncates date or datetime according to target format and returns date format.

Available format formats:

FormatterMeaning
"YEAR","YYYY","YY"Year
"QUARTER"Quarter
"MONTH","MM","MON"Month
"WEEK"Week
"DAY","DD"Day
"HOUR"Hour
"MINUTE"Minute
"SECOND"Second
"MILLISECOND"Millisecond
"MICROSECOND"Microsecond

Note:

  1. Case insensitive;

  2. Quotes must be half-width symbols, single or double quotes can be used;

  3. Day ('day'/'dd') and units below (hour, minute, second) cannot be used with trunc().

Purpose

Function

Example

Result

Get Monday of current week

trunc(date,'week')

trunc('2021-07-30', 'week')

2021-07-26

date_trunc('week',ts)

date_trunc('week', '2021-07-30 15:48:08')

2021-07-26 00:00:00

Get first day of current month

trunc(date,'MM')

trunc('2021-07-30', 'MM')

2021-07-01

date_trunc('MM',ts)

date_trunc('MM', '2021-07-30 15:48:08')

2021-07-01 00:00:00

Get last day of current month

last_day(date)

last_day('2021-07-30 15:48:08')

2021-07-31

Get first day of current quarter

trunc(date,'quarter')

trunc('2021-07-30', 'quarter')

2021-07-01

date_trunc('quarter', ts)

date_trunc('quarter', '2021-07-30 15:48:08')

2021-07-01 00:00:00

Get first day of current year

trunc(date,'year')

trunc('2021-07-30','year')

2021-01-01

date_trunc('yyyy', ts)

date_trunc('yyyy', '2021-07-30 15:48:08')

2021-01-01 00:00:00

Truncate datetime to day (zero out after)

date_trunc('day', ts)

date_trunc('day', '2021-07-30 15:48:08')

2021-07-30 00:00:00

Truncate datetime to hour (zero out after)

date_trunc('hour', ts)

date_trunc('hour', '2021-07-30 15:48:08')

2021-07-30 15:00:00

Truncate datetime to minute (zero out after)

date_trunc('minute', ts)

date_trunc('minute', '2021-07-30 15:48:08')

2021-07-30 15:48:00

2. Date or DateTime Generation Functions

Purpose

Function

Result

Generate current time

now()

2021-08-21 14:43:09

current_timestamp()

Generate today's date

current_date()

2021-08-21

Generate current timestamp

unix_timestamp()

1629528189

3. Date Time Calculations

Purpose

Function

Example

Result

Add/subtract date time

[field] +/- INTERVAL 1 YEAR/MONTH/WEEK/DAY/HOUR/MINUTE

(middle value cannot reference other fields)

'2021-07-30 15:48:08' - interval 1 year

2020-07-30 15:48:08

'2021-07-30 15:48:08' + interval 2 hour

2021-07-30 17:48:08

Calculate future date (time part not retained)

date_add([field],numeric)

date_add('2021-07-30 17:48:08',3)

2021-08-02

add_months([field],numeric) numeric is positive

add_months('2021-07-30',1)

2021-08-30

Calculate past date (time part not retained)

date_sub([field],numeric)

date_sub('2021-07-30 17:48:08',3)

2021-07-27

add_months([field],numeric) numeric is negative

add_months('2021-07-30',-1)

2021-06-30

Calculate date difference

datediff(endDate, startDate) result is integer

datediff('2021-07-30', '2021-07-31')

-1

Calculate month difference

months_between(endTime, startTime) result is float

months_between('2021-07-07 14:14:01', '2021-04-22 15:57:59')

2.5138

Calculate minute difference

*Custom function: MINUTEDIFF()

MINUTEDIFF('2021-07-02 14:28:53', '2021-07-02 12:00:16')

148

Calculate second difference

*Custom function: SECONDDIFF()

SECONDDIFF('2021-07-02 14:28:53', '2021-07-02 12:00:16')

8917

Case 1: Duration Calculation

【Requirement】One field in the dataset is event start time, another field is event duration (minutes), different events have different durations, need to calculate event end time. In this case, using [field]+ INTERVAL 1 MINUTE method cannot calculate the end time.

【Logic】unix_timestamp converts timestamp to a numeric value calculated in seconds, which facilitates mathematical calculations on timestamps. Finally, convert the calculation result back to timestamp format. Conversely, calculating the time difference between two times can also use the same method.

【Implementation】

timestamp(unix_timestamp([start time])+[duration]*60);
to_timestamp(to_unix_timestamp([start time])+[duration]*60)

【Final Effect】

image.png

4. Date and DateTime Conversion

Purpose

Function

Example

Result

Convert character date to date type

to_date(date_str[, fmt])

When already in standard format, [, fmt] can be omitted, then can be used interchangeably with date(date_str) and cast(date_str as date)

to_date('2009-07-30 04:17:52')

2009-07-30

date('2009-07-30 04:17:52')

cast('2009-07-30 04:17:52' as date)

Convert character datetime to datetime type

to_timestamp(ts_str[, fmt])

When already in standard format, [, fmt] can be omitted, then can be used interchangeably with timestamp(ts_str) and cast(ts_str as timestamp)

to_timestamp('2016-12-31', 'yyyy-MM-dd')

2016-12-31 00:00:00

timestamp('2016-12-31')

cast('2016-12-31' as timestamp)

Convert numeric, date types to character type

string(expr)

string('2021-07-30 15:48:08')

2021-07-30 15:48:08

CAST([field] AS string)

cast('2021-07-30 15:48:08' as string)

Date time format conversion, result is generally character type

date_format(timestamp, fmt)

date_format(now(), 'HH:mm:ss')

22:04:50

Convert datetime or character datetime to timestamp

to_unix_timestamp([timeExp[, fmt]])

to_unix_timestamp('2021/7/30 01:30 PM','yyyy/M/dd hh:mm a')

1627623000

unix_timestamp([timeExp[, fmt]]) When datetime is in standard format, [, fmt] can be omitted

unix_timestamp('2021-07-30 13:30:00')

1627623000

Convert timestamp to datetime

from_unixtime(unix_time[, fmt]) (automatically converted to current timezone time)

from_unixtime(1)

1970-01-01 08:00:01

Timezone offset conversion

from_utc_timestamp(timestamp, timezone)

Use world standard time to calculate other timezone times

from_utc_timestamp('2021-08-08', 'Asia/Shanghai')

2021-08-08 08:00:00

from_utc_timestamp('2021-08-08 00:00:00', 'GMT+8')

to_utc_timestamp(timestamp, timezone)

Use given timezone time to calculate world standard time

to_utc_timestamp('2021-08-08 00:00:00', 'Asia/Shanghai'); to_utc_timestamp('2021-08-08 00:00:00', 'GMT+8')

2021-08-07 16:00:00

Common date time formatting parameters DateFormatter (note the distinction between uppercase and lowercase):

SymbolMeaningExample
yy/yyyyyy: year without era; yyyy: four-digit year including era.21; 2021
M/MMMonth number. M: single-digit month without leading zero; MM: single-digit month with leading zero.1; 01
MMM/MMMMMMM: abbreviated month name; MMMM: full month nameJan; January
d/ddDay of month. d: single-digit date without leading zero; dd: single-digit date with leading zero.01~31
D/DDDay of year. D: single-digit date without leading zero; DD: single-digit date with leading zero.01~365
h/hhHour in 12-hour format. h: single-digit hour without leading zero; hh: single-digit hour with leading zero.1~12
H/HHHour in 24-hour format. H: single-digit hour without leading zero; HH: single-digit hour with leading zero0~23
m/mmMinute. m: single-digit minute without leading zero; mm: single-digit minute with leading zero.0~59
s/ssSecond. s: single-digit second without leading zero; ss: single-digit second with leading zero.0~60
SFractional second decimal, used for finer statistics and display of seconds.978
E/EEEE1 to 3 E's: abbreviated name of day of week; 4 E's: full name of day of weekTue; Tuesday
aAM-PMAM;PM
z/zzzzCurrent timezone name. z: timezone abbreviation; zzzz: full timezone nameCST;China Standard Time
ZCurrent timezone offset+0800
XCurrent timezone offset. X: +08; XX: +0800; XXX: +08:00+08:00
FWhich natural week of current month. Starting from 1st, every 7 days is one natural week.date_format('2021-08-22','EEEE') returns Sunday; date_format('2021-08-22','F') returns 4, indicating that 2021-08-22 is the 4th Sunday of August 2021.

For more parameters and usage, please refer to the official documentation: https://spark.apache.org/docs/latest/sql-ref-datetime-pattern.html

Case 2: Timezone Conversion

【Requirement】Original time is East 8 timezone, need to convert to East 9 timezone, offset by 1 timezone.

from_utc_timestamp([East 8 time],'UTC+1'); from_utc_timestamp([East 8 time],'GMT+1')

【Logic】Temporarily treat the current East 8 time as world standard time, calculate time offset by 1 timezone

【Final Effect】Before conversion: 2021-08-08 15:16:00 After conversion: 2021-08-08 16:16:00

Case 3: Convert Text Date to Standard Date

Text DateStandard FormatFunction
07/30/20212021-07-30to_date([text date],'MM/dd/yyyy')
2021/7/30 13:30:002021-07-30 13:30:00to_timestamp([text date],'yyyy/M/dd HH:mm:ss')
2021/7/30 01:30 PM2021-07-30 13:30:00to_timestamp([text date],'yyyy/M/dd hh:mm a')
2021年7月30日2021-07-30to_date([text date],'yyyy年M月dd日')
2021-07-30T16:00:00.000Z2021-07-31 08:00:00from_utc_timestamp([text date],'GMT+8')
2021-07-30T17:25:53+00:002021-07-31 09:25:53from_utc_timestamp([text date],'GMT+8')
July 30, 20212021-07-30to_date([text date],'MMMM dd, yyyy')
Aug 8, 20212021-08-08to_date([text date],'MMM d, yyyy')
202108081216002021-08-08 12:16:00to_timestamp([text date],'yyyyMMddHHmmss')

Case 4: Convert Standard Date to Text Date

Date

Target Format (Text)

Function

2021-08-08 15:16:00

2021-08

substr(string([date]),1,7)

202108 (numeric)

YEAR([date])*100+MONTH([date])

08-08

substr(string([date]),6,5)

15:16

date_format([date], 'HH:mm')

03:16 PM

date_format([date], 'hh:mm a')

Aug 8, 2021

date_format([date],'MMM d, yyyy')

Sunday

date_format([date],'EEEE')

2021年8月8日

date_format([date],'yyyy年M月d日')

2021-08-08 15:16:00 (+08:00)

date_format([date],'yyyy-MM-dd HH:mm:ss (XXX)')

Case 5: Get Second Tuesday and Wednesday of Each Month

【Requirement】From a date table, find the second Tuesday and Wednesday of each month (Guandata Academy monthly product training days)

【Logic】

1. Create calculated field "Weekday" to get which day of week each day is: dayofweek([date]) -1

2. Create calculated field "Week number" to get which natural week of the month the date is in: date_format([date],'F')

3. Filter "Weekday" range to: greater than or equal to 2 and less than or equal to 3, or directly filter 2 and 3;

4. Filter "Week number" to 2, to get all second Tuesdays and Wednesdays of each month in a year.

【Final Effect】

image.png