High-Performance Dataset & ClickHouse Syntax Notes
1) About Window Functions
BI versions before 5.9 do not support window functions by default. To use them, you need to upgrade the BI system to version 5.9 or later (additionally, high-performance datasets require ClickHouse version 22 or above).
2) Case Sensitivity
a. Function names: Some ClickHouse functions are case-sensitive, such as toDate() function written as todate() will cause errors; but some functions are case-insensitive, such as length() and Length() can both return string length. String function requires uppercase, cast([field] as String), where S must be uppercase. Pay attention when using.
b. Table names & Field names: ClickHouse is also sensitive to the case of field names. For example, if DDL defines a table named orders with a field called Itemnum, to query this field from this table, you must write select Itemnum from orders. Writing the field name as ItemNum or the table name as Orders will cause errors.
3) Function Differences
a. Common function names and syntax in ClickHouse may differ from conventional databases. For example, string to date conversion in ClickHouse is toDate(), not to_date() or str_to_date(). The official documentation for toDate() function is not detailed, but personal testing shows this function seems to only support numbers, such as toDate(11111), or 'yyyy-MM-dd', 'yyyy/MM/dd', 'yyyy MM dd' formats and corresponding formats with hours, minutes, and seconds. Inputting date strings in other formats will cause errors. If you need to parse date strings in 'yyyyMMdd' or other formats, you can process the string first, or use the parseDateTimeBestEffort() function, which can automatically support various date string formats. Please refer to the official documentation for details.
b. Many type conversion functions in ClickHouse have OrNull, OrZero suffixes, which are used to handle data or NULL that don't meet the processing scope of the corresponding function. For example, toInt8('12.34'), 12.34 is not an integer string and cannot be successfully parsed by toInt8, causing an error; if you use toInt8OrZero('12.34'), it will return 0, or toInt8OrNull('12.34') will return null.
4) Data Type Differences
a. Basic data types in ClickHouse are similar to other databases, but some type names may differ. For example, integer fields in other databases are int, smallint, bigint, etc., while in ClickHouse they are Int8, Int16, Int32, etc.
b. Here we introduce a unique data type in ClickHouse: Nullable:
I believe Nullable is not strictly a data type, but a special identifier and restriction for specific data types. Fields with Nullable() allow missing values to be stored as NULL by default. For example, a field of type Nullable(Int8) can store values of type Int8, and rows without specific values will be stored as NULL.
The reason for emphasizing Nullable() is that ClickHouse has extremely strict data type restrictions. In the above scenario, if the field type is Int8 and the input value contains NULL, it will cause an error.
5) Use of Single and Double Quotes
In Spark, it's common to use English double quotes to quote strings, and single quotes are also compatible. However, in ClickHouse, you need to strictly use single quotes ('') in English, otherwise it will cause errors.
6) Usage of dateDiff Function
The datediff function in Spark only needs two parameters [end date field] and [start date field], such as datediff([resignation date], [hire date]), returning the number of days between "resignation date" and "hire date" in the same row. The dateDiff function in ClickHouse requires 3 or 4 parameters.
dateDiff('unit', startdate, enddate, [timezone])
/*--
Supported time units: second, minute, hour, day, week, month, quarter, year.
[timezone] is an optional parameter that can be filled according to actual scenarios.
--*/
dateDiff('day',[start date],[end date])
/*-------Calculate the difference in days between two dates----------*/
7) date_sub Function and subtractDays Function
The date_sub function is Spark syntax. date_sub([sales date],1) returns the date one day before the sales date;
The subtractDays function is ClickHouse syntax, with the same writing: subtractDays([sales date],1) returns the date one day before the sales date.
8) ClickHouse requires years to be between 1970~2139, otherwise it will cause errors
ClickHouse exception, code: 321, host: xxx.xx.x.xx, port: xxxx; Code: xxx, e.displayText0= DB: Exception: Input value 2932783 of a column "UInt16" is greater than max allowed Date value, which is 49710 (version 20.4.5.36 (official build))**