Guandata's calculation engine is built on Spark, so all SQL functions supported by Spark can be used on the Guandata platform. Spark Official Documentation.
The following lists some commonly used functions.
Applicable to Guan-index datasets, ETL output datasets, Excel/CSV datasets.
For datasets directly connected to databases, there are differences in SQL standard support among various database types.
Function | Meaning | Usage | Description | Example |
ABS | Absolute value | ABS([field]) | Returns absolute value | ABS(-3), returns 3 |
CEIL | Round up | CEIL(numeric/[field]) | Returns the smallest integer not less than numeric/[field] | CEIL(4.12), returns 5 |
FLOOR | Round down | FLOOR(numeric/[field]) | Returns the largest integer not greater than numeric/[field] | FLOOR(4.12), returns 4 |
INT | Convert to integer | INT([field]) | Returns the largest integer not greater than numeric/[field]. Converts String format to numeric/[field] type | INT(1.5) returns 1 |
LOG | Logarithm | LOG(numeric/[field]A,[numeric/[field]B]) | Takes A as base, finds logarithm of B, defaults to natural logarithm base | LOG(2,4), returns 2.0 |
POW | Power | POW(numeric/[field]A,numeric/[field]B) | Finds the Bth power of numeric/[field]A | POW(4,2), returns 16.0 |
RAND | Random number | RAND() | Returns random decimal greater than 0 and less than 1 | RAND(), returns random number |
ROUND | Round to nearest | ROUND(numeric/[field]A[,integer D]) | Returns numeric/[field]A rounded to D decimal places. Defaults to 0 when not specified | ROUND(4.12,1), returns 4.1 |
SQRT | Square root | SQRT(numeric/[field]) | Finds the square root of numeric/[field], must be greater than or equal to zero | SQRT(4), returns 2.0 |
Function | Meaning | Usage | Description | Example |
CONCAT | Concatenate strings | CONCAT([field1], [field2]...) | Returns strings connected in order | CONCAT([product code], [type code]), returns the concatenated string of product code and type code |
FORMAT_STRING | Format output | format_string(specified format,[field]) | Formats string output in specified form | format_string('%06d',82343), gets 082343 |
INSTR | Find string position | INSTR([text field], [search field]) | Returns the position where the search string first appears in the text [field] value, result is an integer greater than 0, returns 0 if not found | INSTR([name], ','), returns the position where ',' first appears in the name field |
LENGTH | String length | LENGTH([field]) | Returns the length of the string | LENGTH([product name]), returns the length of product name |
LOWER | Lowercase form | LOWER([field]) | Returns the string in all lowercase form of the expression or [field] value | LOWER(ABC), returns abc |
REGEXP_EXTRACT | String regex parsing | REGEXP_EXTRACT(string, regex, index) | Returns string regex parsing result, 'index' is the return result (0 means return all results, 1 means return the first match result in the regex) | REGEXP_EXTRACT([product ID], '\d+', 0), returns the numeric part in product ID |
REGEXP_REPLACE | String regex replacement | REGEXP_REPLACE(string A, regex, string B) | Returns the result after replacing parts of string A that match the regex with string B | REGEXP_REPLACE([product name], '\d+', ''), replaces the numeric part in product ID with empty string |
REPEAT | Repeat string | REPEAT([field], numeric) | Returns new string result after repeating the string for the specified number of times | REPEAT([product name], 2), returns the string after repeating product name 2 times, such as [product name][product name] |
REVERSE | Reverse string | REVERSE([field]) | Returns new string result after reversing the string | REVERSE([type code]), input ABC returns CBA |
SUBSTR | Extract substring | SUBSTR([field], start position[, length]) | Returns substring of specified length from start position, length is optional; negative index represents counting string characters from end forward | SUBSTR([product type], 4), returns substring from index 4 to end of product type; SUBSTR([product type], -4), returns last 4 characters of product type |
TRIM | Remove spaces | TRIM([field]) | Removes spaces on both sides of data in expression or [field] | TRIM(" ABC "), returns ABC |
UPPER | Uppercase form | UPPER([field]) | Returns the string in all uppercase form of the expression or [field] value | UPPER("abc"), returns ABC |
Function | Meaning | Usage | Description | Example |
AVG | Average | AVG([field]) | Returns the average of all values in [field], only applies to numeric [field] NULL values are not calculated | AVG(sales), returns the average of all non-null values corresponding to the sales [field] |
COUNT | Count | COUNT([field]) | Returns the number of data entries for all valid [field] in the expression NULL values are not calculated | COUNT(sales), returns the number of data entries for all non-null values corresponding to the sales [field] |
COUNT_DISTINCT | Distinct count | COUNT(DISTINCT([field])) | Distinct count, returns the number of different data entries for all valid [field] in the expression, null values are not calculated | COUNT(DISTINCT(sales)), returns the number of different data entries for all non-null values corresponding to the sales [field] |
MAX | Maximum | MAX([field]) | Returns the maximum value in the expression or numeric [field], only applies to numeric fields | MAX(sales), returns the maximum value corresponding to the sales field |
MIN | Minimum | MIN([field]) | Returns the minimum value in the expression or numeric [field], only applies to numeric fields | MIN(sales), returns the minimum value corresponding to the sales field |
PERCENTILE | Median/percentile | PERCENTILE([field], percentile) | Finds median and percentile of numeric type, input parameter is numeric, second parameter range is 0 to 1, e.g. 0.1, 0.25, 0.75, 0.95 | PERCENTILE([math exam score], 0.5), returns the median of exam scores |
STDDEV | Sample standard deviation | STDDEV([field]) | Returns sample standard deviation | STDDEV([sales]) |
STDDEV_POP | Population standard deviation | STDDEV_POP([field]) | Returns population standard deviation | STDDEV_POP([sales]) |
SUM | Sum | SUM([field]) | Returns the sum of all values in the expression or numeric [field], only applies to numeric [field] Null values are not calculated | SUM(sales), returns the sum of all non-null values corresponding to the sales [field] |
VAR_POP | Population variance | VAR_POP([field]) | Returns population variance | VAR_POP([sales]) |
VARIANCE | Sample variance | VARIANCE([field]) | Returns sample variance | VARIANCE([sales]) |
Function | Meaning | Usage | Description | Example |
GREATEST | Compare maximum | GREATEST([field1], [field2]……) | Compares values in each row across columns, returns the maximum value | GREATEST([department1 sales], [department2 sales]), returns the maximum value between department1 sales and department2 sales fields in that row |
LEAST | Compare minimum | LEAST([field1], [field2]……) | Compares values in each row across columns, returns the minimum value | LEAST([department1 sales], [department2 sales]), returns the minimum value between department1 sales and department2 sales fields in that row |
Function | Meaning | Usage | Description | Example |
CASE WHEN | Conditional function | CASE WHEN(expression1) THEN result1 WHEN (expression2) THEN result2 …… ELSE resultn END | Returns corresponding result if expression is satisfied, returns default result after ELSE if none satisfied, must have END keyword at the end | CASE WHEN([score]>=60 and [score]<70) THEN 'Pass' WHEN ([score]>=70 and [score]<80) THEN 'Medium' WHEN ([score]>=80 and [score]<90) THEN 'Good' WHEN ([score]>=90) THEN 'Excellent' ELSE 'Fail' END |
COALESCE | Return non-null value | COALESCE([field1], [field2], 'default') | Returns the first non-null value in a group of [field]; if all values are NULL, returns NULL | COALESCE([name1], [name2], 'Anonymous'); returns the first non-null name in the parameters, if all are empty, returns 'Anonymous' |
IF | Conditional function | IF(expression,result1,result2) | Returns result1 if expression is satisfied, otherwise returns result2 | IF([score]>80,'Pass','Fail') |