Null Value Processing
What are null values?
All of the following may be treated as null values:
- null: Indicates that a data item contains no valid data, that the data is missing, or that the value is unknown. It can occur in fields of any type. Depending on the database table, it may be displayed as blank or as null.
- Empty string: A text string with no characters (''). It exists only in text fields and, because it is invisible, is easily confused with null.
- Text null: The literal text 'null' in any casing (lowercase, uppercase, or mixed). When storing data, some users write the string 'null' into text fields in place of a real null value, so that these rows can be told apart from rows that have values.
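To make the three kinds concrete, here is a minimal Python sketch (not part of the product) that classifies a single text value. Python's None stands in for a database null, and the function name null_kind is hypothetical.

```python
from typing import Optional


def null_kind(value: Optional[str]) -> Optional[str]:
    """Classify a text value as one of the three null-like kinds.

    Returns "null" for a real null (None), "empty" for an empty string,
    "text null" for the literal text 'null' in any casing, else None.
    """
    if value is None:
        return "null"
    if value == "":
        return "empty"
    if value.lower() == "null":
        return "text null"
    return None


for v in [None, "", "NULL", "Null", "Beijing"]:
    print(repr(v), "->", null_kind(v))
```

Only the real null is a database null; the other two are ordinary strings, so null-handling functions such as ifnull will not catch them.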
Impact of null values
- In any expression or function, if any input is null, the result is usually null. For example, with CONCAT([fieldA], [fieldB]), any row in which either field is null returns null. Likewise, the comparison null = null returns null rather than true.
- COUNT([field]) and COUNT(DISTINCT [field]) exclude null values from the count but do not exclude empty strings. So when a field contains nulls, COUNT(DISTINCT [field]) returns 1 less than the number of distinct values including null.
- When converting columns to rows in ETL, if the value being converted is null, the entire row is filtered out, causing data loss.
- When a field's type does not match its actual content, the values display as null, which can mislead users into thinking there is no data.
- When deduplication primary keys are set on a dataset, duplicate rows can remain after incremental updates, because rows whose primary key is null cannot be deduplicated.
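The propagation, counting, and key rules above follow standard SQL null semantics, so they can be reproduced outside the product. The sketch below uses Python's built-in sqlite3 module as an assumed stand-in for Spark SQL: SQLite's || operator plays the role of CONCAT, and a UNIQUE column with INSERT OR REPLACE only approximates a deduplication primary key during incremental updates. The tables t and k and their columns are hypothetical.

```python
import sqlite3

# In-memory SQLite database, used only as a stand-in for Spark SQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (a TEXT, b TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [("x", "1"), ("y", None), (None, "2"), ("", "3"), ("x", "4")],
)

# 1. Null propagation: like CONCAT, SQLite's || returns NULL if either side is NULL.
concat_results = [r[0] for r in conn.execute("SELECT a || b FROM t ORDER BY rowid")]
print(concat_results)  # ['x1', None, None, '3', 'x4']

# 2. NULL = NULL evaluates to NULL (None in Python), not true.
null_equals_null = conn.execute("SELECT NULL = NULL").fetchone()[0]
print(null_equals_null)  # None

# 3. COUNT and COUNT(DISTINCT) skip NULL but still count the empty string.
count_a, distinct_a, all_rows = conn.execute(
    "SELECT COUNT(a), COUNT(DISTINCT a), COUNT(*) FROM t"
).fetchone()
print(count_a, distinct_a, all_rows)  # 4 3 5 (distinct values including NULL: 4)

# 4. NULL keys never collide in a UNIQUE column, so loading the same rows twice
#    replaces the 'id1' row but appends a second NULL-key row.
conn.execute("CREATE TABLE k (id TEXT UNIQUE, v INTEGER)")
for _ in range(2):  # two identical incremental loads
    conn.executemany(
        "INSERT OR REPLACE INTO k VALUES (?, ?)", [("id1", 10), (None, 20)]
    )
key_rows = conn.execute("SELECT COUNT(*) FROM k").fetchone()[0]
print(key_rows)  # 3
```

The exact operators differ between engines, but the pattern is the same: null is not a value that compares equal to anything, so it silently drops out of counts and never matches an existing key.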
Null value judgment and processing methods
Dataset
On the Overview page, open Filter data, select the field, and set the type to "Select". Null values are shown as (null), empty strings are shown as blank, and text nulls are shown as the text itself. If you set deduplication primary keys, make sure that the fields used as keys contain no null values. Null values cannot be replaced directly in a dataset; handle them in ETL instead, or create a new calculated field that processes them with functions.

ETL
- Data exploration (version 4.9 and later): Add a "Data Exploration" node after the node you want to inspect, select the fields, and view an overview of each field's null values and enumerated values.

- Null value replacement and value replacement: "Null value replacement" replaces null values with another value, and "Value replacement" can replace empty strings (or null) with another value. In both cases, the replacement value must match the field's own type.

- When null values must be detected and replaced according to some logic, add a calculated field and process it with Spark SQL functions. Common null-handling functions are as follows:
Purpose | Function | Example | Result |
Check whether a value is null | isnull([expr]) or [expr] is null | isnull(null) | true |
Check whether a value is not null | isnotnull([expr]) or [expr] is not null | isnotnull(null) | false |
Replace null values in expr1 with expr2 | ifnull([expr1], [expr2]) | ifnull(null, 0) | 0 |
Same as ifnull | nvl([expr1], [expr2]) | nvl(null, 0) | 0 |
If expr1 is not null, return expr2; otherwise return expr3 | nvl2([expr1], [expr2], [expr3]) | nvl2(NULL, 2, 1) | 1 |
Return the first non-null value | coalesce([expr1], [expr2], ...) | coalesce(NULL, 1, NULL) | 1 |
If the two values are equal, return null; otherwise return expr1 | nullif([expr1], [expr2]) | nullif(2, 2) | null |
Check whether a value is (or is not) an empty string | [expr] = '' or [expr] <> '' | '' = '' | true |
Null-safe equality: true if both are null, false if only one is null | [expr1] <=> [expr2] | true <=> NULL | false |
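To try the semantics in the table without a Spark environment, the sketch below evaluates roughly equivalent expressions in SQLite through Python's built-in sqlite3 module. This is an assumed stand-in, not the product's engine: SQLite has IFNULL, COALESCE, NULLIF, and IS NULL, but no nvl or nvl2, so nvl2 is rewritten as a CASE expression, and SQLite's binary IS operator is used in place of <=>. The helper ev is hypothetical.

```python
import sqlite3

# SQLite stands in for Spark SQL here. Booleans come back as 1/0, NULL as None.
conn = sqlite3.connect(":memory:")


def ev(sql: str, params: tuple = ()):
    """Evaluate a single-value SELECT and return that value."""
    return conn.execute(sql, params).fetchone()[0]


results = {
    # isnull(null) / null is null
    "isnull": ev("SELECT NULL IS NULL"),
    # isnotnull(null) / null is not null
    "isnotnull": ev("SELECT NULL IS NOT NULL"),
    # ifnull(null, 0)
    "ifnull": ev("SELECT IFNULL(NULL, 0)"),
    # nvl2(NULL, 2, 1): no SQLite equivalent, so written as CASE
    "nvl2": ev("SELECT CASE WHEN NULL IS NOT NULL THEN 2 ELSE 1 END"),
    # coalesce(NULL, 1, NULL)
    "coalesce": ev("SELECT COALESCE(NULL, 1, NULL)"),
    # nullif(2, 2)
    "nullif": ev("SELECT NULLIF(2, 2)"),
    # '' = ''
    "empty_string": ev("SELECT '' = ''"),
    # SQLite's binary IS is null-safe equality, like Spark's <=>
    "null_safe_one_null": ev("SELECT ? IS ?", (1, None)),
    "null_safe_both_null": ev("SELECT ? IS ?", (None, None)),
}
for name, value in results.items():
    print(name, value)
```

The values match the Result column, with true/false reported as 1/0.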
Card
If you don't want null calculation results to appear blank on a card, you can display null values as a specific number or text in the "Special Values" section of the card's style panel on the right.

Note: Special value settings change only how values are displayed; they do not replace null values with other data for calculation. For example, when a numeric field containing nulls is sorted, the nulls can only be placed first or last. Setting null to display as 0 does not place those rows between the negative and positive numbers.
If null values need to take part in calculations, process them in ETL or create a new calculated field in the card. For non-direct-connection datasets, use the Spark functions listed above and then calculate with the new fields. For direct-connection datasets, use the equivalent functions of the underlying database.
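The difference between changing the display and replacing the value can be reproduced in plain SQL. In the sketch below, SQLite (through Python's sqlite3) stands in for the card's query engine, and COALESCE in the SELECT list plays the role of a display-only special value, because ORDER BY still uses the raw column. The table m and column v are hypothetical. SQLite sorts NULL first in ascending order; other databases may place it last.

```python
import sqlite3

# SQLite stands in for the card's query engine.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE m (v INTEGER)")
conn.executemany("INSERT INTO m VALUES (?)", [(5,), (None,), (-3,), (0,), (2,)])

# Display-only: the null is shown as 0, but sorting uses the real column,
# so that row still sorts first instead of between -3 and 2.
display_only = [
    r[0] for r in conn.execute("SELECT COALESCE(v, 0) AS shown FROM m ORDER BY v")
]
print(display_only)  # [0, -3, 0, 2, 5]

# Real replacement: sorting by the replaced value puts the former null among the zeros.
replaced = [
    r[0]
    for r in conn.execute("SELECT COALESCE(v, 0) AS filled FROM m ORDER BY filled")
]
print(replaced)  # [-3, 0, 0, 2, 5]
```

This is why a new calculated field, rather than a special value setting, is needed when the replaced value must affect sorting or arithmetic.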
Case Study
The field displays as null, so why does detecting and replacing null values have no effect?
"Empty date replacement" formula: ifnull([date2],date('2099-01-01'))

Cause analysis: The formula itself is correct. Inspecting the field "date2" shows that it is a newly created calculated field that uses no function: it directly references another field, "date1", and its format is set to "Date", as shown in the figure below. "date1" is a String field whose values are dates written as digits.

The user intended to convert the text dates to a standard date type and then replace the nulls. However, converting a field to a different type requires an SQL function; changing only the type setting of the new field does not actually convert the data. When the real type conflicts with the display type, the system displays null by default even though the field does contain values. In this state, checking with `isnull([date2])` also returns false, that is, not null.

Solution: Change the formula of "date2" to to_date([date1],'yyyyMMdd'). The verification result is shown in the figure below.
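The fix works because the type conversion now happens in a function before the null check. The Python sketch below mirrors that order of operations; it is not the product's implementation. The format '%Y%m%d' is Python's counterpart of the Spark pattern 'yyyyMMdd', the function name to_date_or_default and the sample value are hypothetical, and treating an unparseable string as missing is an assumption made to match the ifnull fallback.

```python
from datetime import date, datetime
from typing import Optional

# Fallback taken from the case study's formula: date('2099-01-01')
FALLBACK = date(2099, 1, 1)


def to_date_or_default(text: Optional[str]) -> date:
    """Convert a yyyyMMdd string to a real date, then replace a missing result.

    Mirrors ifnull(to_date([date1], 'yyyyMMdd'), date('2099-01-01')):
    first the type conversion, then the null replacement.
    """
    parsed: Optional[date] = None
    if text is not None:
        try:
            # '%Y%m%d' is Python's equivalent of the Spark pattern 'yyyyMMdd'
            parsed = datetime.strptime(text, "%Y%m%d").date()
        except ValueError:
            # Assumption: an unparseable string is treated as missing (null)
            parsed = None
    return parsed if parsed is not None else FALLBACK


raw = "20240115"  # hypothetical value of the String field date1
print(type(raw).__name__)  # str: changing only the display format would not change this
print(to_date_or_default(raw))  # 2024-01-15
print(to_date_or_default(None))  # 2099-01-01
```

The replacement value is itself a date, which keeps it consistent with the converted field's type.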
