Skip to main content

Add Calculated Column

1. Overview

1.1. Function Description

Adding a calculated column means introducing a new column to the original data and generating the column's value through specific calculation or transformation rules. This covers mathematical operations, conditional judgments, string concatenation, and more based on existing columns.

By expanding or merging calculations on existing columns in the dataset, you can create new analytical metrics based on existing data, providing a more comprehensive data view.

For example, to add a new sales total column in a product sales table, you can create a new sales total field whose value equals sales amount multiplied by sales quantity. Or use a time function to extract the order time, such as DATE_TRUNC(week, [Order Time]), which returns the first day of the week for the "Order Time" field in that row.

image.png

2. Usage Guide

2.1. Operation Steps

  1. Drag the Add Calculated Column operator from the ETL operator area to the right canvas editing area;
  2. Click the Add Calculated Column operator, Add Calculated Field (supports adding multiple fields);
  3. Enter the field name, select the field type, and write the calculation formula;
  • Field types supported: numeric, text, date, date and time, boolean.
  • Calculation formulas support arithmetic operations and can use functions as needed.
  1. Click Confirm, and at the current node, click preview to confirm the data result.

image.png

2.2. Detailed Description

Below is an example of configuring a Year (function usage).

  1. Drag the Add Calculated Column operator from the ETL operator area to the right canvas editing area and connect it to the upstream node;
  2. Click the Add Calculated Column operator, the left area becomes the current operator configuration area, and rename as needed, e.g., "Year-Month";

image.png

  1. Click Add Calculated Field to edit the calculated field:
  • Function: COUNT(), YEAR()
  • Field: Date
  • Field Name: Year
  • Select Field Type: Text

image.png

Note: We support 8 types of functions, including but not limited to aggregate statistics, non-aggregate statistics, date and time, etc.

[Why does creating a new field with sum in ETL result in an error?](../../../../11-FAQ/1-Data Processing/2-ETL FAQ.md)

  1. Click Confirm and preview the data result to ensure the calculated column values meet expectations and contain no errors or anomalies.

image.png

See error details: [Parquet data source does not support void data type](../../../../12-Error Description.md#job-2632-cancelled-because-sparkcontext-was-shut-down-engine-lost).

See error details: [Illegal sequence boundaries](../../../../12-Error Description.md#illegal-sequence-boundaries-19662-to-19606-by-1).

For subsequent use of other data processing operators, see Getting Started.