Skip to main content

Row to Column with ETL

Background

Sometimes, we need to use a certain primary key to expand a field with fixed categories as column names, converting multiple rows of data into columns—this is the row-to-column effect. Guandata ETL supports this with both the Row-to-Column feature and SQL input.

Practical Example

Original Data

Here is a data table with 5 fields and 9 rows. The "Project" field has several fixed categories.

DepartmentSKUMonthProjectValue
Ops Dept 5HHHH001202012Target Price112
Ops Dept 5HHHH001202012Target Daily Sales212
Ops Dept 5HHHH001202012Target Conversion312
Ops Dept 5HHHH001202011Target Price111
Ops Dept 5HHHH001202011Target Daily Sales211
Ops Dept 5HHHH001202011Target Conversion311
Ops Dept 5HHHH001202010Target Price110
Ops Dept 5HHHH001202010Target Daily Sales210
Ops Dept 5HHHH001202010Target Conversion310

Expected Result

DepartmentSKUMonthTarget PriceTarget Daily SalesTarget Conversion
Ops Dept 5HHHH001202011111211311
Ops Dept 5HHHH001202012112212312
Ops Dept 5HHHH001202010110210310

Solution

Method 1: Add "Row to Column" in ETL

image.png

  1. Select the operation column and fill column

In this case, we want to convert the "Project" and "Value" fields from rows to columns. "Project" becomes the new column name, and "Value" fills the new columns.

image.png

  1. Select the aggregation type for the value

image.png

  1. Add primary key

image.png

  1. Create new columns

Creating new columns means setting the mapping between the new column names and the original row values. If unchanged, you can directly check "new column name same as original row value."

image.png

Method 2: Process with SQL

image.png

  1. Use the pivot function, separate multiple column names with commas.
SELECT * FROM input1
PIVOT(SUM(`[Value to fill in new column]`) FOR `[Column to convert]` in ('[Column Name]','[Column Name]'...))

image.png

  1. The final result.

image.png

Notes

  1. In the PIVOT function, the field name must follow FOR directly. If you add a table path, it will cause an error on submission.

image.png

  1. Dynamic column names are not recommended. For example, when creating dynamic column names, you can preview and run successfully, but when ETL runs automatically on a schedule, if the column names change, the output dataset will report an error, and the created cards will show missing fields.

image.png