
Group Aggregation

1. Overview

Group aggregation processes numerical values by one or more dimensions (categories): all rows that share the same dimension value are aggregated into a single row. When multiple dimensions are selected, rows are grouped at the finest granularity, that is, by each distinct combination of the selected dimension values.

For example, in retail sales statistics, you may need to merge the rows that share a product category and calculate the total sales amount for each category.
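The grouping behavior described above can be illustrated with a minimal Python sketch. It is not the operator's implementation; the function name `group_sum`, the field names, and the sample rows are all hypothetical. It shows that adding a second dimension splits the groups further, so the result follows the finest granularity.

```python
from collections import defaultdict


def group_sum(rows, dimensions, value):
    """Sum `value` for each distinct combination of the `dimensions` fields."""
    totals = defaultdict(float)
    for row in rows:
        # The group key is the tuple of all selected dimension values.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[value]
    return dict(totals)


# Hypothetical sales rows, for illustration only.
sales = [
    {"category": "Snacks", "store": "A", "amount": 10.0},
    {"category": "Snacks", "store": "B", "amount": 5.0},
    {"category": "Drinks", "store": "A", "amount": 7.0},
    {"category": "Snacks", "store": "A", "amount": 2.5},
]

# One dimension: one output row per product category.
by_category = group_sum(sales, ["category"], "amount")

# Two dimensions: one output row per (category, store) combination.
by_category_and_store = group_sum(sales, ["category", "store"], "amount")
```

With one dimension the four input rows collapse into two groups; with two dimensions they collapse into three, because "Snacks" is split by store.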

image.png

2. Usage Guide

2.1. Operation Steps

  1. Drag the Group Aggregation operator from the ETL operator area to the right canvas editing area;
  2. Click the Group Aggregation operator and drag fields into the dimension bar and value bar;
  3. Click the dragged field, set the field alias as needed, and select the aggregation method;
  4. At the current node, click Preview to confirm the data result.

image.png

2.2. Detailed Description

Below is an example of configuring Regional Turnover.

The goal is to merge the turnover rows that belong to the same store region. Before aggregation, the data contains separate rows for first-tier market turnover, second-tier market turnover, and other turnover:

image.png

  1. Drag the Group Aggregation operator from the ETL operator area to the right canvas editing area and connect it to the upstream node;
  2. Click the Group Aggregation operator. The left area becomes the configuration area for the current operator, where you can rename it as needed, e.g., "Regional Turnover";

image.png

  3. Drag the store region into the dimension bar, click the field, and set the field alias as needed:

Note: The default aggregation method for value bar fields is count for text type and sum for numeric type.

image.png
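The default-method rule in the note above can be written as a small lookup. This is only a sketch: the function name, the type labels `"numeric"` and `"text"`, and the example fields are hypothetical, and the source does not state defaults for other field types, so the sketch raises an error for them rather than guessing.

```python
def default_aggregation(field_type: str) -> str:
    """Return the default aggregation method for a value bar field.

    Per the note above: numeric fields default to sum, text fields to count.
    """
    if field_type == "numeric":
        return "sum"
    if field_type == "text":
        return "count"
    # Assumption: defaults for other types are not documented, so do not guess.
    raise ValueError(f"no documented default for field type {field_type!r}")


# Hypothetical value bar fields and their types.
value_fields = {"turnover": "numeric", "store_name": "text"}
defaults = {name: default_aggregation(t) for name, t in value_fields.items()}
```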

  4. Drag turnover into the value bar, click the field, select the aggregation method as sum, and set the field alias as needed:

image.png

See error details: [Group aggregation node in ETL prompts missing field, but the field is not actually missing](../../../../12-Error Description.md#etl-value-replacement-error).

See error details: [Type mismatch field](../../../../12-Error Description.md#cancelled-because-sparkcontext-was-shut-down).

Seven aggregation methods are supported: sum, min, max, average, count, distinct count, and none.

| Aggregation Method | Purpose | Usage Scenario | Example |
| --- | --- | --- | --- |
| Sum | Add up the measure values under the specified dimension to calculate the total | When the measure value can be accumulated | Monthly sales total, daily website visits |
| Min | Get the minimum value of the measure under the specified dimension | When the measure value has a minimum concept | Minimum sales price of each product, lowest temperature of each month |
| Max | Get the maximum value of the measure under the specified dimension | When the measure value has a maximum concept | Maximum sales price of each product, highest temperature of each month |
| Average | Calculate the average value of the measure under the specified dimension | When the measure value can be averaged | Monthly average sales, weekly average user logins |
| Count | Count the number of data records under the specified dimension | When you need to know how many data records are under a certain dimension | When you need to know how many data records are under a certain dimension |
| Distinct Count | Count the number of unique data records under the specified dimension | When you need to know the number of different values under a certain dimension | Monthly sales of different products, number of different customers in each region |
| None | - | - | - |
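To make the difference between the methods concrete, the following Python sketch applies six of them to the same grouped data. It is an illustration under assumptions, not the product's implementation: the `AGGREGATIONS` mapping, the `aggregate` function, and the sample rows are hypothetical. The "None" method is left out because its behavior is not described here.

```python
from statistics import mean

# Hypothetical mapping from method name to a function over one group's values.
AGGREGATIONS = {
    "sum": sum,
    "min": min,
    "max": max,
    "average": mean,
    "count": len,
    "distinct_count": lambda values: len(set(values)),
}


def aggregate(rows, dimension, value, method):
    """Group `rows` by `dimension` and apply `method` to each group's values."""
    groups = {}
    for row in rows:
        groups.setdefault(row[dimension], []).append(row[value])
    return {key: AGGREGATIONS[method](vals) for key, vals in groups.items()}


# Hypothetical turnover rows, for illustration only.
rows = [
    {"region": "First-tier", "turnover": 100},
    {"region": "First-tier", "turnover": 300},
    {"region": "Second-tier", "turnover": 50},
    {"region": "First-tier", "turnover": 100},
    {"region": "First-tier", "turnover": 300},
]

# One result dictionary per method, keyed by region.
results = {m: aggregate(rows, "region", "turnover", m) for m in AGGREGATIONS}
```

For the "First-tier" group, count returns 4 because there are four records, while distinct count returns 2 because only two different turnover values occur.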
  5. Click Preview to preview the data result and ensure the aggregated data meets expectations and contains no errors or anomalies.

image.png
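When checking a previewed result for a sum aggregation, two simple sanity checks are often useful: the output should contain exactly one row per distinct dimension value, and the group totals should add up to the total of the source data. The sketch below expresses those checks in Python; the function name and the sample data are hypothetical, and the checks are a suggestion rather than something the Preview feature performs.

```python
import math


def check_sum_aggregation(source_rows, aggregated, dimension, value):
    """Return a list of problems found; an empty list means both checks passed."""
    problems = []

    # Check 1: one aggregated row per distinct dimension value.
    expected_groups = {row[dimension] for row in source_rows}
    if set(aggregated) != expected_groups:
        problems.append("group keys do not match the distinct dimension values")

    # Check 2: group totals add up to the source total (tolerant of float rounding).
    source_total = sum(row[value] for row in source_rows)
    if not math.isclose(sum(aggregated.values()), source_total):
        problems.append("aggregated total differs from the source total")

    return problems


# Hypothetical source rows and two hypothetical preview results.
source = [
    {"region": "First-tier", "turnover": 100.0},
    {"region": "First-tier", "turnover": 300.0},
    {"region": "Second-tier", "turnover": 50.0},
]
good_preview = {"First-tier": 400.0, "Second-tier": 50.0}
bad_preview = {"First-tier": 100.0}

good_problems = check_sum_aggregation(source, good_preview, "region", "turnover")
bad_problems = check_sum_aggregation(source, bad_preview, "region", "turnover")
```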

For subsequent use of other data processing operators, see Getting Started.