Group Aggregation
1. Overview
Group aggregation processes numeric values by dimension (category): rows that share the same dimension values are merged into a single row. When multiple dimensions are selected, aggregation is performed at the finest granularity, that is, one output row for each distinct combination of the selected dimension values.
For example, in retail sales statistics, you may need to merge rows with the same product category and calculate the total sales amount for each category.
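
The idea can be illustrated outside the product with a minimal Python sketch. The sample rows, the field names (`category`, `region`, `amount`), and the helper `group_sum` are all hypothetical and are not part of the ETL tool; the sketch only shows how rows collapse per dimension and how adding a second dimension makes the grouping finer.

```python
from collections import defaultdict

# Hypothetical sample rows; the field names are illustrative only.
sales = [
    {"category": "Beverages", "region": "East", "amount": 120.0},
    {"category": "Beverages", "region": "West", "amount": 80.0},
    {"category": "Snacks", "region": "East", "amount": 50.0},
    {"category": "Beverages", "region": "East", "amount": 30.0},
]


def group_sum(rows, dimensions, value):
    """Merge rows that share the same dimension values, summing `value`."""
    totals = defaultdict(float)
    for row in rows:
        # The group key is the combination of all selected dimension values.
        key = tuple(row[d] for d in dimensions)
        totals[key] += row[value]
    return [dict(zip(dimensions, key), **{value: total}) for key, total in totals.items()]


# One dimension: one output row per product category.
by_category = group_sum(sales, ["category"], "amount")

# Two dimensions: one output row per (category, region) combination.
by_category_region = group_sum(sales, ["category", "region"], "amount")

print(by_category)
print(by_category_region)
```

With one dimension the four input rows collapse to two; with two dimensions they collapse to three, because only the two Beverages/East rows share the same combination.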

2. Usage Guide
2.1. Operation Steps
- Drag the Group Aggregation operator from the ETL operator area into the canvas editing area on the right;
- Click the Group Aggregation operator and drag fields into the dimension bar and the value bar;
- Click each dragged field, set the field alias as needed, and select the aggregation method;
- On the current node, click Preview to check the resulting data.

2.2. Detailed Description
The following example configures a Regional Turnover node, which merges the turnover data of the same store region.
Before aggregation, the data contains first-tier market turnover, second-tier market turnover, and other turnover.

- Drag the Group Aggregation operator from the ETL operator area into the canvas editing area on the right and connect it to the upstream node;
- Click the Group Aggregation operator. The left area becomes the configuration area for the current operator, where you can rename the operator as needed, e.g., "Regional Turnover";

- Drag the store region field into the dimension bar, click the field, and set the field alias as needed;
Note: By default, a value bar field is aggregated with count if it is a text type and with sum if it is a numeric type.

- Drag the turnover field into the value bar, click the field, set the aggregation method to sum, and set the field alias as needed;

See error details: [Group aggregation node in ETL prompts missing field, but the field is not actually missing](../../../../12-Error Description.md#etl-value-replacement-error).
See error details: [Type mismatch field](../../../../12-Error Description.md#cancelled-because-sparkcontext-was-shut-down).
Seven aggregation methods are supported:
Aggregation Method | Purpose | Usage Scenario | Example |
---|---|---|---|
Sum | Add up the measure values under the specified dimension to calculate the total | When the measure value can be accumulated | Monthly sales total, daily website visits |
Min | Get the minimum value of the measure under the specified dimension | When the measure value has a minimum concept | Minimum sales price of each product, lowest temperature of each month |
Max | Get the maximum value of the measure under the specified dimension | When the measure value has a maximum concept | Maximum sales price of each product, highest temperature of each month |
Average | Calculate the average value of the measure under the specified dimension | When the measure value can be averaged | Monthly average sales, weekly average user logins |
Count | Count the number of data records under the specified dimension | When you need to know how many data records are under a certain dimension | Number of orders in each month, number of stores in each region |
Distinct Count | Count the number of unique values under the specified dimension | When you need to know the number of different values under a certain dimension | Number of different products sold each month, number of different customers in each region |
None | - | - | - |
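
To make the differences between the methods concrete, the sketch below applies each one to a single group of values in plain Python. The mapping `AGGREGATIONS` and the sample values are assumptions made for illustration; they do not show how the operator is implemented. "None" is left out because its behavior is not described in the table, and null handling is ignored.

```python
from statistics import mean

# Assumed mapping from the method names in the table to Python callables.
# "None" is omitted because its behavior is not described in the table.
AGGREGATIONS = {
    "Sum": sum,
    "Min": min,
    "Max": max,
    "Average": mean,
    "Count": len,
    "Distinct Count": lambda values: len(set(values)),
}

# Hypothetical turnover values belonging to one group (one dimension value).
turnover = [100, 250, 100, 50]

# Apply every method to the same group to compare the results.
results = {name: fn(turnover) for name, fn in AGGREGATIONS.items()}

for name, value in results.items():
    print(f"{name}: {value}")
```

Note that Count returns 4 for this group while Distinct Count returns 3, because the value 100 appears twice.
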
- Click Preview to check that the aggregated data meets expectations and contains no errors or anomalies.
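
As a rough cross-check of what the preview should contain, the Regional Turnover example can be reproduced in a few lines of Python. Everything here is hypothetical: the sample rows, the field names `store_region` and `turnover`, the aliases, and the helpers `default_method` and `regional_turnover`. The default rule follows the note above (sum for numeric fields, count for text fields).

```python
from collections import defaultdict

# Hypothetical upstream rows; names and numbers are made up for illustration.
rows = [
    {"store_region": "First-tier market", "turnover": 300.0},
    {"store_region": "Second-tier market", "turnover": 120.0},
    {"store_region": "First-tier market", "turnover": 200.0},
    {"store_region": "Other", "turnover": 40.0},
    {"store_region": "Second-tier market", "turnover": 80.0},
]


def default_method(sample):
    """Mirror the documented default: sum for numeric fields, count for text fields."""
    is_numeric = isinstance(sample, (int, float)) and not isinstance(sample, bool)
    return "sum" if is_numeric else "count"


def regional_turnover(rows, dimension, value, dim_alias, value_alias):
    """Group rows by `dimension` and aggregate `value` with its default method."""
    method = default_method(rows[0][value])
    groups = defaultdict(list)
    for row in rows:
        groups[row[dimension]].append(row[value])
    reducer = sum if method == "sum" else len
    # Output columns use the aliases, as set on the dimension and value fields.
    return [{dim_alias: key, value_alias: reducer(vals)} for key, vals in groups.items()]


preview = regional_turnover(rows, "store_region", "turnover", "Region", "Regional Turnover")
for line in preview:
    print(line)
```

Five input rows become three output rows, one per store region, which is the shape to expect in the preview.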

To continue with other data processing operators, see Getting Started.