Skip to main content

Group By

Overview

Group By processes measure values based on one or more dimensions and aggregates multiple rows into one row by dimension. When multiple dimensions are selected, aggregation is performed at the most granular dimension level.

For example, in retail sales analysis, data with the same product category can be grouped together and the corresponding total sales amount can be calculated.

Procedure

  1. Drag the Group By operator from the dataflow operator panel into the canvas on the right.

  2. Click the Group By operator and drag fields into the Dimensions area and the Metrics area.

  3. Click the dragged field, set a field alias if needed, and choose an aggregation method.

    |300

    |300

  4. Click Preview on the current node to confirm the result.

Example

The following example shows how to configure Regional Revenue.

Aggregate sales quantities by store region and group them together. Preview before aggregation:

  1. Drag the Group By operator from the ETL operator area to the canvas on the right and connect it to the upstream node.

  2. Click the Group By operator. The left panel becomes the current operator configuration area. Rename it based on business needs, for example Regional Quantity_Sold.

  3. Drag Region into the Dimensions section, then click the field and configure its alias as needed.

    Note

    The default aggregation for fields in the Measures section is Count for text fields and Sum for numeric fields.

  4. Drag Quantity_Sold into the Metric section, click the field, choose Sum as the aggregation method, and configure the field alias if needed.

    Seven aggregation methods are supported, including Sum, Minimum, and Maximum, as shown below.

Aggregation MethodPurposeScenarioExample
SumAdd measure values under a specified dimension and calculate the totalWhen measure values are additiveMonthly total sales, daily website visits
MinimumRetrieve the minimum measure value under a specified dimensionWhen the measure has a minimum conceptLowest selling price per product, monthly minimum temperature
MaximumRetrieve the maximum measure value under a specified dimensionWhen the measure has a maximum conceptHighest selling price per product, monthly maximum temperature
AverageCalculate the average measure value under a specified dimensionWhen the measure can be averagedAverage monthly sales, average weekly user logins
CountCount the number of records under a specified dimensionWhen you need the number of records under a dimensionNumber of records under a given dimension
Distinct CountCount distinct records under a specified dimensionWhen you need the number of different values under a dimensionMonthly sales count by distinct products, distinct customer count per region
None---
  1. Click Preview to preview the data result and ensure the aggregated data matches expectations and does not contain errors or abnormal values.