Sorting Functions and Applications

Function Introduction

  Guandata BI visualization charts have integrated "ranking" functionality in advanced calculations, which can specify the calculation dimension for sorting, sorting method, and display top N, making it easy to create the ranking reports you need. For specific introduction, please refer to [Ranking Calculation](<../../0-Guandata-BI.md>) .

The differences between the 3 sorting types are as follows:

Row/Column Ranking: No grouping, global ranking by row or column.

Dimension Group Ranking: Only available when there are multiple dimension fields, ranking within groups after grouping by specified dimension fields.

Dimension Item Ranking: Ranking by aggregation results of specified dimensions, where "Calculation Dimension" can select dimension bar fields or comparison fields.

  However, in some complex scenarios, you may need to use ranking for filtering, or use ranking information for secondary calculations. The system's built-in ranking functionality may not meet the requirements. In such cases, Guandata BI also supports using sorting functions in window functions to calculate rankings. The following introduces commonly used sorting functions and usage scenarios.

Commonly Used Sorting Functions

Function	Meaning	Window Example
rank	Has ties, has gaps. When same rankings occur, skip the same ranking, ranking structure is generally: 1, 2, 2, 4, 5, 5, 7……	rank( ) over(partition by [grouping field] order by [sorting field]) --ascending ranking after grouping
dense_rank	Has ties, no gaps. When same rankings occur, don't skip the same ranking, ranking structure is generally: 1, 2, 2, 3, 4, 4, 5……	dense_rank( ) over(partition by [grouping field] order by [sorting field] desc) --descending ranking after grouping
row_number	Sequential ranking, no ties. Ranking structure is: 1, 2, 3, 4, 5, 6, 7……	row_number( ) over(partition by 1 order by [sorting field]) --global ascending ranking

Case Study

Scenario

  When analyzing with multi-line charts, to avoid "spaghetti charts" that are difficult to read due to too many dimensions, it's often necessary to filter dimensions by Top N. Take the following chart as an example, where the horizontal axis is the date dimension, comparing by "Province", analyzing the monthly change trends of "Sales Amount" for each "Province"; when there are many "Provinces", it's hoped to only display the "Provinces" with "Sales Amount" total ranking Top N in the chart's time range.

Solution One

  Create a new multi-line chart, drag the required fields into dimension, comparison, and value bars respectively. At this time, there are many messy lines. Also drag the field "Sales Amount" used in the value bar into the "More Tooltips" bar, with aggregation method defaulting to "Sum", click "Advanced Calculation — Ranking", set according to the figure below, confirm and save, then rename the field.

Note: Because we need to rank by the total value of provinces, the "Province" field is a comparison item, so the sorting type should use "Dimension Item Ranking".

  The effect is shown in the figure below, when the cursor is placed over it, the ranking information will be displayed in the floating window.

Solution Two

Create 2 calculated fields in the card:

「Province Sales Total」: sum([Sales Amount])over(partition by [Province])
「Ranking」：dense_rank()over(partition by 1 order by [Province Sales Total] desc)

Drag "Ranking" into the filter bar and set the range to "less than or equal to 3". The final display effects of the two solutions are basically the same, with only slight differences in floating window and legend display.

Extended Case

Calculate cumulative sales proportion of product subcategories by sales ranking. For specific solution, please refer to [Calculate Cumulative Sales Proportion of Product Subcategories by Sales Ranking](17-Using Window Functions for Cumulative Calculations.md#Case Two Calculate Cumulative Sales Proportion of Product Subcategories by Sales Ranking) .

Notes

  The system's built-in ranking functionality first aggregates data to the minimum number of rows before sorting, so the sorting method can be selected from rank, dense_rank, and row_number as needed. However, window functions are equivalent to sorting on the original dataset and will not aggregate one group into one piece of data, so <strong>only the dense_rank function is recommended in cards</strong> (because it allows ties and no gaps).

  The comparison of window calculation results of the 3 functions in cards (aggregation method is "No Processing") is shown in the figure below. It can be seen that the rankings of rank and row_number are affected by the original number of rows in the dataset, and the sorting results do not meet expectations. If you must use these two sorting logics, it is recommended to prioritize the system's built-in ranking functionality. If the requirements are not met, you need to first aggregate the data to the required dimensions in ETL before using the corresponding window functions for calculation.

Function Introduction​

Commonly Used Sorting Functions​

Case Study​

Scenario​

Solution One​

Solution Two​

Extended Case​

Notes​