Spark Starter Guide 4.9: How to Rank Data

Previous post: Spark Starter Guide 4.8: How to Order and Sort Data

Ranking is, fundamentally, ordering based on a condition. So, in essence, it’s like a combination of a where clause and order by clause—the exception being that data is not removed through ranking , it is, well, ranked, instead. While ordering allows you to sort data based on a column, ranking allows you to allocate a number (e.g. row number or rank) to each row (based on a column or condition) so that you can utilize it in logical decision making, like selecting a top result, or applying further transformations.

Read More »