Requirements
Groupby is one of the building blocks of BI and is widely used to generate high-level insights from data.
We aim to create a set of groupby algorithms that take metadata into account before generating the ASM code. The following metadata will be taken into account:
sorted: when the input is already sorted on the groupby key, an O(n) algorithm is easily achieved with a single scan through the data
cardinality: when the cardinality is below a threshold that allows the pointers to fit in memory without impacting the system, we can use a perfect identity hash and an array to directly store the aggregation results; this also gives O(n) complexity
data dictionary: can be used to transform strings into numerical values
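
The three metadata cases above can be sketched in a few lines of Python. This is only an illustration of the ideas, not the generated ASM and not a definitive implementation. All function names (groupby_sum_sorted, groupby_sum_dense, dictionary_encode) are hypothetical, the aggregation is assumed to be a sum, and the dense variant assumes keys are integers in the range [0, cardinality).

```python
from typing import Dict, List, Tuple


def groupby_sum_sorted(keys: List[int], values: List[float]) -> List[Tuple[int, float]]:
    """Sorted case: equal keys are contiguous, so one scan is enough."""
    out: List[Tuple[int, float]] = []
    for k, v in zip(keys, values):
        if out and out[-1][0] == k:
            # Same key as the previous row: extend the current run.
            out[-1] = (k, out[-1][1] + v)
        else:
            # Key changed: the previous group is complete, start a new one.
            out.append((k, v))
    return out


def groupby_sum_dense(keys: List[int], values: List[float],
                      cardinality: int) -> List[Tuple[int, float]]:
    """Low-cardinality case: the key is its own array slot (identity hash).

    Assumes every key is in [0, cardinality).
    """
    sums = [0.0] * cardinality
    seen = [False] * cardinality
    for k, v in zip(keys, values):
        sums[k] += v      # direct array access, no probing or collisions
        seen[k] = True
    return [(k, sums[k]) for k in range(cardinality) if seen[k]]


def dictionary_encode(strings: List[str]) -> Tuple[List[str], List[int]]:
    """Data dictionary case: map each distinct string to a small integer code.

    Codes are assigned in order of first appearance.
    """
    code_of: Dict[str, int] = {}
    dictionary: List[str] = []
    encoded: List[int] = []
    for s in strings:
        code = code_of.get(s)
        if code is None:
            code = len(dictionary)
            code_of[s] = code
            dictionary.append(s)
        encoded.append(code)
    return dictionary, encoded


if __name__ == "__main__":
    regions = ["eu", "us", "eu", "apac", "us"]
    amounts = [1.0, 2.0, 3.0, 4.0, 5.0]
    # Strings become dense integer codes, which then index the result array.
    dictionary, codes = dictionary_encode(regions)
    for code, total in groupby_sum_dense(codes, amounts, len(dictionary)):
        print(dictionary[code], total)
```

Note how the data dictionary feeds the cardinality case: once strings are encoded as dense integer codes, the number of dictionary entries is the cardinality, and the codes can index the result array directly. Both scans touch each row once, which is where the O(n) bound comes from.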