Efficient Computation of Data Cubes Using MapReduce

KIPS Transactions on Software and Data Engineering, Vol. 3, No. 11, pp. 479-486, Nov. 2014
DOI: 10.3745/KTSDE.2014.3.11.479


MapReduce is a programming model for processing large amounts of data in parallel. The data cube is widely used to analyze large amounts of data; it is an operator that computes group-bys for all possible combinations of the given dimension attributes. When the number of dimension attributes is n, the data cube computes 2^n group-bys. In this paper, we propose an efficient method for computing data cubes using MapReduce. The proposed method partitions the 2^n group-bys into C(n, ⌈n/2⌉) batches and computes those batches in stages using ⌈n/2⌉ MapReduce jobs. Compared to the existing methods, the proposed method significantly reduces the amount of intermediate data generated by mappers, so the cost of sorting and transferring that intermediate data is also significantly reduced. Consequently, the total processing time for computing a data cube is reduced. Through experiments, we show the efficiency of the proposed method over the existing methods.
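
To make the definition of the data cube concrete, the following is a minimal single-machine sketch of what the operator computes: one aggregate per subset of the dimension attributes. It is a naive illustration only, not the batching method proposed in the paper and not a MapReduce program. The function names, the sample rows, the column names, and the choice of SUM as the aggregate are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations


def all_group_bys(dimensions):
    """Return every subset of the dimension attributes (2^n group-bys)."""
    return [
        combo
        for size in range(len(dimensions) + 1)
        for combo in combinations(dimensions, size)
    ]


def naive_cube(rows, dimensions, measure):
    """Sum `measure` for every group-by, scanning the rows once per group-by."""
    cube = {}
    for group_by in all_group_bys(dimensions):
        totals = defaultdict(int)
        for row in rows:
            # The grouping key is the row's values on the chosen dimensions.
            key = tuple(row[d] for d in group_by)
            totals[key] += row[measure]
        cube[group_by] = dict(totals)
    return cube


# Hypothetical sample data with n = 3 dimension attributes.
rows = [
    {"region": "east", "product": "pen", "year": 2013, "sales": 10},
    {"region": "east", "product": "ink", "year": 2014, "sales": 5},
    {"region": "west", "product": "pen", "year": 2014, "sales": 7},
]
dimensions = ("region", "product", "year")
cube = naive_cube(rows, dimensions, "sales")

print(len(cube))            # number of group-bys computed
print(cube[("region",)])    # totals grouped by region only
print(cube[()])             # grand total (empty group-by)
```

With three dimension attributes the sketch computes 2^3 = 8 group-bys, and the work grows with both the number of rows and the number of group-bys, which is the cost the abstract is concerned with reducing.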

Cite this article
[IEEE Style]
K. Y. Lee, S. J. Park, E. J. Park, J. K. Park, Y. J. Choi, "Efficient Computation of Data Cubes Using MapReduce," KIPS Transactions on Software and Data Engineering, vol. 3, no. 11, pp. 479-486, 2014. DOI: 10.3745/KTSDE.2014.3.11.479.

[ACM Style]
Ki Yong Lee, So Jeong Park, Eun Ju Park, Jin Kyung Park, and Yeun Jung Choi. 2014. Efficient Computation of Data Cubes Using MapReduce. KIPS Transactions on Software and Data Engineering, 3, 11, (2014), 479-486. DOI: 10.3745/KTSDE.2014.3.11.479.