A Pipelined Hash Join Method for Load Balancing


The KIPS Transactions:PartD, Vol. 9, No. 5, pp. 755-768, Oct. 2002
10.3745/KIPSTD.2002.9.5.755,   PDF Download:

Abstract

We investigate the effect of the data skew of join attributes on the performance of a pipelined multi-way hash join method, and propose two new hash join methods with load balancing capabilities. The first proposed method allocates buckets statically by round-robin fashion, and the second one allocates buckets adaptively via a frequency distribution. Using hash-based joins, multiple joins can be pipelined so that the early results from a join, before the whole join is completed, are sent to the next join processing without staying on disks. Unless the pipelining execution of multiple hash joins includes some load balancing mechanisms, the skew effect can severely deteriorate system performance. In this paper, we derive an execution model of the pipeline segment and a cost model, and develop a simulator for the study. As shown by our simulation with a wide range of parameters, join selectivities and sizes of relations deteriorate the system performance as the degree of data skew is larger. But the proposed method using a large number of buckets and a tuning technique can offer substantial robustness against a wide range of skew conditions.


Statistics
Show / Hide Statistics

Statistics (Cumulative Counts from September 1st, 2017)
Multiple requests among the same browser session are counted as one view.
If you mouse over a chart, the values of data points will be shown.


Cite this article
[IEEE Style]
J. G. Moon, N. S. Park, P. J. Kim, S. I. Kim, "A Pipelined Hash Join Method for Load Balancing," The KIPS Transactions:PartD, vol. 9, no. 5, pp. 755-768, 2002. DOI: 10.3745/KIPSTD.2002.9.5.755.

[ACM Style]
Jin Gue Moon, No Sang Park, Pyeng Jung Kim, and Seong Il Kim. 2002. A Pipelined Hash Join Method for Load Balancing. The KIPS Transactions:PartD, 9, 5, (2002), 755-768. DOI: 10.3745/KIPSTD.2002.9.5.755.