Performance Analysis on Declustering High-Dimensional Data by GRID Partitioning


The KIPS Transactions:PartD, Vol. 11, No. 5, pp. 1011-1020, Oct. 2004
10.3745/KIPSTD.2004.11.5.1011,   PDF Download:

Abstract

A lot of work has been done to improve the I/O performance of such a system that store and manage a massive amount of data by distributing them across multiple disks and access them in parallel. Most of the previous work has focused on an efficient mapping from a grid cell, which is determined by the interval number of each dimension, to a disk number on the assumption that each dimension is split into disjoint intervals such that entire data space is GRID-like partitioned. However, they have ignored the effects of a GRID partitioning scheme on declustering performance. In this paper, we enhance the performance of mapping function based declustering algorithms by applying a good GRID partitioning method. For this, we propose an estimation model to count the number of grid cells intersected by a range query and apply a GRID partitioning scheme which minimizes query result size among the possible schemes. While it is common to do binary partition for high-dimensional data, we choose less number of dimensions than needed for binary partition and split several times along that dimensions so that we can reduce the number of grid cells touched by a query. Several experimental results show that the proposed estimation model gives accuracy within 0.5% error ratio regardless of query size and dimension. We can also improve the performance of declustering algorithm based on mapping function, called Kronecker Sequence, which has been known to be the best among the mapping functions for high-dimensional data, up to 23 times by applying an efficient GRID partitioning scheme.


Statistics
Show / Hide Statistics

Statistics (Cumulative Counts from September 1st, 2017)
Multiple requests among the same browser session are counted as one view.
If you mouse over a chart, the values of data points will be shown.


Cite this article
[IEEE Style]
H. C. Kim, T. W. Kim, K. J. Li, "Performance Analysis on Declustering High-Dimensional Data by GRID Partitioning," The KIPS Transactions:PartD, vol. 11, no. 5, pp. 1011-1020, 2004. DOI: 10.3745/KIPSTD.2004.11.5.1011.

[ACM Style]
Hak Cheol Kim, Tae Wan Kim, and Ki Joune Li. 2004. Performance Analysis on Declustering High-Dimensional Data by GRID Partitioning. The KIPS Transactions:PartD, 11, 5, (2004), 1011-1020. DOI: 10.3745/KIPSTD.2004.11.5.1011.