A Similarity Join Algorithm Using a Median as a Filter


KIPS Transactions on Software and Data Engineering, Vol. 4, No. 2, pp. 71-76, Feb. 2015
DOI: 10.3745/KTSDE.2015.4.2.71

Abstract

In similarity join processing, a common technique is the generation-verification framework, which has two phases: the first generates a set of candidate pairs from a collection of records, and the second verifies each candidate pair by computing its actual similarity. To reduce the number of candidate pairs that reach the verification phase, this paper uses the median of one record in each candidate pair as a filter to test whether the other record can have the required number of overlapping tokens. We propose a similarity join algorithm with this median filter and show, through extensive experiments on real-world datasets, that it achieves shorter execution times than recent algorithms without the filter.
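The idea of filtering a candidate pair by a median can be illustrated with a minimal sketch. It is not the paper's algorithm: the abstract gives no implementation details, so the sketch assumes Jaccard similarity, records stored as sorted lists of distinct integer tokens, and a naive all-pairs generation phase. The function names (`required_overlap`, `median_filter_passes`, `similarity_join`) are hypothetical. The median token of one record splits both records into a left and a right part, which gives an upper bound on the possible overlap; a pair is pruned when that bound falls below the overlap the threshold requires.

```python
from bisect import bisect_left
from itertools import combinations
from math import ceil


def required_overlap(x, y, t):
    # Assumption: Jaccard similarity. J(x, y) >= t is equivalent to
    # |x & y| >= t / (1 + t) * (|x| + |y|).
    # The small epsilon guards against floating-point round-up.
    return ceil(t / (1 + t) * (len(x) + len(y)) - 1e-9)


def median_filter_passes(x, y, alpha):
    # x and y are sorted lists of distinct tokens (an assumption of this sketch).
    mid = len(x) // 2
    m = x[mid]                      # median token of x
    p = bisect_left(y, m)           # number of tokens in y smaller than m
    hit = 1 if p < len(y) and y[p] == m else 0
    # Tokens below m can only match tokens below m, and likewise above m.
    left = min(mid, p)
    right = min(len(x) - mid - 1, len(y) - p - hit)
    # Keep the pair only if the overlap upper bound can reach alpha.
    return left + hit + right >= alpha


def jaccard(x, y):
    sx, sy = set(x), set(y)
    return len(sx & sy) / len(sx | sy)


def similarity_join(records, t):
    # Generation phase: naive all-pairs enumeration stands in for a real
    # candidate generator (for example, one based on an inverted index).
    results = []
    for (i, x), (j, y) in combinations(enumerate(records), 2):
        if not x or not y:
            continue
        alpha = required_overlap(x, y, t)
        if not median_filter_passes(x, y, alpha):
            continue                # pruned before the costly verification
        # Verification phase: compute the actual similarity.
        if jaccard(x, y) >= t:
            results.append((i, j))
    return results


if __name__ == "__main__":
    records = [[1, 2, 3, 4], [1, 2, 3, 5], [6, 7, 8, 9], [1, 2, 3, 4]]
    print(similarity_join(records, 0.5))
```

Because the bound never underestimates the true overlap, this filter can only discard pairs that would fail verification anyway; its benefit is that a single binary search is cheaper than computing the full similarity.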



Cite this article
[IEEE Style]
J. S. Park, "A Similarity Join Algorithm Using a Median as a Filter," KIPS Transactions on Software and Data Engineering, vol. 4, no. 2, pp. 71-76, 2015. DOI: 10.3745/KTSDE.2015.4.2.71.

[ACM Style]
Jong Soo Park. 2015. A Similarity Join Algorithm Using a Median as a Filter. KIPS Transactions on Software and Data Engineering, 4, 2, (2015), 71-76. DOI: 10.3745/KTSDE.2015.4.2.71.