An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT


The KIPS Transactions:PartD, Vol. 14, No. 2, pp. 169-180, Apr. 2007
DOI: 10.3745/KIPSTD.2007.14.2.169

Abstract

This paper discusses a new unsupervised XML document clustering technique based on a function transform and the FFT (Fast Fourier Transform). An XML document is transformed into a discrete function based on the hierarchical nesting structure of its elements. The discrete function is then transformed into a vector using the FFT. The vectors of two documents are compared using a weighted Euclidean distance metric. If the distance is lower than a pre-specified threshold, the two documents are considered structurally similar and are grouped into the same cluster. XML clustering can be useful for the storage and searching of XML documents. Experiments were conducted with 800 synthetic documents and 520 real documents. They showed that the function transform and FFT are effective for the incremental, unsupervised clustering of structurally similar XML documents.
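The pipeline summarized in the abstract can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the author's method. The abstract does not give the exact function transform, vector length, weights, or threshold, so this sketch makes several assumptions. The function transform is taken to be the sequence of element nesting depths in document order, and the vector is the magnitudes of the first k FFT coefficients. The weights 1/(i+1) favour low frequencies, and each incoming document joins the first cluster whose representative (its first member) lies within the threshold. The names depth_sequence, fft, feature_vector, weighted_distance and cluster, and the parameters size, k and threshold, are all hypothetical.

```python
import cmath
import math
import xml.etree.ElementTree as ET


def depth_sequence(xml_text):
    """Assumed function transform: the nesting depth of every element in
    document (pre-order) order, e.g. <a><b/><c><d/></c></a> -> [1, 2, 2, 3]."""
    root = ET.fromstring(xml_text)
    seq = []

    def walk(elem, depth):
        seq.append(depth)
        for child in elem:  # iterates over child elements only
            walk(child, depth + 1)

    walk(root, 1)
    return seq


def fft(x):
    """Recursive radix-2 Cooley-Tukey FFT; len(x) must be a power of two."""
    n = len(x)
    if n == 1:
        return [complex(x[0])]
    even = fft(x[0::2])
    odd = fft(x[1::2])
    result = [0j] * n
    for k in range(n // 2):
        t = cmath.exp(-2j * math.pi * k / n) * odd[k]  # twiddle factor
        result[k] = even[k] + t
        result[k + n // 2] = even[k] - t
    return result


def feature_vector(xml_text, size=16, k=8):
    """Truncate or zero-pad the depth sequence to `size` samples (a power of
    two), apply the FFT, and keep the magnitudes of the first `k` terms."""
    seq = depth_sequence(xml_text)[:size]
    seq += [0] * (size - len(seq))
    return [abs(c) for c in fft(seq)[:k]]


def weighted_distance(u, v, weights):
    """Weighted Euclidean distance between two feature vectors."""
    return math.sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, u, v)))


def cluster(docs, threshold, size=16, k=8):
    """Incremental clustering: each document joins the first cluster whose
    representative is closer than `threshold`, else starts a new cluster."""
    weights = [1.0 / (i + 1) for i in range(k)]  # assumed weighting
    clusters = []  # list of (representative vector, member indices)
    for idx, doc in enumerate(docs):
        vec = feature_vector(doc, size, k)
        for rep, members in clusters:
            if weighted_distance(vec, rep, weights) < threshold:
                members.append(idx)
                break
        else:
            clusters.append((vec, [idx]))
    return [members for _, members in clusters]


docs = [
    "<book><title/><author/><year/></book>",
    "<book><title/><author/><year/></book>",
    "<book><title/><author/><price/></book>",
    "<a><b><c><d><e/></d></c></b></a>",
]
print(cluster(docs, threshold=1.0))  # → [[0, 1, 2], [3]]
```

The three book documents share the depth sequence [1, 2, 2, 2], so their vectors coincide and they form one cluster, while the deeply nested fourth document starts its own. Because this assumed transform records only depths, documents that differ only in tag names (year versus price) are indistinguishable; a transform that also encoded element names would separate them.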




Cite this article
[IEEE Style]
H. S. Lee, "An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT," The KIPS Transactions:PartD, vol. 14, no. 2, pp. 169-180, 2007. DOI: 10.3745/KIPSTD.2007.14.2.169.

[ACM Style]
Ho Suk Lee. 2007. An Unsupervised Clustering Technique of XML Documents based on Function Transform and FFT. The KIPS Transactions:PartD, 14, 2, (2007), 169-180. DOI: 10.3745/KIPSTD.2007.14.2.169.