Determining the number of Clusters in On-Line Document Clustering Algorithm


The KIPS Transactions:PartB , Vol. 14, No. 7, pp. 513-522, Dec. 2007
DOI: 10.3745/KIPSTB.2007.14.7.513

Abstract

Clustering divides given data into groups and automatically uncovers hidden meaning in the data. It analyzes data that are difficult for people to inspect in detail and forms clusters of items with similar characteristics. The On-Line Document Clustering System, which groups similar documents from search engine results, aims to make information retrieval more convenient. Document clustering is performed automatically without human intervention, so the number of clusters, which affects the clustering result, should also be determined automatically. In addition, an on-line system must guarantee fast response time. This paper proposes a method for automatically determining the number of clusters using geometrical information. The proposed method consists of two stages. In the first stage, the cluster centers are projected onto a low-dimensional plane; in the second stage, clusters are merged according to the distances between their centers in that plane. Experiments with real data showed that clustering performance improved and that the response time is suitable for an on-line setting.
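The two stages described in the abstract can be illustrated with a minimal sketch. The abstract does not say which projection or merging rule the paper uses, so this example assumes a PCA projection (computed with NumPy's SVD) and a fixed distance threshold with transitive merging. The function names, the `threshold` parameter, and the `dim` parameter are hypothetical and are not taken from the paper.

```python
import numpy as np


def project_centers(centers, dim=2):
    """Stage 1 (assumed): project cluster centers onto a low-dimensional
    plane using PCA. The paper's actual projection may differ."""
    centered = centers - centers.mean(axis=0)
    # Rows of vt are principal directions, ordered by decreasing variance.
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:dim].T


def merge_close_centers(points, threshold):
    """Stage 2 (assumed): merge clusters whose projected centers lie closer
    than `threshold`, transitively, using a small union-find."""
    n = len(points)
    parent = list(range(n))

    def find(i):
        # Follow parent links to the root, shortening the path as we go.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(n):
        for j in range(i + 1, n):
            if np.linalg.norm(points[i] - points[j]) < threshold:
                parent[find(j)] = find(i)

    roots = [find(i) for i in range(n)]
    # Relabel roots as 0, 1, 2, ... in order of first appearance.
    relabel = {r: k for k, r in enumerate(dict.fromkeys(roots))}
    return [relabel[r] for r in roots]


def estimate_cluster_count(centers, threshold, dim=2):
    """Return (number of merged clusters, merged label of each initial cluster)."""
    projected = project_centers(np.asarray(centers, dtype=float), dim)
    labels = merge_close_centers(projected, threshold)
    return len(set(labels)), labels
```

In this sketch, an initial clustering with deliberately many clusters would supply `centers`, and the number of distinct merged labels serves as the estimated cluster count. Because only the cluster centers are processed, not the documents themselves, the cost depends on the number of initial clusters, which is in keeping with the fast response time the abstract calls for.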



Cite this article
[IEEE Style]
T. C. Jee, H. J. Lee, Y. B. Lee, "Determining the number of Clusters in On-Line Document Clustering Algorithm," The KIPS Transactions:PartB , vol. 14, no. 7, pp. 513-522, 2007. DOI: 10.3745/KIPSTB.2007.14.7.513.

[ACM Style]
Tae Chang Jee, Hyun Jin Lee, and Yill Byung Lee. 2007. Determining the number of Clusters in On-Line Document Clustering Algorithm. The KIPS Transactions:PartB , 14, 7, (2007), 513-522. DOI: 10.3745/KIPSTB.2007.14.7.513.