Web Document Classification Based on Hangeul Morpheme and Keyword Analyses


The KIPS Transactions:PartD, Vol. 19, No. 4, pp. 263-270, Aug. 2012
DOI: 10.3745/KIPSTD.2012.19.4.263

Abstract

With the development of high-speed Internet and large-scale database technology, the number of web documents is increasing rapidly, so classifying them automatically is becoming increasingly important. In this study, we propose an effective method that extracts document features based on Hangeul morpheme and keyword analyses and classifies unstructured documents automatically by predicting their subjects. To extract document features, we first select terms using a morpheme analyzer, then form a keyword set based on term frequency and subject-discriminating power, and finally score each keyword using its discriminating power. We then generate classification models using commercial software that implements the decision tree, neural network, and support vector machine (SVM). Experimental results show that the proposed feature extraction method achieves considerable performance in classifying web documents by subject, specifically an average precision of 0.90 and a recall of 0.84 with the decision tree.
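
The feature extraction steps summarized in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the formula for subject-discriminating power, so everything here is an assumption for illustration only: discriminating power is taken as the share of a term's total occurrences that fall within one subject, whitespace splitting stands in for a Hangeul morpheme analyzer, and the names `tokenize`, `build_keyword_scores`, `document_features`, `min_tf`, and `min_power` are hypothetical.

```python
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: a lowercase whitespace split stands in for a real
    # Hangeul morpheme analyzer, which the paper uses to select terms.
    return text.lower().split()


def build_keyword_scores(docs, min_tf=2, min_power=0.6):
    """Build per-subject keyword scores from (subject, text) pairs.

    Assumed definition: the discriminating power of a term for a subject
    is its frequency within that subject divided by its frequency overall.
    """
    tf_by_subject = defaultdict(Counter)
    total_tf = Counter()
    for subject, text in docs:
        terms = tokenize(text)
        tf_by_subject[subject].update(terms)
        total_tf.update(terms)

    scores = defaultdict(dict)
    for subject, counts in tf_by_subject.items():
        for term, tf in counts.items():
            power = tf / total_tf[term]
            # Keep a term only if it is frequent enough overall and
            # concentrated enough in this subject.
            if total_tf[term] >= min_tf and power >= min_power:
                scores[subject][term] = power
    return dict(scores)


def document_features(text, scores):
    # One feature per subject: the sum of keyword scores weighted by how
    # often each keyword occurs in the document.
    terms = Counter(tokenize(text))
    return {
        subject: sum(kw[t] * n for t, n in terms.items() if t in kw)
        for subject, kw in scores.items()
    }
```

The resulting per-subject feature values could then be passed to any classifier, such as a decision tree. The paper itself uses commercial software for that step, which this sketch does not attempt to reproduce.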


Cite this article
[IEEE Style]
S. L. Lee, D. H. Park, W. S. Choi, H. J. Kim, "Web Document Classification Based on Hangeul Morpheme and Keyword Analyses," The KIPS Transactions:PartD, vol. 19, no. 4, pp. 263-270, 2012. DOI: 10.3745/KIPSTD.2012.19.4.263.

[ACM Style]
Seok Lyong Lee, Dan Ho Park, Won Sik Choi, and Hong Jo Kim. 2012. Web Document Classification Based on Hangeul Morpheme and Keyword Analyses. The KIPS Transactions:PartD, 19, 4, (2012), 263-270. DOI: 10.3745/KIPSTD.2012.19.4.263.