Comparison of Significant Term Extraction Based on the Number of Selected Principal Components


KIPS Transactions on Software and Data Engineering, Vol. 13, No. 3, pp. 329-336, Mar. 2006
DOI: 10.3745/KIPSTB.2006.13.3.329

Abstract

In this paper, we propose a method for extracting significant terms from a document using Principal Component Analysis (PCA), a multivariate analysis method. Through term-term correlations, PCA can make full use of the relationships between terms within a document. To perform PCA, we use the correlation matrix between terms rather than the covariance matrix. We also determine thresholds for both the number of components to select and the correlation coefficient between the selected components and terms. Experimental results on 283 Korean newspaper articles show that selecting the first six components with a correlation coefficient threshold of 0.4 gives the best results for extracting sentences based on the selected significant terms.
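The abstract describes the pipeline only at a high level. The sketch below shows one way it could look in Python with NumPy, under assumptions the abstract does not state: each sentence is an observation and each term a variable in a sentence-by-term frequency matrix; a term counts as significant when its absolute correlation with any of the first k components reaches the threshold; and sentences are ranked by how many significant terms they contain. The function names (`component_term_correlations`, `significant_terms`, `extract_sentences`) and the toy data are hypothetical, not the authors' implementation. The defaults `n_components=6` and `threshold=0.4` mirror the best condition reported above.

```python
import numpy as np


def component_term_correlations(X):
    """Return (loadings, kept) for PCA on the term-term correlation matrix.

    X is an (n_sentences, n_terms) term-frequency matrix (an assumed layout).
    loadings[j, l] is the correlation between kept term j and component l,
    with components ordered by decreasing eigenvalue. kept is a boolean mask
    over the original terms; constant terms are dropped because their
    correlation with anything is undefined.
    """
    X = np.asarray(X, dtype=float)
    kept = X.std(axis=0) > 0
    X = X[:, kept]
    if X.shape[1] == 0:
        return np.zeros((0, 0)), kept
    # Correlation matrix instead of covariance: every term gets unit variance.
    R = np.atleast_2d(np.corrcoef(X, rowvar=False))
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]          # largest eigenvalue first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # For standardized variables, corr(term j, component l) = v_jl * sqrt(lambda_l).
    # Clip tiny negative eigenvalues caused by round-off before the sqrt.
    return eigvecs * np.sqrt(np.clip(eigvals, 0.0, None)), kept


def significant_terms(X, terms, n_components=6, threshold=0.4):
    """Terms whose |correlation| with any of the first n_components >= threshold."""
    loadings, kept = component_term_correlations(X)
    kept_terms = [t for t, k in zip(terms, kept) if k]
    k = min(n_components, loadings.shape[1])
    hits = (np.abs(loadings[:, :k]) >= threshold).any(axis=1)
    return [t for t, h in zip(kept_terms, hits) if h]


def extract_sentences(sentences, X, terms, selected, top_n=3):
    """Pick the top_n sentences containing the most selected terms (assumed scoring)."""
    X = np.asarray(X, dtype=float)
    idx = np.array([terms.index(t) for t in selected], dtype=int)
    scores = (X[:, idx] > 0).sum(axis=1)
    # Stable sort breaks ties by position; the result is returned in document order.
    best = sorted(np.argsort(-scores, kind="stable")[:top_n])
    return [sentences[i] for i in best]


if __name__ == "__main__":
    # Hypothetical toy document: 4 sentences x 5 terms (raw counts).
    terms = ["economy", "market", "growth", "weather", "said"]
    X = [[2, 1, 1, 0, 1],
         [1, 2, 1, 0, 1],
         [0, 0, 0, 2, 1],
         [1, 1, 0, 1, 1]]
    sentences = ["s1", "s2", "s3", "s4"]
    selected = significant_terms(X, terms, n_components=6, threshold=0.4)
    print(selected)
    print(extract_sentences(sentences, X, terms, selected, top_n=2))
```

Working from the correlation matrix standardizes every term to unit variance, so high-variance terms do not dominate the leading components as they would with the covariance matrix. It also makes the product of an eigenvector entry and the square root of its eigenvalue a correlation, which is what lets a single threshold such as 0.4 apply across all selected components.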


Cite this article
[IEEE Style]
C. B. Lee, C. Y. Ock and H. R. Park, "Comparison of Significant Term Extraction Based on the Number of Selected Principal Components," KIPS Journal B (2001 ~ 2012), vol. 13, no. 3, pp. 329-336, 2006. DOI: 10.3745/KIPSTB.2006.13.3.329.

[ACM Style]
Chang Beom Lee, Cheol Young Ock, and Hyuk Ro Park. 2006. Comparison of Significant Term Extraction Based on the Number of Selected Principal Components. KIPS Journal B (2001 ~ 2012), 13, 3, (2006), 329-336. DOI: 10.3745/KIPSTB.2006.13.3.329.