Terminology Recognition System based on Machine Learning for Scientific Document Analysis


The KIPS Transactions:PartD, Vol. 18, No. 5, pp. 329-338, Oct. 2011
DOI: 10.3745/KIPSTD.2011.18.5.329

Abstract

Terminology recognition, a prerequisite for text mining, information extraction, information retrieval, the semantic web, and question answering, has been studied intensively in only a limited range of domains, especially the biomedical domain. Previous approaches are difficult to apply to general domains because they rely on domain-specific resources. We therefore propose a domain-independent terminology recognition system based on machine learning that uses a dictionary, syntactic features, and Web search results. Compared with C-value, a widely used related approach based on local domain frequencies, the proposed approach achieved an F-score of 80.8, an improvement of 6.5%. In a second experiment with various combinations of unithood features, the method combined with NGD (Normalized Google Distance) showed the best performance, an F-score of 81.8. We applied three machine learning methods (logistic regression, C4.5, and SVMs) and obtained the best score with the decision tree method, C4.5.
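
The abstract names NGD as a unithood feature but does not give its formula. For orientation, the following is a minimal sketch of the standard NGD definition computed from search hit counts. The function name, the parameter names, and the handling of zero counts are assumptions made for illustration; the abstract does not state how the authors obtained the counts or turned the score into a feature.

```python
import math


def normalized_google_distance(fx, fy, fxy, n):
    """Standard NGD from hit counts (illustrative sketch, not the paper's code).

    fx, fy : number of pages containing term x and term y, respectively
    fxy    : number of pages containing both terms
    n      : total number of indexed pages (must exceed fx and fy)
    """
    if fx <= 0 or fy <= 0 or fxy <= 0:
        # Assumption: terms that never occur (or never co-occur) are
        # treated as infinitely distant.
        return math.inf
    if n <= max(fx, fy):
        raise ValueError("n must be larger than the individual hit counts")
    log_fx, log_fy, log_fxy = math.log(fx), math.log(fy), math.log(fxy)
    return (max(log_fx, log_fy) - log_fxy) / (math.log(n) - min(log_fx, log_fy))
```

A smaller value means the two words co-occur more often than their individual frequencies would suggest, which is why such a score can serve as evidence that a word sequence forms a single unit.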


Cite this article
[IEEE Style]
Y. S. Choi, S. K. Song, H. W. Chun, C. H. Jeong, S. P. Choi, "Terminology Recognition System based on Machine Learning for Scientific Document Analysis," The KIPS Transactions:PartD, vol. 18, no. 5, pp. 329-338, 2011. DOI: 10.3745/KIPSTD.2011.18.5.329.

[ACM Style]
Yun Soo Choi, Sa Kwang Song, Hong Woo Chun, Chang Hoo Jeong, and Sung Pil Choi. 2011. Terminology Recognition System based on Machine Learning for Scientific Document Analysis. The KIPS Transactions:PartD, 18, 5, (2011), 329-338. DOI: 10.3745/KIPSTD.2011.18.5.329.