A Korean Homonym Disambiguation System Using Refined Semantic Information and Thesaurus

KIPS Transactions on Software and Data Engineering, Vol. 12, No. 7, pp. 829-840, Jul. 2005
10.3745/KIPSTB.2005.12.7.829


Word sense disambiguation (WSD) is one of the most difficult problems in Korean information processing. We propose a WSD model that filters semantic information using the specific characteristics of dictionary definitions and that adds information useful for sense determination, such as statistical, distance, and case information. We also propose a model that resolves the issues arising from the scarcity of semantic information data by drawing on the word hierarchy system (thesaurus) of Ulsan University's UOU Word Intelligent Network, a dictionary-based lexicological database. Among the WSD models developed in this study, the one using statistical, distance, and case information along with the thesaurus (hereinafter the "SDJ-X model") performed best. In an experiment on the sense-tagged corpus of 1,500,000 eojeols provided by the Sejong Project, the SDJ-X model improved on the accuracy baseline, which always assigns the most frequent sense of a word (maximum frequency determination, MFC), by 18.87% (21.73% for nouns, 17.11% for verbs). It was also more accurate than the model using statistical and inter-eojeol distance weights by 10.49% (8.84% for nouns, 11.51% for verbs). Finally, its accuracy exceeded that of the model using only statistical, distance, and case information, without the thesaurus, by 6.12% (5.29% for nouns, 6.64% for verbs).
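
For readers unfamiliar with the maximum frequency baseline mentioned above, the following is a minimal sketch of how such a baseline is commonly computed. It is not the authors' implementation: the function names `train_mfs` and `accuracy`, the romanized homonyms, and the sense labels are all hypothetical, and the toy data are not drawn from the Sejong corpus.

```python
from collections import Counter, defaultdict


def train_mfs(tagged_tokens):
    """Map each word to the sense it carries most often in the training data."""
    counts = defaultdict(Counter)
    for word, sense in tagged_tokens:
        counts[word][sense] += 1
    # most_common(1) returns a one-element list of (sense, count) pairs.
    return {word: senses.most_common(1)[0][0] for word, senses in counts.items()}


def accuracy(model, tagged_tokens):
    """Fraction of tokens whose predicted sense equals the gold sense."""
    tokens = list(tagged_tokens)
    if not tokens:
        return 0.0
    correct = sum(1 for word, sense in tokens if model.get(word) == sense)
    return correct / len(tokens)


# Hypothetical toy data: (romanized homonym, sense label) pairs.
train = [
    ("bae", "pear"), ("bae", "ship"), ("bae", "ship"),
    ("nun", "eye"), ("nun", "eye"), ("nun", "snow"),
]
test = [("bae", "ship"), ("bae", "pear"), ("nun", "eye"), ("nun", "eye")]

model = train_mfs(train)
print(model)                  # {'bae': 'ship', 'nun': 'eye'}
print(accuracy(model, test))  # 0.75
```

Because the baseline ignores context entirely, any gain over it reflects what the additional information (here, statistical, distance, and case features, plus the thesaurus) contributes.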

