Recognizing Unknown Words and Correcting Spelling Errors as Preprocessing for Korean Information Processing System


The Transactions of the Korea Information Processing Society (1994 ~ 2000), Vol. 5, No. 10, pp. 2591-2599, Oct. 1998
DOI: 10.3745/KIPSTE.1998.5.10.2591

Abstract

In this paper, we propose a method for recognizing unknown words and correcting spelling errors (including spacing errors) to improve the performance of Korean information processing systems. Unknown words are recognized through comparative analysis of two or more morphologically similar eojeols (spacing units in Korean) that contain the same unknown-word candidate. Spacing errors and spelling errors are corrected using lexicalized rules that are automatically extracted from a very large raw corpus. The extraction of the lexicalized rules is based on morphological and contextual similarities between error eojeols and their correction eojeols, which are confirmed to occur in the corpus. The experimental results show that our system recognizes unknown words with an accuracy of 98.9% and corrects spacing errors and spelling errors with accuracies of 98.1% and 97.1%, respectively.
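
The comparative analysis of eojeols described above can be illustrated with a small Python sketch. It is not the authors' implementation: the particle list `PARTICLES`, the helper names, and the `min_support` threshold are all hypothetical, and the real system relies on full morphological analysis rather than simple suffix stripping. The idea shown is only that a candidate stem is accepted as an unknown word when it appears in two or more distinct eojeols with different attached particles.

```python
from collections import defaultdict

# Hypothetical toy list of Korean particles (josa); "" means no particle.
PARTICLES = ("", "은", "는", "이", "가", "을", "를", "에", "에서", "의")


def splits(eojeol):
    """Yield every (stem, particle) split of an eojeol with a non-empty stem."""
    for particle in PARTICLES:
        if eojeol.endswith(particle) and len(eojeol) > len(particle):
            yield eojeol[:len(eojeol) - len(particle)], particle


def recognize_unknown_words(eojeols, lexicon, min_support=2):
    """Return stems supported by at least `min_support` distinct particles.

    Assumption: an eojeol counts as already analyzable (and is skipped)
    if any of its splits has a stem found in the lexicon.
    """
    support = defaultdict(set)
    for eojeol in set(eojeols):
        candidates = list(splits(eojeol))
        if any(stem in lexicon for stem, _ in candidates):
            continue  # analyzable with known words, so not an unknown word
        for stem, particle in candidates:
            support[stem].add(particle)
    return {stem for stem, seen in support.items() if len(seen) >= min_support}


if __name__ == "__main__":
    corpus = ["펜티엄은", "펜티엄을", "펜티엄의", "학교에"]
    print(recognize_unknown_words(corpus, lexicon={"학교"}))
```

Requiring agreement across several similar eojeols is what separates a plausible unknown word from a one-off string such as a typo, which would normally be seen with only a single ending.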


Cite this article
[IEEE Style]
B. R. Park and H. C. Rim, "Recognizing Unknown Words and Correcting Spelling Errors as Preprocessing for Korean Information Processing System," The Transactions of the Korea Information Processing Society (1994 ~ 2000), vol. 5, no. 10, pp. 2591-2599, 1998. DOI: 10.3745/KIPSTE.1998.5.10.2591.

[ACM Style]
Park Bong Rae and Rim Hae Chang. 1998. Recognizing Unknown Words and Correcting Spelling Errors as Preprocessing for Korean Information Processing System. The Transactions of the Korea Information Processing Society (1994 ~ 2000), 5, 10, (1998), 2591-2599. DOI: 10.3745/KIPSTE.1998.5.10.2591.