Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation


The KIPS Transactions: Part B, Vol. 11, No. 6, pp. 749-758, Oct. 2004
DOI: 10.3745/KIPSTB.2004.11.6.749

Abstract

In this paper, we propose a new method for disambiguating target word selection in English-Korean machine translation that uses only a raw corpus, without additional human effort. We use two data-driven techniques: Latent Semantic Analysis (LSA) and Probabilistic Latent Semantic Analysis (PLSA). Both techniques can represent complex semantic structures in given contexts such as text passages. We construct linguistic semantic knowledge with these two techniques and use it for target word selection in English-Korean machine translation. For target word selection, we utilize grammatical relationships stored in a dictionary. To resolve the data sparseness problem in target word selection, we use a nearest neighbor (NN) learning algorithm and estimate the distance between instances based on these models. In the experiments, we use the TREC AP news data to construct the latent semantic space and the Wall Street Journal corpus to evaluate target word selection. With the latent semantic analysis methods, the accuracy of target word selection improved by more than 10%, and PLSA showed better accuracy than LSA. Finally, using correlation calculation, we show the relationship between accuracy and two important factors: the dimensionality of the latent space and the k value of k-NN learning.
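The abstract gives no implementation details, so the following is only a minimal sketch, under stated assumptions, of how an LSA space built from untagged text can drive nearest-neighbor target word selection. Everything concrete here is hypothetical and not taken from the paper: the six-document toy corpus (standing in for AP news), the "take + object noun" instances, the Korean candidates 먹다 ("take" medicine) and 타다 ("take" a vehicle), the function names `build_lsa_space` and `select_target`, raw-count weighting, cosine similarity, and simple majority voting. PLSA would replace the SVD step with an EM-trained aspect model; that variant is not shown.

```python
from collections import Counter

import numpy as np

# Hypothetical toy corpus standing in for raw (untagged) text such as AP news.
DOCS = [
    "doctor prescribed medicine pill dose",
    "pill tablet medicine pharmacy dose",
    "tablet vitamin pill doctor",
    "bus train station ticket driver",
    "taxi bus driver road station",
    "train subway ticket station road",
]

# Hypothetical dictionary-derived instances for the verb-object relation "take + NOUN".
TAKE_EXAMPLES = [("medicine", "먹다"), ("pill", "먹다"), ("bus", "타다"), ("train", "타다")]


def build_lsa_space(docs, dims):
    """Map each word to a vector from a rank-`dims` truncated SVD of the term-document matrix."""
    vocab = sorted({w for d in docs for w in d.split()})
    index = {w: i for i, w in enumerate(vocab)}
    counts = np.zeros((len(vocab), len(docs)))
    for j, doc in enumerate(docs):
        for w in doc.split():
            # Raw counts for simplicity; LSA is usually applied to reweighted counts.
            counts[index[w], j] += 1
    u, s, _ = np.linalg.svd(counts, full_matrices=False)  # singular values come out descending
    term_vectors = u[:, :dims] * s[:dims]  # scale each latent dimension by its singular value
    return {w: term_vectors[index[w]] for w in vocab}


def cosine(a, b):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(a @ b / denom) if denom else 0.0


def select_target(noun, examples, space, k=3):
    """k-NN vote: pick the target word used with the k known nouns most similar to `noun`."""
    ranked = sorted(examples, key=lambda ex: cosine(space[noun], space[ex[0]]), reverse=True)
    votes = Counter(target for _, target in ranked[:k])
    return votes.most_common(1)[0][0]


space = build_lsa_space(DOCS, dims=2)
for noun in ["tablet", "vitamin", "taxi", "subway"]:
    # None of these nouns appears in TAKE_EXAMPLES, mimicking data sparseness.
    print(noun, select_target(noun, TAKE_EXAMPLES, space))
```

On this toy corpus the two latent dimensions separate the medical and transport vocabularies, so the unseen objects tablet and vitamin are assigned 먹다 and taxi and subway are assigned 타다. The `dims` and `k` parameters correspond to the two factors the abstract relates to accuracy (latent-space dimensionality and the k of k-NN); their values here are arbitrary.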


Cite this article
[IEEE Style]
Y. S. Kim and J. H. Chang, "Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation," The KIPS Transactions: Part B, vol. 11, no. 6, pp. 749-758, 2004. DOI: 10.3745/KIPSTB.2004.11.6.749.

[ACM Style]
Yu Seop Kim and Jeong Ho Chang. 2004. Target Word Selection Disambiguation using Untagged Text Data in English-Korean Machine Translation. The KIPS Transactions: Part B, 11, 6, (2004), 749-758. DOI: 10.3745/KIPSTB.2004.11.6.749.