Automatic Word Spacing of the Korean Sentences by Using End-to-End Deep Neural Network


KIPS Transactions on Software and Data Engineering, Vol. 8, No. 11, pp. 441-448, Nov. 2019
https://doi.org/10.3745/KTSDE.2019.8.11.441
Keywords: Syllable Embedding, Bi-LSTM, Feedforward Neural Network, Neural Network Language Model, Linear-Chain CRF
Abstract

Previous research on automatic spacing of Korean sentences has corrected spacing errors by using n-gram-based statistical techniques or a morphological analyzer to insert blanks at word boundaries. In this paper, we propose an end-to-end automatic word spacing method that uses a deep neural network. The automatic word spacing problem can be defined as a tag classification problem at the syllable level rather than the word level. For contextual representation between syllables, a Bi-LSTM encodes the dependency relationships between syllables into a fixed-length vector in a continuous vector space using forward and backward LSTM cells. To perform automatic word spacing of Korean sentences, the fixed-length contextual vector produced by the Bi-LSTM is classified into an auto-spacing tag (B or I), and a blank is then inserted in front of each syllable tagged B. For tag classification, we build three types of classification networks: a feedforward neural network, a neural network language model, and a linear-chain CRF. To compare our models, we measure automatic word spacing performance for each of the three classification networks. Of the three, the linear-chain CRF used as the classification network shows better performance than the other models. We used the KCC150 corpus as training and test data.
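The B/I tagging scheme described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: the function names `to_tags` and `insert_spaces` are hypothetical, the example sentence is our own, and we assume that a B tag on the first syllable of a sentence does not produce a leading blank. The neural tagger itself (Bi-LSTM plus classifier) is omitted; the sketch only shows how spaced text maps to syllable tags and back.

```python
def to_tags(spaced_sentence):
    """Convert a correctly spaced sentence into (syllables, tags).

    The first syllable of each word is tagged B, every other syllable I.
    Assumes precomposed Hangul, so one character is one syllable.
    """
    syllables, tags = [], []
    for word in spaced_sentence.split():
        for i, syllable in enumerate(word):
            syllables.append(syllable)
            tags.append("B" if i == 0 else "I")
    return syllables, tags


def insert_spaces(syllables, tags):
    """Rebuild a spaced sentence by putting a blank in front of each B tag.

    Assumption: the sentence-initial B tag gets no leading blank.
    """
    pieces = []
    for index, (syllable, tag) in enumerate(zip(syllables, tags)):
        if tag == "B" and index > 0:
            pieces.append(" ")
        pieces.append(syllable)
    return "".join(pieces)


if __name__ == "__main__":
    syllables, tags = to_tags("나는 학교에 간다")
    print(tags)                            # ['B', 'I', 'B', 'I', 'I', 'B', 'I']
    print(insert_spaces(syllables, tags))  # 나는 학교에 간다
```

In a full system, `tags` would come from the classifier's predictions over an unspaced syllable sequence rather than from a gold-standard sentence; `to_tags` corresponds to how training labels could be derived from a spaced corpus.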



Cite this article
[IEEE Style]
H. Y. Lee and S. S. Kang, "Automatic Word Spacing of the Korean Sentences by Using End-to-End Deep Neural Network," KIPS Transactions on Software and Data Engineering, vol. 8, no. 11, pp. 441-448, 2019. DOI: https://doi.org/10.3745/KTSDE.2019.8.11.441.

[ACM Style]
Hyun Young Lee and Seung Shik Kang. 2019. Automatic Word Spacing of the Korean Sentences by Using End-to-End Deep Neural Network. KIPS Transactions on Software and Data Engineering, 8, 11, (2019), 441-448. DOI: https://doi.org/10.3745/KTSDE.2019.8.11.441.