CRNN-Based Korean Phoneme Recognition Model with CTC Algorithm


KIPS Transactions on Software and Data Engineering, Vol. 8, No. 3, pp. 115-122, Mar. 2019
https://doi.org/10.3745/KTSDE.2019.8.3.115
Keywords: Phoneme Recognition, CTC Algorithm, Convolutional Neural Network, Recurrent Neural Network
Abstract

For Korean phoneme recognition, Hidden Markov Model-Gaussian Mixture Model (HMM-GMM) systems, or hybrid models that combine an artificial neural network with an HMM, have mainly been used. However, this approach is limited because such models require force-aligned training corpora that are manually annotated by experts. To avoid the need for manually annotated training data, researchers have recently used neural-network-based phoneme recognition models that combine a recurrent neural network (RNN)-based structure with the connectionist temporal classification (CTC) algorithm. In terms of implementation, however, these RNN-based models have another difficulty: the amount of training data they require grows as the structure becomes more sophisticated. This is particularly problematic for Korean, which lacks refined corpora. In this study, we apply the CTC algorithm, which does not require forced alignment, to build a Korean phoneme recognition model. Specifically, the model is based on a convolutional neural network (CNN), which requires a relatively small amount of data and can be trained faster than RNN-based models. We present the results of two experiments and the resulting best-performing phoneme recognition model, which distinguishes 49 Korean phonemes. The best-performing model combines a CNN with a 3-hop bidirectional LSTM and achieves a final Phoneme Error Rate (PER) of 3.26. This PER is a considerable improvement over existing Korean phoneme recognition models, which report PERs ranging from 10 to 12.
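Why CTC removes the need for forced alignment can be illustrated with a minimal pure-Python sketch of the standard CTC forward algorithm. This is not the authors' implementation: it assumes class index 0 is the blank, uses made-up per-frame probabilities for a 3-class toy problem, and works in plain probability space. In a CTC setup like the one described, the per-frame distributions would presumably come from the network's softmax over the phoneme classes plus a blank. The forward recursion sums the probability of every frame-level path that collapses to the target sequence, so no frame-to-phoneme alignment has to be annotated; the training loss is the negative log of this sum.

```python
import math
from itertools import product

BLANK = 0  # assumption: class index 0 is reserved for the CTC blank


def ctc_collapse(path, blank=BLANK):
    """Map a frame-level path to a label sequence: merge repeats, then drop blanks."""
    out, prev = [], None
    for s in path:
        if s != prev and s != blank:
            out.append(s)
        prev = s
    return out


def ctc_forward_prob(probs, labels, blank=BLANK):
    """P(labels | input), summed over every frame-level alignment (CTC forward pass).

    probs[t][k] is the softmax output for class k at frame t;
    labels is the target phoneme sequence, without blanks.
    """
    ext = [blank]
    for lab in labels:
        ext += [lab, blank]  # blank-interleaved target: _ l1 _ l2 _ ...
    S = len(ext)
    alpha = [0.0] * S
    alpha[0] = probs[0][ext[0]]
    if S > 1:
        alpha[1] = probs[0][ext[1]]
    for t in range(1, len(probs)):
        new = [0.0] * S
        for s in range(S):
            a = alpha[s]  # stay on the same state
            if s >= 1:
                a += alpha[s - 1]  # advance by one state
            if s >= 2 and ext[s] != blank and ext[s] != ext[s - 2]:
                a += alpha[s - 2]  # skip the blank between two different labels
            new[s] = a * probs[t][ext[s]]
        alpha = new
    # A valid path ends on the last label or on the trailing blank.
    return alpha[-1] + (alpha[-2] if S > 1 else 0.0)


def brute_force_prob(probs, labels, blank=BLANK):
    """Same quantity by enumerating all K**T paths; only feasible for toy sizes."""
    total = 0.0
    for path in product(range(len(probs[0])), repeat=len(probs)):
        if ctc_collapse(path, blank) == list(labels):
            total += math.prod(probs[t][k] for t, k in enumerate(path))
    return total


# Toy example: 3 frames, classes {0: blank, 1: phoneme A, 2: phoneme B}.
probs = [
    [0.5, 0.3, 0.2],
    [0.4, 0.4, 0.2],
    [0.6, 0.1, 0.3],
]
print(ctc_forward_prob(probs, [1, 2]), brute_force_prob(probs, [1, 2]))
```

Both functions compute the same probability (0.186 for the target [1, 2] here), but the forward recursion costs O(T·S) instead of enumerating K^T paths. Practical systems compute this in log space through a framework's built-in CTC loss rather than a hand-written loop.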

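The abstract compares models by Phoneme Error Rate but does not state its unit or how it is aggregated. The sketch below assumes the common definition: the total Levenshtein edit distance (substitutions, deletions, insertions) between reference and recognized phoneme sequences, divided by the total number of reference phonemes and expressed in percent. The example utterances are hypothetical and use Hangul jamo as stand-in phoneme symbols, not the paper's 49-phoneme inventory.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance: minimum substitutions + deletions + insertions."""
    prev = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, start=1):
        cur = [i] + [0] * len(hyp)
        for j, h in enumerate(hyp, start=1):
            cur[j] = min(
                prev[j] + 1,             # delete r
                cur[j - 1] + 1,          # insert h
                prev[j - 1] + (r != h),  # substitute, or match at no cost
            )
        prev = cur
    return prev[-1]


def phoneme_error_rate(refs, hyps):
    """Corpus-level PER in percent: total edits / total reference phonemes."""
    edits = sum(edit_distance(r, h) for r, h in zip(refs, hyps))
    return 100.0 * edits / sum(len(r) for r in refs)


# Hypothetical utterances, using Hangul jamo as stand-in phoneme symbols.
refs = [list("ㅎㅏㄴㄱㅜㄱ"), list("ㅅㅗㄹㅣ")]
hyps = [list("ㅎㅏㄴㄱㅗㄱ"), list("ㅅㅗㄹㅣ")]  # one substitution: ㅜ -> ㅗ
print(phoneme_error_rate(refs, hyps))  # 1 edit / 10 reference phonemes -> 10.0
```

Under this definition, a PER of 3.26 would correspond to roughly 3 edits per 100 reference phonemes.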

Cite this article
[IEEE Style]
Y. Hong, K. Ki, and G. Gweon, "CRNN-Based Korean Phoneme Recognition Model with CTC Algorithm," KIPS Transactions on Software and Data Engineering, vol. 8, no. 3, pp. 115-122, Mar. 2019. DOI: https://doi.org/10.3745/KTSDE.2019.8.3.115.

[ACM Style]
Hong Yoonseok, Ki Kyungseo, and Gweon Gahgene. 2019. CRNN-Based Korean Phoneme Recognition Model with CTC Algorithm. KIPS Transactions on Software and Data Engineering, 8, 3, (2019), 115-122. DOI: https://doi.org/10.3745/KTSDE.2019.8.3.115.