A Study on Korean Speech Animation Generation Employing Deep Learning


KIPS Transactions on Software and Data Engineering, Vol. 12, No. 10, pp. 461-470, Oct. 2023
https://doi.org/10.3745/KTSDE.2023.12.10.461
Keywords: Speech animation, Deep Learning, Viseme, Co-articulation, Blendshape
Abstract

Deep-learning-based speech animation generation has been actively researched for English, but there has been no prior work for Korean. This paper is therefore the first to apply supervised deep learning to Korean speech animation generation. In doing so, we identify a significant effect of deep learning: it reduces speech animation research to speech recognition research, which is the predominant technique. We also study how to make the best use of this effect for Korean speech animation generation. By clarifying the top-priority research target, the effect can help revitalize the recently inactive field of Korean speech animation research efficiently and effectively. The paper proceeds as follows: (i) it chooses the blendshape animation technique; (ii) it implements the deep-learning model as a master-servant pipeline consisting of an automatic speech recognition (ASR) module and a facial action coding (FAC) module; (iii) it builds a Korean speech facial motion capture dataset; (iv) it prepares two deep-learning models for comparison, one adopting an English ASR module and the other a Korean ASR module, with both sharing the same basic structure for their FAC modules; and (v) it trains the FAC module of each model dependently on its ASR module. A user study shows that the model with the Korean ASR module (4.2/5.0 points) generates decisively more natural Korean speech animations than the model with the English ASR module (2.7/5.0 points), with the FAC module trained dependently on the ASR module in both cases. The result confirms the aforementioned effect: the quality of Korean speech animation comes down to the accuracy of Korean ASR.
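
Steps (ii) and (v) of the abstract can be illustrated with a minimal sketch. The abstract does not specify the architecture, so everything below is an assumption made for illustration: "dependently" is read as "the FAC module is fitted on the output of a fixed ASR module", the ASR module is replaced by a toy feature function (`frozen_asr_features`), the FAC module is a plain linear map (`FacModule`), and the data are synthetic (`make_synthetic_dataset`). None of these names, sizes, or values come from the paper.

```python
import random


def frozen_asr_features(audio_frame):
    """Hypothetical stand-in for a pretrained ASR encoder (no trainable state)."""
    mean = sum(audio_frame) / len(audio_frame)
    spread = max(audio_frame) - min(audio_frame)
    return [mean, spread]


class FacModule:
    """Assumed linear map from ASR features to blendshape weights."""

    def __init__(self, n_features, n_blendshapes, seed=0):
        rng = random.Random(seed)
        self.weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_features)]
                        for _ in range(n_blendshapes)]
        self.biases = [0.0] * n_blendshapes

    def predict(self, features):
        return [sum(w * f for w, f in zip(row, features)) + b
                for row, b in zip(self.weights, self.biases)]

    def train_step(self, features, target, lr):
        """One SGD step on the squared error; returns the mean squared error."""
        prediction = self.predict(features)
        loss = 0.0
        for k, (p, t) in enumerate(zip(prediction, target)):
            error = p - t
            loss += error * error
            for j, f in enumerate(features):
                self.weights[k][j] -= lr * 2.0 * error * f
            self.biases[k] -= lr * 2.0 * error
        return loss / len(target)


def train_fac(fac, dataset, epochs=50, lr=0.1):
    """Train only the FAC module; the ASR feature function is never updated."""
    history = []
    for _ in range(epochs):
        total = 0.0
        for audio_frame, target in dataset:
            features = frozen_asr_features(audio_frame)
            total += fac.train_step(features, target, lr)
        history.append(total / len(dataset))
    return history


def make_synthetic_dataset(n_samples=200, seed=1):
    """Synthetic (audio frame, blendshape weights) pairs, not motion capture data."""
    rng = random.Random(seed)
    data = []
    for _ in range(n_samples):
        frame = [rng.random() for _ in range(4)]
        mean, spread = frozen_asr_features(frame)
        # Two made-up blendshape targets that depend linearly on the features.
        data.append((frame, [0.5 * mean + 0.2, 0.8 * spread]))
    return data


if __name__ == "__main__":
    fac = FacModule(n_features=2, n_blendshapes=2)
    history = train_fac(fac, make_synthetic_dataset())
    print("first epoch loss:", history[0], "last epoch loss:", history[-1])
```

In this toy setup the FAC module can only be as good as the features it receives, which is one way to read the abstract's claim that animation quality comes down to ASR accuracy: swapping in a different `frozen_asr_features` would require retraining the FAC module on the new features.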


Cite this article
[IEEE Style]
S. C. Kang and D. J. Kim, "A Study on Korean Speech Animation Generation Employing Deep Learning," KIPS Transactions on Software and Data Engineering, vol. 12, no. 10, pp. 461-470, 2023. DOI: https://doi.org/10.3745/KTSDE.2023.12.10.461.

[ACM Style]
Suk Chan Kang and Dong Ju Kim. 2023. A Study on Korean Speech Animation Generation Employing Deep Learning. KIPS Transactions on Software and Data Engineering, 12, 10, (2023), 461-470. DOI: https://doi.org/10.3745/KTSDE.2023.12.10.461.