Improving Fidelity of Synthesized Voices Generated by Using GANs


KIPS Transactions on Software and Data Engineering, Vol. 10, No. 1, pp. 9-18, Jan. 2021
https://doi.org/10.3745/KTSDE.2021.10.1.9
Keywords: Generative Adversarial Networks, Fréchet Inception Distance, Fidelity Improvement, Synthesized Voice
Abstract

Although Generative Adversarial Networks (GANs) have gained great popularity in computer vision and related fields, methods for generating audio signals independently have yet to be presented. Unlike an image, an audio signal is a sequence of discrete samples, which makes it difficult to learn with the CNN architectures that are widely used in image generation tasks. To overcome this difficulty, GAN researchers have proposed applying time-frequency representations of audio to existing image-generating GANs. Following this strategy, we propose an improved method for increasing the fidelity of synthesized audio signals generated by GANs. Our method is demonstrated on a public speech dataset and evaluated using the Fréchet Inception Distance (FID). Our method achieved an FID of 10.504, compared with 11.973 for the existing state-of-the-art method (lower FID indicates better fidelity).
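
To illustrate what a time-frequency representation of audio looks like in practice, the following is a minimal sketch of a magnitude spectrogram computed with a short-time Fourier transform. It is not the representation used in the paper, which the abstract does not specify. The function names, the frame length, the hop size, and the 1 kHz test tone are all assumptions chosen for illustration, and the naive DFT is written for clarity rather than speed.

```python
import cmath
import math


def hann(n):
    """Periodic Hann window of length n."""
    return [0.5 - 0.5 * math.cos(2 * math.pi * i / n) for i in range(n)]


def stft_magnitude(signal, frame_len=64, hop=32):
    """Return a list of frames, each holding frame_len // 2 + 1 magnitudes.

    Illustrative only: frame_len and hop are assumed values, and the
    transform is a naive O(N^2) DFT rather than an FFT.
    """
    window = hann(frame_len)
    n_bins = frame_len // 2 + 1
    spectrogram = []
    for start in range(0, len(signal) - frame_len + 1, hop):
        # Window one frame of the 1-D sample sequence.
        frame = [signal[start + i] * window[i] for i in range(frame_len)]
        mags = []
        for k in range(n_bins):
            acc = sum(
                frame[n] * cmath.exp(-2j * math.pi * k * n / frame_len)
                for n in range(frame_len)
            )
            mags.append(abs(acc))
        spectrogram.append(mags)
    return spectrogram


# Hypothetical input: a 1 kHz sine sampled at 8 kHz (not data from the paper).
sample_rate = 8000
tone = [math.sin(2 * math.pi * 1000 * t / sample_rate) for t in range(512)]

spec = stft_magnitude(tone)
# Each frequency bin spans sample_rate / frame_len = 125 Hz.
peak_bin = max(range(len(spec[0])), key=lambda k: spec[0][k])
print(len(spec), len(spec[0]), peak_bin)
```

The result is a two-dimensional array indexed by time frame and frequency bin, which can be treated like a single-channel image. This is what allows an image-generating GAN to be applied to audio.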


Cite this article
[IEEE Style]
M. Back, S. Yoon, S. Lee, K. Lee, "Improving Fidelity of Synthesized Voices Generated by Using GANs," KIPS Transactions on Software and Data Engineering, vol. 10, no. 1, pp. 9-18, 2021. DOI: https://doi.org/10.3745/KTSDE.2021.10.1.9.

[ACM Style]
Moon-Ki Back, Seung-Won Yoon, Sang-Baek Lee, and Kyu-Chul Lee. 2021. Improving Fidelity of Synthesized Voices Generated by Using GANs. KIPS Transactions on Software and Data Engineering, 10, 1, (2021), 9-18. DOI: https://doi.org/10.3745/KTSDE.2021.10.1.9.