TY - GEN
T1 - Voice conversion using speech-to-speech neuro-style transfer
AU - AlBadawy, Ehab A.
AU - Lyu, Siwei
N1 - Publisher Copyright: © 2020 ISCA
PY - 2020
Y1 - 2020
N2 - An impressionist is someone who tries to mimic other people's voices and their style of speech. Humans have mastered this task over the years. In this work, we introduce a deep learning-based approach to voice conversion with speech style transfer across different speakers. Our proposed model combines a Variational Auto-Encoder (VAE) and a Generative Adversarial Network (GAN) as its main components, followed by a WaveNet-based vocoder. We evaluate the model with three objective metrics: ASVspoof 2019, for measuring the difficulty of differentiating between human and synthesized samples; content verification, for transcription accuracy; and speaker encoding, for identity verification. Our results show the efficacy of the proposed model in producing high-quality synthesized speech on the Flickr8k audio corpus.
KW - Speech Synthesis and Spoken Language Generation
KW - Speech-to-Speech model
KW - Voice conversion
UR - https://www.scopus.com/pages/publications/85098196545
U2 - 10.21437/Interspeech.2020-3056
DO - 10.21437/Interspeech.2020-3056
M3 - Conference contribution
SN - 9781713820697
T3 - Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
SP - 4726
EP - 4730
BT - Interspeech 2020
PB - International Speech Communication Association
T2 - 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Y2 - 25 October 2020 through 29 October 2020
ER -