
Voice conversion using speech-to-speech neuro-style transfer

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

20 Scopus citations

Abstract

An impressionist is a person who mimics other people's voices and styles of speech, a skill humans have refined over many years. In this work, we introduce a deep learning-based approach to voice conversion with speech style transfer across different speakers. Our proposed model combines a Variational Auto-Encoder (VAE) and a Generative Adversarial Network (GAN) as its main components, followed by a WaveNet-based vocoder. We evaluate the model with three objective metrics: the ASVspoof 2019 challenge to measure how difficult it is to distinguish human from synthesized samples, content verification for transcription accuracy, and speaker encoding for identity verification. Our results show the efficacy of the proposed model in producing high-quality synthesized speech on the Flickr8k audio corpus.
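The abstract names a VAE-GAN pipeline as the core of the model. As a minimal illustration of how those two components fit together, the hypothetical NumPy sketch below runs a toy forward pass: a VAE encoder maps spectral features to a latent mean and log-variance, a reparameterized sample is decoded back to features, and a GAN-style discriminator scores the reconstruction. All weights, dimensions, and function names here are illustrative assumptions, not the paper's actual architecture (which the abstract does not detail), and the WaveNet vocoder stage is omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def encode(x, W_mu, W_logvar):
    # VAE encoder: map input features to latent mean and log-variance
    return x @ W_mu, x @ W_logvar

def reparameterize(mu, logvar, rng):
    # Sample z = mu + sigma * eps (the "reparameterization trick"),
    # so gradients could flow through mu and logvar during training
    eps = rng.standard_normal(mu.shape)
    return mu + np.exp(0.5 * logvar) * eps

def decode(z, W_dec):
    # VAE decoder: reconstruct spectral features from the latent code
    return z @ W_dec

def discriminate(x_hat, W_disc):
    # GAN discriminator: a scalar "realness" score per sample via sigmoid
    return 1.0 / (1.0 + np.exp(-(x_hat @ W_disc)))

# Toy dimensions: 80-dim mel-spectrogram frames, 16-dim latent space
n_mels, n_latent, batch = 80, 16, 4
x = rng.standard_normal((batch, n_mels))
W_mu = rng.standard_normal((n_mels, n_latent)) * 0.1
W_logvar = rng.standard_normal((n_mels, n_latent)) * 0.1
W_dec = rng.standard_normal((n_latent, n_mels)) * 0.1
W_disc = rng.standard_normal((n_mels, 1)) * 0.1

mu, logvar = encode(x, W_mu, W_logvar)
z = reparameterize(mu, logvar, rng)
x_hat = decode(z, W_dec)
score = discriminate(x_hat, W_disc)

# Standard VAE loss terms: reconstruction error plus KL divergence
# between the approximate posterior and the unit-Gaussian prior
recon = np.mean((x - x_hat) ** 2)
kl = -0.5 * np.mean(1 + logvar - mu**2 - np.exp(logvar))
```

In an actual voice-conversion setting the latent code would be combined with a target-speaker embedding before decoding, and the discriminator loss would push the decoder toward more natural-sounding output; this sketch only shows the data flow through the two named components.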

Original language: English
Title of host publication: Interspeech 2020
Publisher: International Speech Communication Association
Pages: 4726-4730
Number of pages: 5
ISBN (Print): 9781713820697
DOIs
State: Published - 2020
Event: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020 - Shanghai, China
Duration: Oct 25 2020 - Oct 29 2020

Publication series

Name: Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH
Volume: 2020-October

Conference

Conference: 21st Annual Conference of the International Speech Communication Association, INTERSPEECH 2020
Country/Territory: China
City: Shanghai
Period: 10/25/20 - 10/29/20

Keywords

  • Speech Synthesis and Spoken Language Generation
  • Speech-to-Speech model
  • Voice conversion
