Joint Adversarial Training of Speech Recognition and Synthesis Models for Many-to-One Voice Conversion Using Phonetic Posteriorgrams

doi:10.1587/transinf.2019EDP7297


Authors: Yuki SAITO, Kei AKUZAWA, Kentaro TACHIBANA
Publisher: The Institute of Electronics, Information and Communication Engineers
Journal: IEICE Transactions on Information and Systems (ISSN: 0916-8532)
Volume, issue, pages, publication date: vol. E103.D, no. 9, pp. 1978-1987, 2020-09-01 (Released: 2020-09-01)
Number of references: 53

This paper presents a method for many-to-one voice conversion (VC) using phonetic posteriorgrams (PPGs), based on adversarial training of deep neural networks (DNNs). A conventional method for many-to-one VC learns a mapping from input acoustic features to target acoustic features through separately trained DNN-based speech recognition and synthesis models. However, two factors degrade the quality of the converted speech: 1) differences among speakers that remain observable in the PPGs and 2) an over-smoothing effect in the generated acoustic features. Our method applies domain-adversarial training to the recognition model to reduce the speaker differences in the PPGs. In addition, it incorporates a generative adversarial network into the training of the synthesis model to alleviate the over-smoothing effect. Unlike the conventional method, ours trains the recognition and synthesis models jointly so that both are optimized for many-to-one VC. Experimental evaluation demonstrates that the proposed method significantly improves converted speech quality compared with conventional VC methods.
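
The domain-adversarial idea in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the abstract does not give the loss formulation, so the code below assumes the common form in which the recognition model minimizes its phone-classification loss minus a weighted speaker-classification loss. The function names, the weight `adv_weight`, and the toy logits are all hypothetical.

```python
import math


def softmax(logits):
    """Turn raw scores into a probability distribution (numerically stable)."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]


def cross_entropy(probs, target_index):
    """Negative log-probability assigned to the correct class."""
    return -math.log(probs[target_index])


def recognition_objective(phone_logits, phone_label,
                          speaker_logits, speaker_label,
                          adv_weight=0.5):
    """Assumed per-frame objective for the recognition model.

    phone_logits:   recognition-model scores over phonetic classes.
    speaker_logits: scores from an auxiliary speaker classifier that
                    looks at the PPG (hypothetical component).
    adv_weight:     hypothetical trade-off weight for the adversarial term.

    The recognition model is rewarded when the speaker classifier does
    badly, which pushes the PPG toward speaker-independent content.
    """
    ppg = softmax(phone_logits)  # one frame of a phonetic posteriorgram
    phone_loss = cross_entropy(ppg, phone_label)
    speaker_loss = cross_entropy(softmax(speaker_logits), speaker_label)
    return ppg, phone_loss - adv_weight * speaker_loss


if __name__ == "__main__":
    # Speaker classifier cannot tell the two speakers apart.
    _, obj_invariant = recognition_objective([2.0, 0.0, 0.0], 0, [0.0, 0.0], 1)
    # Speaker classifier confidently identifies the speaker from the PPG.
    _, obj_leaky = recognition_objective([2.0, 0.0, 0.0], 0, [0.0, 3.0], 1)
    print(obj_invariant < obj_leaky)
```

Under this assumed objective, a PPG from which the speaker can be identified easily gives the recognition model a higher (worse) value than one that leaves the speaker classifier guessing, even when phone recognition is equally accurate in both cases.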



Collected URLs

https://www.jstage.jst.go.jp/article/transinf/E103.D/9/E103.D_2019EDP7297/_article/-char/en