Authors
Xin Wang Shinji Takaki Junichi Yamagishi
Journal
研究報告音声言語情報処理(SLP) (ISSN:21888663)
Volume, issue, pages, and publication date
vol.2017-SLP-115, no.2, pp.1-6, 2017-02-10

Neural-network-based mixture density networks are important tools for acoustic modeling in statistical parametric speech synthesis. Recently we found that incorporating an autoregressive model into a recurrent mixture density network, referred to as AR-RMDN, enabled the network to generate quite smooth acoustic data trajectories without using the delta and delta-delta coefficients. More interestingly, the new model generated trajectories with a dynamic range similar to that of the natural data, thus alleviating the over-smoothing effect. In this work, after explaining the AR-RMDN from a signal-and-filter perspective, we compare an AR-RMDN with a modulation-spectrum-based post-filtering method that also eases the over-smoothing effect. We show that the AR-RMDN also alters the modulation spectrum of the generated data trajectories, but in a different way from the post-filtering method. The AR-RMDN also generates synthetic speech with better perceived quality. Based on the signal-and-filter interpretation, we further extend the AR-RMDN so that the inverse AR filter can acquire complex poles and remain stable.
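
The signal-and-filter view mentioned in the abstract can be illustrated with a small sketch. It is not the authors' implementation: it only assumes that the inverse AR filter is the all-pole filter 1 / (1 - sum_k a_k z^-k), and the function names (`ar_poles`, `is_stable`, `inverse_ar_filter`) and the second-order parameterization by a radius `r` and angle `theta` are hypothetical choices made for illustration. The sketch checks stability by testing whether all poles lie inside the unit circle, and shows that a second-order filter can have a complex-conjugate pole pair while staying stable.

```python
import numpy as np

def ar_poles(a):
    """Poles of the all-pole filter 1 / (1 - sum_k a[k-1] * z^-k).

    They are the roots of z^K - a_1 z^(K-1) - ... - a_K.
    """
    coeffs = np.concatenate(([1.0], -np.asarray(a, dtype=float)))
    return np.roots(coeffs)

def is_stable(a):
    """True if every pole lies strictly inside the unit circle."""
    return bool(np.all(np.abs(ar_poles(a)) < 1.0))

def inverse_ar_filter(x, a):
    """Apply the recursion y[t] = x[t] + sum_k a[k-1] * y[t-k]."""
    a = np.asarray(a, dtype=float)
    y = np.zeros(len(x))
    for t in range(len(x)):
        acc = x[t]
        for k in range(1, len(a) + 1):
            if t - k >= 0:
                acc += a[k - 1] * y[t - k]
        y[t] = acc
    return y

# Illustrative assumption: a second-order filter whose poles are the
# complex-conjugate pair r * exp(+/- j * theta), with r < 1 for stability.
r, theta = 0.9, np.pi / 8
a = [2.0 * r * np.cos(theta), -r ** 2]

poles = ar_poles(a)
stable = is_stable(a)

# Impulse response of the inverse filter (first three samples).
impulse = inverse_ar_filter(np.array([1.0, 0.0, 0.0]), a)
```

Here the input `x` stands in for whatever sequence is fed to the inverse filter; how the AR-RMDN produces that sequence and how it constrains the coefficients are described in the paper itself and are not reproduced here.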

Mentions

Twitter (2 users, 2 posts, 1 favorite)

@r9y9 Understood. I was just reading the NII paper (https://t.co/D3QxtbvBne) and wondering what GV/MS was, so that clears up one of my questions. I will study it further.

List of collected URLs