文献一覧: Takaaki SAEKI (著者)

15 0 0 0 OA Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials

著者: Takaaki SAEKI Yuki SAITO Shinnosuke TAKAMICHI Hiroshi SARUWATARI
出版者: The Institute of Electronics, Information and Communication Engineers
雑誌: IEICE Transactions on Information and Systems (ISSN:09168532)
巻号頁・発行日: vol.E104.D, no.7, pp.1002-1016, 2021-07-01 (Released:2021-07-01)
参考文献数: 41
被引用文献数: 2

This paper proposes two high-fidelity and computationally efficient neural voice conversion (VC) methods based on a direct waveform modification using spectral differentials. The conventional spectral-differential VC method with a minimum-phase filter achieves high-quality conversion for narrow-band (16 kHz-sampled) VC but requires heavy computational cost in filtering. This is because the minimum phase obtained using a fixed lifter of the Hilbert transform often results in a long-tap filter. Furthermore, when we extend the method to full-band (48 kHz-sampled) VC, the computational cost is heavy due to increased sampling points, and the converted-speech quality degrades due to large fluctuations in the high-frequency band. To construct a short-tap filter, we propose a lifter-training method for data-driven phase reconstruction that trains a lifter of the Hilbert transform by taking into account filter truncation. We also propose a frequency-band-wise modeling method based on sub-band multi-rate signal processing (sub-band modeling method) for full-band VC. It enhances the computational efficiency by reducing sampling points of signals converted with filtering and improves converted-speech quality by modeling only the low-frequency band. We conducted several objective and subjective evaluations to investigate the effectiveness of the proposed methods through implementation of the real-time, online, full-band VC system we developed, which is based on the proposed methods. The results indicate that 1) the proposed lifter-training method for narrow-band VC can shorten the tap length to 1/16 without degrading the converted-speech quality, and 2) the proposed sub-band modeling method for full-band VC can improve the converted-speech quality while reducing the computational cost, and 3) our real-time, online, full-band VC system can convert 48 kHz-sampled speech in real time attaining the converted speech with a 3.6 out of 5.0 mean opinion score of naturalness.

2023-11-11 13:06:45
15 + 39 Twitter

1 0 0 0 Real-Time Full-Band Voice Conversion with Sub-Band Modeling and Data-Driven Phase Estimation of Spectral Differentials

著者: Takaaki SAEKI Yuki SAITO Shinnosuke TAKAMICHI Hiroshi SARUWATARI
出版者: The Institute of Electronics, Information and Communication Engineers
雑誌: IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日: vol.E104-D, no.7, pp.1002-1016, 2021-07-01
被引用文献数: 2

2021-07-01 08:08:58
1 + 4 Twitter