Authors
Junya KOGUCHI Shinnosuke TAKAMICHI Masanori MORISE Hiroshi SARUWATARI Shigeki SAGAYAMA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E103-D, no.12, pp.2673-2681, 2020-12-01
Number of citing works
2

We propose a speech analysis-synthesis and deep neural network (DNN)-based text-to-speech (TTS) synthesis framework using Gaussian mixture model (GMM)-based approximation of full-band spectral envelopes. GMMs have excellent properties as acoustic features in statistical parametric speech synthesis. Each Gaussian function of a GMM fits a local resonance of the spectrum. The GMM retains the fine spectral envelope and achieves high controllability of its structure. However, since conventional speech analysis methods (i.e., GMM parameter estimation) have been formulated for narrow-band speech, they degrade the quality of synthetic speech. Moreover, a DNN-based TTS synthesis method using GMM-based approximation has not been formulated despite its excellent expressive ability. Therefore, we employ peak-picking-based initialization for full-band speech analysis to provide better initialization for iterative estimation of the GMM parameters. We introduce not only the prediction error of the GMM parameters but also the reconstruction error of the spectral envelopes as objective criteria for training the DNN. Furthermore, we propose a method for multi-task learning based on minimizing these errors simultaneously. We also propose a post-filter based on variance scaling of the GMM for our framework to enhance synthetic speech. Experimental results from evaluating our framework indicated that 1) the initialization method of our framework outperformed the conventional one in the quality of analysis-synthesized speech; 2) introducing the reconstruction error in DNN training significantly improved the synthetic speech; and 3) our variance-scaling-based post-filter further improved the synthetic speech.
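The ideas named in the abstract (a spectral envelope written as a sum of Gaussians, peak-picking-based initialization, and variance scaling) can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the 50 Hz frequency grid, the fixed initial standard deviation `init_std_hz`, the naive local-maximum peak picker, and the scaling factor 0.5 are all assumptions made for illustration, and the iterative GMM parameter estimation and DNN training described in the abstract are omitted.

```python
import math

def gmm_envelope(freqs, weights, means, variances):
    """Evaluate a weighted sum of Gaussians at each frequency in Hz."""
    envelope = []
    for f in freqs:
        value = 0.0
        for w, mu, var in zip(weights, means, variances):
            value += w * math.exp(-0.5 * (f - mu) ** 2 / var) / math.sqrt(2.0 * math.pi * var)
        envelope.append(value)
    return envelope

def pick_peaks(envelope):
    """Return indices of strict local maxima (a naive peak picker, assumed here)."""
    return [i for i in range(1, len(envelope) - 1)
            if envelope[i - 1] < envelope[i] > envelope[i + 1]]

def init_from_peaks(freqs, envelope, init_std_hz):
    """Hypothetical initialization: place one Gaussian on each picked peak."""
    peaks = pick_peaks(envelope)
    means = [freqs[i] for i in peaks]
    variances = [init_std_hz ** 2] * len(peaks)
    # Choose each weight so the Gaussian's height matches the peak height.
    scale = math.sqrt(2.0 * math.pi) * init_std_hz
    weights = [envelope[i] * scale for i in peaks]
    return weights, means, variances

def scale_variances(variances, factor):
    """Assumed post-filter form: a factor below 1 narrows each Gaussian."""
    return [v * factor for v in variances]

# Toy full-band grid: 0 Hz to 24 kHz in 50 Hz steps (assumes 48 kHz sampling).
freqs = [50.0 * k for k in range(481)]
# Toy target envelope with two resonances at 500 Hz and 3000 Hz.
target = gmm_envelope(freqs, [1.0, 0.5], [500.0, 3000.0], [150.0 ** 2] * 2)

weights, means, variances = init_from_peaks(freqs, target, init_std_hz=150.0)
sharpened = gmm_envelope(freqs, weights, means, scale_variances(variances, 0.5))
```

In this toy setting the picked means coincide with the two resonance frequencies, and halving the variances raises and narrows each peak while the weights stay fixed. In a real analysis the peak-based parameters would only be a starting point for the iterative estimation that the abstract mentions.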
Authors
Christoph M. Wilk Shigeki Sagayama
Publisher
Information Processing Society of Japan
Journal
Journal of Information Processing (ISSN:18826652)
Volume, issue, pages, and publication date
vol.27, pp.693-700, 2019 (Released:2019-11-15)
Number of references
30
Number of citing works
1

This paper proposes automatic music completion (the automatic generation of music pieces from any incomplete fragments of music) as a new class of music composition assistance tasks. This is a generalization of conventional music information problems such as automatic melody generation and harmonization. The goal is to turn the musical ideas of a user into music pieces, allowing users to quickly explore new ideas and enabling inexperienced users to create their own music. This principle is applicable to a wide variety of music, and as a first step, we present a system that automatically fills in the missing parts of a four-part chorale, as well as the underlying harmony progression. The user can input any combination of melody fragments and freely constrain the harmony. Our system searches for harmonies and melodies that adhere to music-theoretical principles, a task that requires extensive knowledge and practice of human composers. To account for the mutual influence of melodic and harmonic development in music composition, the system is based on a joint model of harmony and voicing. The system was evaluated by analyzing the generated music with respect to music theory, in addition to a subjective evaluation experiment. Readers are invited to experiment with our system at http://160.16.202.131/music_completion.