Authors
Takashi NOSE, Yuhei OTA, Takao KOBAYASHI
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, publication date
vol.E93-D, no.9, pp.2483-2490, 2010-09-01
Number of citing works
9

We propose a segment-based voice conversion technique using hidden Markov model (HMM)-based speech synthesis with nonparallel training data. In the proposed technique, phoneme information with durations and a quantized F0 contour are extracted from the input speech of a source speaker and transmitted to a synthesis part. In the synthesis part, the quantized F0 symbols are used as prosodic context. A phonetically and prosodically context-dependent label sequence is generated from the transmitted phonemes and F0 symbols. Converted speech is then generated from the label sequence with durations using the target speaker's pre-trained context-dependent HMMs. The models of the source and target speakers can be trained separately, so there is no need to prepare parallel speech data of the two speakers. Objective and subjective experimental results show that segment-based voice conversion with phonetic and prosodic contexts works effectively even when parallel speech data are not available.
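The step of turning phonemes and a quantized F0 contour into a context-dependent label sequence can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration and is not taken from the paper: the function name `quantize_f0`, uniform quantization of per-phoneme mean log-F0 over the utterance's voiced range, the choice of four levels, the `x` symbol for unvoiced segments, and the `phoneme/F0:symbol` label format.

```python
import math


def quantize_f0(segments, num_levels=4):
    """Build (label, duration) pairs from phoneme segments.

    segments: list of (phoneme, duration_ms, mean_f0_hz), where mean_f0_hz
    is None for unvoiced segments. The label format is hypothetical.
    """
    # Work in the log-F0 domain, using only voiced segments for the range.
    voiced = [math.log(f0) for _, _, f0 in segments if f0 is not None]
    if voiced:
        lo, hi = min(voiced), max(voiced)
        # Fall back to a step of 1.0 when all voiced values are identical.
        step = (hi - lo) / num_levels or 1.0
    labels = []
    for phoneme, duration, f0 in segments:
        if f0 is None:
            symbol = "x"  # assumed marker for unvoiced segments
        else:
            # Uniform quantization; clamp the maximum into the top level.
            level = min(int((math.log(f0) - lo) / step), num_levels - 1)
            symbol = str(level)
        labels.append((f"{phoneme}/F0:{symbol}", duration))
    return labels
```

In this sketch, only the label strings and durations would need to be passed to the synthesis side, which is what allows the source and target models to be trained on different utterances.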
Authors
Kosuke Nakamura, Takashi Nose, Yuya Chiba, Akinori Ito
Publisher
Information Processing Society of Japan
Journal
Journal of Information Processing (ISSN:18826652)
Volume, issue, pages, publication date
vol.28, pp.248-257, 2020 (Released:2020-04-15)
Number of references
33

In this paper, we deal with melody completion, a technique that smoothly completes partially masked melodies. Melody completion can help people compose or arrange pieces of music in several ways, such as editing existing melodies or connecting two other melodies. In recent years, various methods have been proposed for realizing high-quality completion via neural networks. In this work, we examine a method of melody completion based on an image completion network. We represent melodies as images and train a completion network to complete those images. The completion network consists of convolution layers and is trained in the framework of generative adversarial networks. We also consider chord progressions from musical pieces as conditions. Experimental results confirmed that the network could generate original melodies as completions and that the quality of the generated melodies was not significantly worse than that of a simple example-based melody completion method.
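Representing a melody as an image and masking part of it can be sketched as follows. This is an illustrative assumption, not the authors' data pipeline: the binary piano-roll layout, the pitch range (MIDI 48 to 83), the note tuple format, and the function names `melody_to_piano_roll` and `mask_region` are all hypothetical. The completion network and the chord conditioning are not shown.

```python
def melody_to_piano_roll(notes, num_steps, low_pitch=48, high_pitch=84):
    """Render notes as a binary image: rows are pitches, columns are time steps.

    notes: list of (midi_pitch, start_step, end_step), end exclusive.
    """
    height = high_pitch - low_pitch
    roll = [[0] * num_steps for _ in range(height)]
    for pitch, start, end in notes:
        if low_pitch <= pitch < high_pitch:
            row = high_pitch - 1 - pitch  # higher pitches appear nearer the top
            for t in range(max(start, 0), min(end, num_steps)):
                roll[row][t] = 1
    return roll


def mask_region(roll, start, end):
    """Blank out time steps [start, end) and return (masked_roll, mask).

    The mask marks with 1 the cells a completion model would have to fill in.
    """
    masked = [row[:] for row in roll]  # copy so the input stays untouched
    mask = [[0] * len(row) for row in roll]
    for r in range(len(roll)):
        for t in range(max(start, 0), min(end, len(roll[r]))):
            masked[r][t] = 0
            mask[r][t] = 1
    return masked, mask
```

A training pair in this sketch would be the masked roll plus the mask as input and the original roll as the target, which mirrors the usual setup for image completion.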