著者
Yosuke Uto Yoshihiko Nankaku Tomoki Toda Akinobu Lee Keiichi Tokuda
巻号頁・発行日
pp.2278-2281, 2006-09

This paper describes the voice conversion based on the Mixtures of Factor Analyzers (MFA) which can provide an efficient modeling with a limited amount of training data. As a typical spectral conversion method, a mapping algorithm based on the Gaussian Mixture Model (GMM) has been proposed. In this method two kinds of covariance matrix structures are often used : the diagonal and full covariance matrices. GMM with diagonal covariance matrices requires a large number of mixture components for accurately estimating spectral features. On the other hand, GMM with full covariance matrices needs sufficient training data to estimate model parameters. In order to cope with these problems, we apply MFA to voice conversion. MFA can be regarded as intermediate model between GMM with diagonal covariance and with full covariance. Experimental results show that MFA can improve the conversion accuracy compared with the conventional GMM.
著者
Shogo SEKI Tomoki TODA Kazuya TAKEDA
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Fundamentals of Electronics, Communications and Computer Sciences (ISSN:09168508)
巻号頁・発行日
vol.E101-A, no.7, pp.1057-1064, 2018-07-01

This paper proposes a semi-supervised source separation method for stereophonic music signals containing multiple recorded or processed signals, where synthesized music is focused on the stereophonic music. As the synthesized music signals are often generated as linear combinations of many individual source signals and their respective mixing gains, phase or phase difference information between inter-channel signals, which represent spatial characteristics of recording environments, cannot be utilized as acoustic clues for source separation. Non-negative Tensor Factorization (NTF) is an effective technique which can be used to resolve this problem by decomposing amplitude spectrograms of stereo channel music signals into basis vectors and activations of individual music source signals, along with their corresponding mixing gains. However, it is difficult to achieve sufficient separation performance using this method alone, as the acoustic clues available for separation are limited. To address this issue, this paper proposes a Cepstral Distance Regularization (CDR) method for NTF-based stereo channel separation, which involves making the cepstrum of the separated source signals follow Gaussian Mixture Models (GMMs) of the corresponding the music source signal. These GMMs are trained in advance using available samples. Experimental evaluations separating three and four sound sources are conducted to investigate the effectiveness of the proposed method in both supervised and semi-supervised separation frameworks, and performance is also compared with that of a conventional NTF method. Experimental results demonstrate that the proposed method yields significant improvements within both separation frameworks, and that cepstral distance regularization provides better separation parameters.
著者
Yoshihiko Nankaku Kenichi Nakamura Tomoki Toda Keiichi Tokuda
巻号頁・発行日
2007-08

This paper proposes a spectral conversion technique based on a new statistical model which includes time-sequence matching. In conventional GMM-based approaches, the Dynamic Programming (DP) matching between source and target feature sequences is performed prior to the training of GMMs. Although a similarity measure of two frames, e.g., the Euclid distance is typically adopted, this might be inappropriate for converting the spectral features. The likelihood function of the proposed model can directly deal with two different length sequences, in which a frame alignment of source and target feature sequences is represented by discrete hidden variables. In the proposed algorithm, the maximum likelihood criterion is consistently applied to the training of model parameters, sequence matching and spectral conversion. In the subjective preference test, the proposed method is superior than the conventional GMM-based method.