Authors
Daisuke SAITO, Nobuaki MINEMATSU, Keikichi HIROSE
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E103.D, no.6, pp.1395-1405, 2020-06-01 (Released:2020-06-01)
Number of references
28

This paper describes a novel approach to flexible control of speaker characteristics using a tensor representation of multiple Gaussian mixture models (GMMs). In voice conversion studies, realizing conversion from/to an arbitrary speaker's voice is an important objective. For this purpose, eigenvoice conversion (EVC) based on an eigenvoice GMM (EV-GMM) was proposed. In EVC, a speaker space is constructed from GMM supervectors, which are high-dimensional vectors derived by concatenating the mean vectors of each speaker's GMM. In this speaker space, each speaker is represented by a small number of weight parameters of eigen-supervectors. In this paper, we revisit the construction of the speaker space by introducing tensor factor analysis of the training data set. In our approach, each speaker is represented as a matrix whose rows and columns correspond to the dimensions of the mean vector and the Gaussian components, respectively. The speaker space is derived by tensor factor analysis of the set of these matrices. Our approach can solve an inherent problem of the supervector representation, and it improves the performance of voice conversion. In addition, the effects of speaker adaptive training before factorization are investigated. Experimental results of one-to-many voice conversion demonstrate the effectiveness of the proposed approach.
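The difference between the supervector and the matrix representation described in this abstract can be illustrated with a small sketch. It is only a toy illustration: the function names, the sizes (two speakers, three Gaussian components, two feature dimensions), and the numeric values are assumptions made here, and the tensor factor analysis itself is not shown.

```python
# Toy illustration only: names, sizes, and values are assumptions,
# not taken from the paper. The factor analysis step is not implemented.

def to_supervector(means):
    """Concatenate the per-component mean vectors into one long vector."""
    return [x for mean in means for x in mean]

def to_matrix(means):
    """Arrange the mean vectors as columns.

    Rows index the feature dimension, columns index the Gaussian component.
    """
    dim = len(means[0])
    return [[mean[d] for mean in means] for d in range(dim)]

def stack_speakers(speaker_means):
    """Stack one matrix per speaker into a 3-way array
    (speaker x dimension x component)."""
    return [to_matrix(means) for means in speaker_means]

# Two hypothetical speakers, each with 3 Gaussian components of dimension 2.
speaker_a = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
speaker_b = [[1.5, 2.5], [3.5, 4.5], [5.5, 6.5]]

supervector_a = to_supervector(speaker_a)         # length 2 * 3 = 6
matrix_a = to_matrix(speaker_a)                   # 2 rows, 3 columns
tensor = stack_speakers([speaker_a, speaker_b])   # 2 x 2 x 3
```

In the supervector, the dimension and component indices are flattened into a single axis, whereas the stacked matrices keep them as separate axes that a tensor factorization can treat independently.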
Authors
Masahiko Imai, Tomohiro Izumisawa, Daisuke Saito, Shinya Hasegawa, Masahiro Yamasaki, Noriko Takahashi
Publisher
The Pharmaceutical Society of Japan
Journal
Biological and Pharmaceutical Bulletin (ISSN:09186158)
Volume, issue, pages, and publication date
vol.46, no.5, pp.661-671, 2023-05-01 (Released:2023-05-01)
Number of references
21

Myelosuppression, a side effect of anticancer drugs, makes people more susceptible to infectious diseases by compromising the immune system. When a cancer patient develops an infectious disease, treatment with an anticancer drug is suspended or postponed so that the infection can be treated. If an antibacterial agent also suppressed the growth of cancer cells, it would be possible to treat both the infectious disease and the cancer. Therefore, this study investigated the effect of antibacterial agents on cancer cell growth. Vancomycin (VAN) had little effect on the proliferation of the breast cancer cell line MCF-7, the prostate cancer cell line PC-3, or the gallbladder cancer cell line NOZ C-1. Teicoplanin (TEIC) and Daptomycin (DAP), on the other hand, promoted the growth of some cancer cells. In contrast, Linezolid (LZD) suppressed the proliferation of MCF-7, PC-3, and NOZ C-1 cells. Thus, we found an antibacterial agent that affects the growth of cancer cells. Next, when we examined the combined use of existing anticancer and antibacterial agents, we found that VAN did not affect the growth suppression by anticancer agents, whereas TEIC and DAP attenuated it. In contrast, LZD additively enhanced the growth suppression by Docetaxel in PC-3 cells. Furthermore, we showed that LZD inhibits cancer cell growth by mechanisms that involve suppression of the phosphatidylinositol 3-kinase (PI3K)/protein kinase B (Akt) pathway. Therefore, LZD might simultaneously treat cancer and infectious diseases.
Authors
Hitoshi SUDA, Gaku KOTANI, Daisuke SAITO
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
Volume, issue, pages, and publication date
vol.E105-D, no.6, pp.1196-1210, 2022-06-01
Number of citations
1

In this paper, we propose a new training framework named the INmfCA algorithm for nonparallel voice conversion (VC) systems. To train conversion models, traditional VC frameworks require parallel corpora, in which source and target speakers utter the same linguistic contents. Although these frameworks have achieved high-quality VC, they are not applicable in situations where parallel corpora are unavailable. To acquire conversion models without parallel corpora, nonparallel methods have been widely studied. Although these methods achieve VC under nonparallel conditions, they tend to require extensive background knowledge or many training utterances, because it is difficult to disentangle linguistic and speaker information without a large amount of data. In this work, we tackle this problem by exploiting nonnegative matrix factorization (NMF), which can factorize acoustic features into time-variant and time-invariant components in an unsupervised manner. The proposed method acquires an alignment between the acoustic features of a source speaker's utterances and a target dictionary, and uses the obtained alignment as the activation of NMF to train the source speaker's dictionary without parallel corpora. The acquisition method is based on the INCA algorithm, which obtains the alignment of nonparallel corpora. In contrast to the INCA algorithm, the alignment is not restricted to observed samples, and thus the proposed method can efficiently utilize small nonparallel corpora. The results of subjective experiments show that the combination of the proposed algorithm and the INCA algorithm outperformed not only an INCA-based nonparallel framework but also CycleGAN-VC, which performs nonparallel VC without any additional training data. The results also indicate that a one-shot VC framework, which does not need to train source speakers, can be constructed on the basis of the proposed method.
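The step of training a dictionary while the activation is held fixed can be sketched as follows. This is a minimal sketch under stated assumptions, not the INmfCA algorithm itself: it assumes a squared-error objective with the standard multiplicative update for NMF, which the abstract does not specify, and it omits the alignment step entirely. All names (`update_dictionary`, `V`, `W`, `H`) and the toy values are hypothetical.

```python
# Minimal sketch, NOT the INmfCA algorithm: the squared-error objective and
# the multiplicative update rule are assumptions; the alignment step that
# would produce the activation H is omitted.

def transpose(m):
    return [list(col) for col in zip(*m)]

def matmul(a, b):
    """Plain-Python matrix product of two lists of lists."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def squared_error(V, W, H):
    """Sum of squared differences between V and the product W H."""
    WH = matmul(W, H)
    return sum((v - r) ** 2 for vr, rr in zip(V, WH) for v, r in zip(vr, rr))

def update_dictionary(V, W, H, n_iter=200, eps=1e-12):
    """Update the dictionary W with the activation H held fixed.

    Uses the multiplicative rule W <- W * (V H^T) / (W H H^T), which keeps
    every entry of W nonnegative.
    """
    Ht = transpose(H)
    VHt = matmul(V, Ht)   # constant because V and H do not change
    HHt = matmul(H, Ht)   # constant because H does not change
    for _ in range(n_iter):
        WHHt = matmul(W, HHt)
        W = [[w * n / (d + eps) for w, n, d in zip(w_row, n_row, d_row)]
             for w_row, n_row, d_row in zip(W, VHt, WHHt)]
    return W

# Toy data: 2 feature bins, 3 frames, 2 dictionary atoms (all made up).
H = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0]]   # fixed activation
V = [[1.0, 2.0, 4.0], [3.0, 1.0, 7.0]]   # observed nonnegative features
W_init = [[1.0, 1.0], [1.0, 1.0]]        # initial dictionary

error_before = squared_error(V, W_init, H)
W_trained = update_dictionary(V, W_init, H)
error_after = squared_error(V, W_trained, H)
```

Because only the dictionary is updated, each iteration reuses the same `V H^T` and `H H^T` products, so the loop cost depends on the dictionary size and not on the number of frames.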
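Authors
Xinyi Zhao, Nobuaki Minematsu, Daisuke Saito
Journal
研究報告音声言語情報処理(SLP) [Research Report: Spoken Language Information Processing (SLP)] (ISSN:21888663)
Volume, issue, pages, and publication date
vol.2018-SLP-125, no.17, pp.1-4, 2018-12-03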

In English education, speech synthesis technologies can be used effectively to develop a reading tutor that shows students how to read given sentences in a natural, native-like way. The tutor can not only provide native-like audio of the input sentences but also visualize the prosodic structure required to read those sentences aloud naturally. As the first step toward such a reading tutor, prosodic events that indicate the intonation of a sentence need to be predicted from plain text. In this research, phrase boundaries and 4-level stress, instead of the traditional binary stress level, are considered as prosodic events. The 4-level stress labels not only categorize syllables into stressed and unstressed ones but also indicate where phrase stress and sentence stress should appear in a sentence. Conditional random fields (CRFs), a popular sequence labeling method, are employed for the prediction. Experiments showed that the proposed method improves the performance of prosody prediction compared to previous research.