著者
Kento MATSUMOTO Sunao HARA Masanobu ABE
出版者
The Institute of Electronics, Information and Communication Engineers
雑誌
IEICE TRANSACTIONS on Information and Systems (ISSN:09168532)
巻号頁・発行日
vol.E105-D, no.9, pp.1581-1589, 2022-09-01
被引用文献数
1

In this paper, we propose a new algorithm to generate Speech-like Emotional Sound (SES). Emotional expressions may be the most important factor in human communication, and speech is one of the most useful means of expressing emotions. Although speech generally conveys both emotional and linguistic information, we have undertaken the challenge of generating sounds that convey emotional information alone. We call the generated sounds “speech-like,” because the sounds do not contain any linguistic information. SES can provide another way to generate emotional response in human-computer interaction systems. To generate “speech-like” sound, we propose employing WaveNet as a sound generator conditioned only by emotional IDs. This concept is quite different from the WaveNet Vocoder, which synthesizes speech using spectrum information as an auxiliary feature. The biggest advantage of our approach is that it reduces the amount of emotional speech data necessary for training by focusing on non-linguistic information. The proposed algorithm consists of two steps. In the first step, to generate a variety of spectrum patterns that resemble human speech as closely as possible, WaveNet is trained with auxiliary mel-spectrum parameters and Emotion ID using a large amount of neutral speech. In the second step, to generate emotional expressions, WaveNet is retrained with auxiliary Emotion ID only using a small amount of emotional speech. Experimental results reveal the following: (1) the two-step training is necessary to generate the SES with high quality, and (2) it is important that the training use a large neutral speech database and spectrum information in the first step to improve the emotional expression and naturalness of SES.

言及状況

外部データベース (DOI)

Twitter (1 users, 1 posts, 1 favorites)

Speech-Like Emotional Sound Generation Using WaveNet https://t.co/bIFSchd6FQ

収集済み URL リスト