Authors
Xinyi Zhao, Nobuaki Minematsu, Daisuke Saito
Journal
研究報告音声言語情報処理(SLP) (ISSN:21888663)
Volume, issue, pages, and publication date
vol.2018-SLP-125, no.17, pp.1-4, 2018-12-03

In English education, speech synthesis technologies can be used to build a reading tutor that shows students how to read given sentences in a natural, native-like way. Such a tutor can not only provide native-like audio of the input sentences but also visualize the prosodic structure required to read those sentences aloud naturally. As the first step toward developing such a reading tutor, the prosodic events that indicate the intonation of a sentence need to be predicted from plain text. In this research, phrase boundaries and 4-level stress, instead of the traditional binary stress level, are considered as prosodic events. The 4-level stress labels not only categorize syllables into stressed and unstressed ones but also indicate where phrase stress and sentence stress should appear in a sentence. Conditional Random Fields (CRFs), a popular sequence labeling method, are employed for the prediction. Experiments showed that the proposed method improves the performance of prosody prediction compared with previous work.
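
To illustrate how a linear-chain CRF assigns one label per syllable, the following is a minimal sketch of Viterbi decoding, the standard inference step for such models. It is not the authors' implementation. The label names `"0"` to `"3"`, the function `viterbi`, and all scores are hypothetical stand-ins: in a trained CRF, the emission and transition scores would come from learned feature weights, which this abstract does not describe.

```python
from typing import Dict, List, Tuple

# Hypothetical tags for a 4-level stress scheme (names are an assumption).
LABELS = ["0", "1", "2", "3"]


def viterbi(
    emissions: List[Dict[str, float]],
    transitions: Dict[Tuple[str, str], float],
) -> List[str]:
    """Return the highest-scoring label sequence for a linear-chain model.

    emissions[t][y] is the score of label y at position t (missing = 0.0).
    transitions[(p, y)] is the score of moving from label p to label y
    (missing = 0.0).
    """
    if not emissions:
        return []
    # best[t][y]: score of the best path that ends in label y at position t.
    best = [{y: emissions[0].get(y, 0.0) for y in LABELS}]
    back = []  # back[t-1][y]: best previous label for y at position t
    for t in range(1, len(emissions)):
        scores = {}
        pointers = {}
        for y in LABELS:
            prev, score = max(
                ((p, best[-1][p] + transitions.get((p, y), 0.0)) for p in LABELS),
                key=lambda item: item[1],
            )
            scores[y] = score + emissions[t].get(y, 0.0)
            pointers[y] = prev
        best.append(scores)
        back.append(pointers)
    # Trace the best path backwards from the best final label.
    path = [max(LABELS, key=lambda y: best[-1][y])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    path.reverse()
    return path


# Toy scores for three syllables (made up for illustration only).
toy_emissions = [{"0": 2.0}, {"3": 1.0, "1": 0.9}, {"3": 2.0}]
# Penalize two adjacent top-level stresses.
toy_transitions = {("3", "3"): -5.0}
toy_path = viterbi(toy_emissions, toy_transitions)
```

In this toy setup the transition penalty changes the decoded sequence: the middle syllable would receive the top-level label on emission scores alone, but the penalty on adjacent top-level labels makes a lower stress level the better choice overall. This dependence between neighboring labels is what distinguishes sequence labeling from classifying each syllable independently.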