Authors
Graham NEUBIG, Masato MIMURA, Shinsuke MORI, Tatsuya KAWAHARA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and date
vol.E95.D, no.2, pp.614-625, 2012-02-01 (Released:2012-02-01)
Number of references
40
Number of citing articles
11 24 6

We propose a novel scheme to learn a language model (LM) for automatic speech recognition (ASR) directly from continuous speech. In the proposed method, we first generate phoneme lattices using an acoustic model with no linguistic constraints, then perform training over these phoneme lattices, simultaneously learning both lexical units and an LM. As a statistical framework for this learning problem, we use non-parametric Bayesian statistics, which make it possible to balance the learned model's complexity (such as the size of the learned vocabulary) and expressive power, and provide a principled learning algorithm through the use of Gibbs sampling. Implementation is performed using weighted finite state transducers (WFSTs), which allow for the simple handling of lattice input. Experimental results on natural, adult-directed speech demonstrate that LMs built using only continuous speech are able to significantly reduce ASR phoneme error rates. The proposed technique of joint Bayesian learning of lexical units and an LM over lattices is shown to significantly contribute to this improvement.
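The abstract above describes learning lexical units jointly with a language model by Gibbs sampling. As a much-simplified, hypothetical sketch of that idea (not the authors' method), the Python below segments unsegmented phoneme strings with a Dirichlet-process unigram word model, Gibbs-sampling one word-boundary variable at a time. It assumes a single best phoneme string per utterance instead of a lattice, no WFSTs, no language-model context beyond a unigram, a geometric word-length base distribution with uniform phonemes, and illustrative names and parameters (`gibbs_segment`, `alpha`, `p_stop`).

```python
import random
from collections import Counter


def p0(word, n_phones, p_stop=0.5):
    """Base distribution (assumed): geometric word length times uniform phonemes."""
    return p_stop * (1 - p_stop) ** (len(word) - 1) * (1.0 / n_phones) ** len(word)


def words_of(utt, bounds):
    """Split utt into words; bounds[i] is True if a word boundary follows utt[i]."""
    words, start = [], 0
    for i, is_boundary in enumerate(bounds):
        if is_boundary:
            words.append(utt[start:i + 1])
            start = i + 1
    words.append(utt[start:])
    return words


def gibbs_segment(utts, alpha=1.0, iters=200, seed=0):
    """Gibbs-sample word boundaries under a Dirichlet-process unigram word model."""
    rng = random.Random(seed)
    n_phones = len(set("".join(utts)))
    # Random initial boundaries: one boolean per gap between adjacent phonemes.
    bounds = [[rng.random() < 0.5 for _ in range(len(u) - 1)] for u in utts]
    counts = Counter(w for u, b in zip(utts, bounds) for w in words_of(u, b))
    total = sum(counts.values())
    for _ in range(iters):
        for u, b in zip(utts, bounds):
            for i in range(len(b)):
                # Locate the word(s) touching the gap between u[i] and u[i + 1].
                left = i
                while left > 0 and not b[left - 1]:
                    left -= 1
                end = i + 1
                while end < len(b) and not b[end]:
                    end += 1
                w1, w2, w12 = u[left:i + 1], u[i + 1:end + 1], u[left:end + 1]
                # Remove the current analysis of this span from the counts.
                if b[i]:
                    counts[w1] -= 1
                    counts[w2] -= 1
                    total -= 2
                else:
                    counts[w12] -= 1
                    total -= 1
                # Chinese-restaurant-process predictive probabilities: one word vs two.
                p_join = (counts[w12] + alpha * p0(w12, n_phones)) / (total + alpha)
                p_split = ((counts[w1] + alpha * p0(w1, n_phones)) / (total + alpha)
                           * (counts[w2] + (w1 == w2) + alpha * p0(w2, n_phones))
                           / (total + 1 + alpha))
                b[i] = rng.random() < p_split / (p_split + p_join)
                # Add the newly sampled analysis back into the counts.
                if b[i]:
                    counts[w1] += 1
                    counts[w2] += 1
                    total += 2
                else:
                    counts[w12] += 1
                    total += 1
    return [words_of(u, b) for u, b in zip(utts, bounds)]


if __name__ == "__main__":
    corpus = ["thedog", "thecat", "adog", "acat", "thedogsawacat"]
    for words in gibbs_segment(corpus, iters=500, seed=1):
        print(" ".join(words))
```

Results depend on the seed and iteration count; on a toy corpus this small the sampler will not reliably recover sensible words. The paper's setting adds lattice input and a richer language model on top of this basic sampling loop.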
Authors
Akinobu Lee, Tatsuya Kawahara, Kiyohiro Shikano
Volume, issue, pages, and date
pp.1691-1694, 2001-09

Julius is a high-performance, two-pass LVCSR decoder for researchers and developers. Based on a word 3-gram model and context-dependent HMMs, it can perform almost real-time decoding on most current PCs in a 20k-word dictation task. Major search techniques are fully incorporated, such as a tree lexicon, N-gram factoring, cross-word context-dependency handling, enveloped beam search, Gaussian pruning, and Gaussian selection. Besides search efficiency, it is carefully modularized to be independent of model structures, and various HMM types are supported, such as shared-state triphones and tied-mixture models, with any number of mixtures, states, or phones. Standard formats are adopted for compatibility with other free modeling toolkits. The main platforms are Linux and other Unix workstations, and it partially works on Windows. Julius is distributed under an open license together with its source code and has been used by many researchers and developers in Japan.
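Two of the search techniques listed above, the tree lexicon and N-gram factoring, can be illustrated with a small hypothetical sketch. The Python below builds a phone prefix tree so that words sharing a pronunciation prefix share nodes, and stores at each node the largest unigram probability of any word reachable below it, a simple form of language-model look-ahead. The toy lexicon, phone symbols, and probabilities are made up, and this does not reproduce Julius's data structures or how it implements factoring.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    children: dict = field(default_factory=dict)  # phone -> Node
    words: list = field(default_factory=list)     # words whose pronunciation ends here
    factor: float = 0.0                           # best LM probability reachable below


def build_tree(lexicon):
    """Insert each word's phone sequence into a prefix tree, sharing common prefixes."""
    root = Node()
    for word, phones in lexicon.items():
        node = root
        for phone in phones:
            node = node.children.setdefault(phone, Node())
        node.words.append(word)
    return root


def count_nodes(node):
    """Number of phone nodes below node."""
    return sum(1 + count_nodes(child) for child in node.children.values())


def assign_factors(node, unigram):
    """Store at each node the max unigram probability of any word in its subtree."""
    best = max((unigram[w] for w in node.words), default=0.0)
    for child in node.children.values():
        best = max(best, assign_factors(child, unigram))
    node.factor = best
    return best


def lookup(root, phones):
    """Return the words whose pronunciation is exactly phones."""
    node = root
    for phone in phones:
        node = node.children.get(phone)
        if node is None:
            return []
    return node.words


LEXICON = {  # hypothetical toy pronunciations
    "speech": ["s", "p", "iy", "ch"],
    "speed": ["s", "p", "iy", "d"],
    "spin": ["s", "p", "ih", "n"],
    "sing": ["s", "ih", "ng"],
}
UNIGRAM = {"speech": 0.1, "speed": 0.3, "spin": 0.2, "sing": 0.4}  # made-up probabilities

if __name__ == "__main__":
    tree = build_tree(LEXICON)
    assign_factors(tree, UNIGRAM)
    flat = sum(len(p) for p in LEXICON.values())
    print(f"flat lexicon: {flat} phone nodes, tree lexicon: {count_nodes(tree)}")
    print("factor after 's p':", tree.children["s"].children["p"].factor)
```

With this toy lexicon, the 15 phones of the flat lexicon collapse to 9 tree nodes, and the factor at the `s p` node is 0.3 (from "speed"). The factor lets a decoder apply a language-model score before the word identity is known at a leaf, which is the problem that factoring addresses in a tree-structured search.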
Authors
Toshiyuki Hagiya, Toshiharu Horiuchi, Tomonori Yazaki, Tatsuya Kawahara
Journal
IPSJ Journal (情報処理学会論文誌) (ISSN:18827764)
Volume, issue, pages, and date
vol.59, no.4, 2018-04-15

Many older adults are interested in smartphones. However, most of them have difficulty learning on their own and need support. Text entry, which is essential for various applications, is one of the most difficult operations to master. In this paper, we propose Typing Tutor, an individualized tutoring system for text entry that detects input stumbles using a statistical approach and provides instructions. By conducting two user studies, we clarify the common difficulties that novice older adults experience and how skill level is related to input stumbles with a 12-key layout for Japanese. Based on these studies, we develop Typing Tutor to support learning how to enter text on a smartphone. A two-week evaluation experiment with novice older adults (65+) showed that Typing Tutor was effective in improving their text entry proficiency, especially in the initial stage of use. In addition, we demonstrate the applicability of Typing Tutor to other keyboards and languages using the QWERTY layout for Japanese and English.

This is a preprint of an article intended for publication in the Journal of Information Processing (JIP). This preprint should not be cited. This article should be cited as: Journal of Information Processing Vol.26 (2018) (online), DOI http://dx.doi.org/10.2197/ipsjjip.26.362
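The abstract does not specify which statistical approach Typing Tutor uses to detect input stumbles. Purely as a hypothetical illustration, one simple statistical detector flags a keystroke whose interval since the previous keystroke is unusually long relative to the user's own earlier intervals (more than the mean plus `k` standard deviations). The threshold rule, the `k` and `min_history` parameters, and the choice of inter-keystroke intervals as the signal are all assumptions, not the paper's method.

```python
import statistics


def detect_stumbles(intervals_ms, k=2.0, min_history=5):
    """Return indices of keystrokes whose preceding pause is unusually long.

    Hypothetical rule: flag interval i if it exceeds mean + k * stdev of all
    earlier intervals, once at least min_history earlier intervals exist.
    """
    flagged = []
    for i, dt in enumerate(intervals_ms):
        history = intervals_ms[:i]
        if len(history) < min_history:
            continue  # not enough data yet to estimate this user's typical pace
        mu = statistics.mean(history)
        sd = statistics.pstdev(history)
        if dt > mu + k * sd:
            flagged.append(i)
    return flagged


if __name__ == "__main__":
    # Made-up inter-keystroke intervals (ms) with one long hesitation.
    intervals = [300, 320, 310, 290, 305, 1500, 310, 300]
    print(detect_stumbles(intervals))  # prints [5]
```

Because the statistics are computed per user from that user's own history, the same pause length can count as a stumble for a fast typist but not for a slow one, which fits the individualized framing of the abstract.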
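Authors
Sheng LI, Yuya AKITA, Tatsuya KAWAHARA
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
IEICE Transactions on Information and Systems (ISSN:09168532)
Volume, issue, pages, and date
vol.E98.D, no.8, pp.1545-1552, 2015-08-01 (Released:2015-08-01)
Number of references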
29

This paper addresses a scheme for lightly supervised training of an acoustic model that exploits a large amount of data accompanied by closed-caption texts rather than faithful transcripts. In the proposed scheme, the closed-caption word sequence and the ASR hypothesis generated by the baseline system are aligned. Then, a set of dedicated classifiers is designed and trained to select the correct one of the two or reject both. It is demonstrated that the classifiers can effectively filter the usable data for acoustic model training. The scheme realizes automatic training of the acoustic model with an increased amount of data. ASR accuracy improves significantly over the baseline system and also over the conventional lightly supervised training method based on simple matching.
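The first step of the scheme, aligning the closed-caption word sequence with the baseline ASR hypothesis, can be sketched with a standard Levenshtein (edit-distance) alignment. The Python below also shows a simple-matching baseline that keeps only words on which the two sequences agree, and collects the disagreement positions, which is where the paper's dedicated classifiers would choose between the caption word, the ASR word, or rejecting both. The classifiers themselves are not sketched; function names and example sentences are hypothetical, and the paper may align differently (for example, with other costs or units).

```python
def align(ref, hyp):
    """Levenshtein-align two token lists; returns (ref_tok, hyp_tok) pairs, None marks a gap."""
    n, m = len(ref), len(hyp)
    # cost[i][j] = edit distance between ref[:i] and hyp[:j]
    cost = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i
    for j in range(1, m + 1):
        cost[0][j] = j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]),  # match/substitution
                             cost[i - 1][j] + 1,   # caption word missing from hypothesis
                             cost[i][j - 1] + 1)   # extra hypothesis word
    # Trace back one optimal path from the bottom-right corner.
    pairs, i, j = [], n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and cost[i][j] == cost[i - 1][j - 1] + (ref[i - 1] != hyp[j - 1]):
            pairs.append((ref[i - 1], hyp[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and cost[i][j] == cost[i - 1][j] + 1:
            pairs.append((ref[i - 1], None))
            i -= 1
        else:
            pairs.append((None, hyp[j - 1]))
            j -= 1
    return pairs[::-1]


def simple_match(caption, hyp):
    """Baseline: keep only caption words that the ASR hypothesis reproduces exactly."""
    return [c for c, h in align(caption, hyp) if c is not None and c == h]


def disagreements(caption, hyp):
    """Aligned positions where caption and hypothesis differ; a classifier (not
    implemented here) would pick the caption word, the ASR word, or reject both."""
    return [(c, h) for c, h in align(caption, hyp) if c != h]


if __name__ == "__main__":
    caption = "the prime minister said today".split()
    hyp = "the prime minster said to day".split()
    print(simple_match(caption, hyp))  # prints ['the', 'prime', 'said']
    print(disagreements(caption, hyp))
```

Simple matching discards every disagreement position, so any data where the caption is right but the ASR hypothesis is wrong (or the reverse) is lost; deciding those positions individually is what allows more data to be kept for training.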