Publication list: 李晃伸 (author)

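Galatea: An Anthropomorphic Spoken Dialogue Agent Toolkit

Authors: 嵯峨山茂樹, 川本真一, 下平博, 新田恒雄, 西本卓也, 中村哲, 伊藤克亘, 森島繁生, 四倉達夫, 甲斐充彦, 李晃伸, 山下洋一, 小林隆夫, 徳田恵一, 広瀬啓吉, 峯松信明, 山田篤, 伝康晴, 宇津呂武仁
Publisher: Information Processing Society of Japan (IPSJ)
Journal: IPSJ SIG Technical Reports, Spoken Language Processing (SLP) (ISSN:09196072)
Volume, issue, pages, date: vol.2003, no.14, pp.57-64, 2003-02-07
References: 24
Cited by: 42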

This paper outlines "Galatea", a software toolkit for anthropomorphic spoken dialogue agents developed by the authors. Its main functions are speech recognition, speech synthesis, and facial image synthesis, which are integrated and run under a dialogue controller. Because the toolkit is intended as a research platform, customizability was emphasized: the face image is easy to replace, speech synthesis is speaker-adaptive, the dialogue control description is easy to modify, the functional modules themselves can easily be swapped for other modules, and the system copes flexibly with the number of processing machines available. The authors plan to release the source code and license it to the public free of charge.

https://ci.nii.ac.jp/naid/110002913780

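Evaluation of Acoustic Models and Unsupervised Speaker Adaptation for Child Speech Recognition in Real Environments

Authors: 鮫島充, ランディゴメス, 李晃伸, 猿渡洋, 鹿野清宏
Publisher: Information Processing Society of Japan (IPSJ)
Journal: IPSJ Journal (ISSN:18827764)
Volume, issue, pages, date: vol.47, no.7, pp.2295-2304, 2006-07-15
Cited by: 2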

Children's speech differs from adult speech not only in vocal tract length and fundamental frequency but also in its spontaneous speaking style and its large variation with age and between individuals, so recognition accuracy degrades severely with ordinary adult acoustic models. In addition, reading sentences accurately takes great effort for children, which makes it difficult to build a large, well-formed speech database. Aiming at accurate recognition of children's spontaneous utterances, this study carried out large-scale collection of real child speech through a spoken information guidance system, built and evaluated age-group-specific acoustic models for children, and examined unsupervised speaker adaptation. In large-vocabulary continuous speech recognition experiments, an acoustic model trained on child speech collected in the real environment achieved a word accuracy of 71.1%, an absolute improvement of 23.9% over an existing model trained on read speech. Across the age groups (preschool children, lower-grade and higher-grade elementary school children), the age-dependent models gave a large accuracy improvement for preschool children in particular. The authors then propose unsupervised speaker adaptation based on HMM sufficient statistics, using automatic speaker clustering, for a large automatically collected database that has no speaker labels. Clustering 59,966 utterances into 200 speaker clusters and adapting the acoustic model with the neighboring speaker clusters of each input gave further improvements over the age-dependent models of 2.2% for preschool children, 1.7% for lower-grade children, and 0.5% for higher-grade children (reported as a 1.5% improvement in the original English abstract).
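
The figures above are word accuracies. As a minimal sketch, assuming the standard definition (N - S - D - I) / N over a minimum edit-distance alignment of reference and recognized word sequences, word accuracy can be computed as follows. The function names `align_counts` and `word_accuracy` and the sample word lists are illustrative and do not come from the paper. Under this reading, the reported 71.1% with a 23.9-point absolute gain implies a read-speech baseline of roughly 47.2%.

```python
def align_counts(ref, hyp):
    """Return (substitutions, deletions, insertions) of a minimum-cost alignment."""
    rows, cols = len(ref) + 1, len(hyp) + 1
    # dp[i][j] = (cost, subs, dels, ins) for ref[:i] against hyp[:j]
    dp = [[None] * cols for _ in range(rows)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, rows):
        dp[i][0] = (i, 0, i, 0)          # delete every reference word
    for j in range(1, cols):
        dp[0][j] = (j, 0, 0, j)          # insert every hypothesis word
    for i in range(1, rows):
        for j in range(1, cols):
            cost, subs, dels, ins = dp[i - 1][j - 1]
            if ref[i - 1] == hyp[j - 1]:
                best = (cost, subs, dels, ins)            # match
            else:
                best = (cost + 1, subs + 1, dels, ins)    # substitution
            cost, subs, dels, ins = dp[i - 1][j]
            best = min(best, (cost + 1, subs, dels + 1, ins))  # deletion
            cost, subs, dels, ins = dp[i][j - 1]
            best = min(best, (cost + 1, subs, dels, ins + 1))  # insertion
            dp[i][j] = best
    _, subs, dels, ins = dp[-1][-1]
    return subs, dels, ins


def word_accuracy(ref, hyp):
    """Word accuracy = (N - S - D - I) / N, where N is the reference length."""
    subs, dels, ins = align_counts(ref, hyp)
    return (len(ref) - subs - dels - ins) / len(ref)


reference = "a b c d".split()
recognized = "a x c d e".split()   # one substitution, one insertion
print(word_accuracy(reference, recognized))
```

Insertions are what separate word accuracy from the word correct rate (N - S - D) / N, so accuracy can be lower than the correct rate on the same output.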

https://ci.nii.ac.jp/naid/110004751182

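[IR] Construction of a Language Model Merging Tool Using Complementary Back-off

Authors: 長友健太郎, 西村竜一, 小松久美子, 黒田由香, 李晃伸, 猿渡洋, 鹿野清宏
Publisher: Institute of Electronics, Information and Communication Engineers (IEICE)
Journal: IPSJ Journal (ISSN:18827764)
Volume, issue, pages, date: vol.43, no.9, pp.2884-2893, 2002-09-15
References: 15
Cited by: 23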

This paper proposes a merging algorithm for N-gram language models based on a complementary back-off algorithm, and describes a language model merging tool built on it. An N-gram language model reflects characteristics of its training corpus such as topic, knowledge, tone, and speaking style, so merging several models that each reflect a particular task yields a model that can handle more varied input. Existing merging methods, however, such as interpolating the probabilities, impair the properties of each model and blur its task-specific characteristics, weakening the linguistic constraint overall. The conventional alternative of simply concatenating the training corpora and retraining requires the original corpora themselves. The proposed complementary back-off algorithm instead estimates the probability of each N-gram that is unobserved in one model from the other model, and vice versa, which assigns more reliable probabilities to unseen N-grams. Using this method, the authors built a convenient command-line merging tool that needs no training corpora. Language models for three tasks (medical consultation, gourmet and recipe search, and newspaper articles) were merged and evaluated; the merged model preserved the characteristics of each original model as far as possible, with no loss of accuracy even compared with a model trained on the concatenated corpora.
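
The abstract does not give the algorithm itself, so the following is only a toy illustration of the general idea it describes: entries unseen in one model receive probability according to the other model. It works on unigram tables for brevity, and the function name `complete`, the way the reserved back-off mass is shared out, and all the probabilities are assumptions made for this sketch, not the authors' published method.

```python
def complete(model, other, vocab):
    """Fill words unseen in `model` using the relative probabilities in `other`.

    `model` and `other` map word -> probability over their seen words only;
    whatever is missing from 1.0 is treated as mass reserved for unseen words.
    This is a hypothetical simplification, not the paper's algorithm.
    """
    backoff_mass = 1.0 - sum(model.values())
    unseen = [w for w in vocab if w not in model]
    other_total = sum(other[w] for w in unseen)
    filled = dict(model)                      # seen probabilities are kept as they are
    for w in unseen:
        # share the reserved mass in proportion to the other model's estimates
        filled[w] = backoff_mass * other[w] / other_total
    total = sum(filled.values())              # renormalise (matters if nothing was unseen)
    return {w: p / total for w, p in filled.items()}


# Toy task-dependent models; each leaves 0.1 of its mass for unseen words.
medical = {"fever": 0.6, "doctor": 0.3}
recipe = {"soup": 0.5, "salt": 0.3, "fever": 0.1}
vocab = sorted(set(medical) | set(recipe))

medical_full = complete(medical, recipe, vocab)
recipe_full = complete(recipe, medical, vocab)
print(medical_full)
print(recipe_full)
```

In this toy setting each completed model keeps its own probabilities for the words it has seen, which is the property the abstract emphasizes (task characteristics are preserved), while the words it has never seen are ranked by the other model. How the completed models are combined into a single merged model is outside the scope of this sketch.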

https://ci.nii.ac.jp/naid/110002726500

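Reducing the Computational Cost of the Continuous Speech Recognition Software Julius for Deployment on SuperH Microprocessors

Authors: 小窪浩明, 畑岡信夫, 李晃伸, 河原達也, 鹿野清宏
Journal: IPSJ Journal (ISSN:18827764)
Volume, issue, pages, date: vol.50, no.11, pp.2597-2606, 2009-11-15
Cited by: 1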

This paper reports on speeding up Julius, a continuous speech recognition (CSR) program for PCs, so that it runs on a SuperH microprocessor (SH-4A), and on the evaluation experiments. The SH-4A is a high-end MPU with an on-chip FPU, but further reduction of computation is needed for CSR software to run on it in real time. To run on a microprocessor with limited computational resources, memory management during hypothesis search was optimized and acoustic likelihood computation was accelerated (a modified GMS). In an evaluation with a 5,000-word vocabulary, the resulting "Embedded Julius" ran 3.7 times faster than Julius before optimization and achieved real-time operation on the SH-4A (0.73 x real time). With a 20,000-word vocabulary it ran at 1.25 x real time. Finally, a question-answering guidance system implemented on T-Engine is reported as an application.
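
The "x real time" figures are real-time factors: processing time divided by the duration of the audio. The short sketch below works through the reported numbers under that usual definition. The helper names are illustrative, and the baseline value is derived here from the 0.73 and 3.7 figures rather than stated in the abstract.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Real-time factor (RTF): values at or below 1.0 keep up with the incoming audio."""
    return processing_seconds / audio_seconds


def is_real_time(rtf):
    return rtf <= 1.0


optimized_rtf = 0.73          # reported for the 5,000-word task
speedup = 3.7                 # reported speed-up over the unoptimized engine
baseline_rtf = optimized_rtf * speedup   # implied RTF before optimization, about 2.7

print(round(baseline_rtf, 2))            # implied baseline
print(is_real_time(optimized_rtf))       # 5,000-word task
print(is_real_time(1.25))                # 20,000-word task, described as almost real time
print(real_time_factor(7.3, 10.0))       # e.g. 7.3 s of processing for 10 s of audio
```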

https://ci.nii.ac.jp/naid/110007970542

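[OA] Development of the Spoken Information Guidance Systems "Takemaru-kun" and "Kita-chan"

Authors: 鹿野清宏, Cincarek Tobias, 川波弘道, 西村竜一, 李晃伸
Journal: IPSJ SIG Technical Reports, Spoken Language Processing (SLP)
Volume, issue, pages, date: vol.2006, no.107(2006-SLP-063), pp.33-38, 2006-10-20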

http://id.nii.ac.jp/1001/00056880/

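Development of the Spoken Information Guidance Systems "Takemaru-kun" and "Kita-chan"

Authors: 鹿野清宏, Cincarek Tobias, 川波弘道, 西村竜一, 李晃伸
Publisher: Information Processing Society of Japan (IPSJ)
Journal: IPSJ SIG Technical Reports, Spoken Language Processing (SLP) (ISSN:09196072)
Volume, issue, pages, date: vol.2006, no.107, pp.33-38, 2006-10-20
Cited by: 9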

The authors installed the spoken information guidance system "Takemaru-kun" at the Ikoma City North Community Center and have operated it for four years since November 2002. The system was improved intensively during the first year and a half, and it is now used by many citizens, mainly children. It is a full-fledged guidance system that accepts free utterances, built around continuous speech recognition with a vocabulary of a little over 40,000 words using the large-vocabulary recognition program Julius, together with a Q-A database. All input speech and noise has been recorded, and transcription of the first two years in particular is complete. A preliminary evaluation of the performance gain obtained from these two years of transcribed data is also reported. Building on Takemaru-kun, two independent spoken information guidance systems, "Kita-chan" and "Kita-robo", were installed at the end of March 2006 at Gakken Kita-Ikoma (Gakken North Ikoma), a Kintetsu railway station near the Nara Institute of Science and Technology. At 60 dBA, the noise level of the station is about 10 dB higher than that of the community center, a harsh condition for speech recognition. Kita-chan, like Takemaru-kun, responds through a CG agent; it is aimed at adults and can also be operated with a touch panel. Kita-robo has a robot-type interface and is aimed more at children. Both systems have now been in operation for six months and are working well. Portability from Takemaru-kun to these two systems is also discussed, in particular that of the acoustic models to Kita-robo.

https://ci.nii.ac.jp/naid/110004839392

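Large Vocabulary Continuous Speech Recognition Engine Julius ver. 4

Authors: 李晃伸
Publisher: Information Processing Society of Japan (IPSJ)
Journal: IPSJ SIG Technical Reports, Spoken Language Processing (SLP) (ISSN:09196072)
Volume, issue, pages, date: vol.2007, no.129, pp.307-312, 2007-12-21
Cited by: 7 1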

Version 4 of the large-vocabulary continuous speech recognition engine Julius was released in December 2007. In this first major version update in seven years, the internal structure was modularized and the source code was reorganized throughout, greatly improving portability and flexibility. As a result, the engine itself has become a library that can be embedded in other applications, and mechanisms for interacting with external code, such as a simple callback API and plugins, make it easy to extend functions and change the configuration. Word N-gram and grammar language models are now handled equally by a single binary, so the grammar-based recognizer Julian has been unified into Julius. Multi-decoding, in which one engine runs parallel recognition with arbitrary combinations of multiple language models and acoustic models, is also supported. Basic capabilities were extended as well: isolated word recognition was added as a new language model type, along with support for N-grams of arbitrary length (4-gram and above), external linguistic constraints supplied through user-defined functions, GMM-based VAD and a newly proposed decoder-based VAD, and confusion network generation. Accuracy is kept equal to that of the previous version, while memory usage has been reduced.

https://ci.nii.ac.jp/naid/110006549592

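Overview and Evaluation of the Continuous Speech Recognition Consortium FY2000 Software

Authors: 河原達也, 住吉貴志, 李晃伸, 武田一哉, 三村正人, 伊藤彰則, 伊藤克亘, 鹿野清宏
Publisher: Information Processing Society of Japan (IPSJ)
Journal: IPSJ SIG Technical Reports, Spoken Language Processing (SLP) (ISSN:09196072)
Volume, issue, pages, date: vol.2001, no.100, pp.37-42, 2001-10-19
References: 20
Cited by: 24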

The Continuous Speech Recognition Consortium (CSRC) operates under the IPSJ Special Interest Group on Spoken Language Processing (SIG-SLP), with the aim of maintaining and developing the "Japanese Dictation Toolkit" that was developed in the IPA project. This report outlines the software developed in fiscal year 2000 (October 2000 to September 2001), the consortium's first year. In this period the large-vocabulary continuous speech recognition (LVCSR) engine Julius was extended, acoustic models were built from large-scale speech databases, and a variety of acoustic and language models and tools were prepared. The software is currently distributed for a fee.

https://ci.nii.ac.jp/naid/110002917263

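Development and Operation of the Ikoma City Community Center Spoken Information Guidance System

Authors: 西村竜一, 西原洋平, 鶴身玲典, 李晃伸, 猿渡洋, 鹿野清宏
Publisher: Information Processing Society of Japan (IPSJ)
Journal: IPSJ SIG Technical Reports, Spoken Language Processing (SLP) (ISSN:09196072)
Volume, issue, pages, date: vol.2003, no.14, pp.35-40, 2003-02-07
Cited by: 6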

The authors developed "Takemaru-kun", a spoken information guidance system for the Ikoma City North Community Center. Through one-question-one-answer spoken dialogue using large-vocabulary continuous speech recognition, the system provides guidance about the center and about Ikoma City. Aimed at practical use, it has been permanently installed in the entrance hall of the center since November 6, 2002, and during opening hours anyone can freely enjoy communicating with the charming animated agent. The system also serves as a field test of robust speech recognition in a practical environment: the dialogue records needed to improve it are collected through actual operation, and classification and transcription of the utterances are under way. This paper mainly reports the system configuration and the status of utterance data collection. In an evaluation experiment using relatively clean utterances by adults as the test set, a word accuracy of 84% and a correct response rate of 70% were confirmed.

https://ci.nii.ac.jp/naid/110002913776

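Operation of the Public Spoken Information Guidance System "Takemaru-kun" and Analysis of Collected Utterances

Authors: 李晃伸, 山田真士, 西村竜一, 鹿野清宏
Publisher: Information Processing Society of Japan (IPSJ)
Journal: IPSJ SIG Technical Reports, Spoken Language Processing (SLP) (ISSN:09196072)
Volume, issue, pages, date: vol.2004, no.103, pp.49-54, 2004-10-22
Cited by: 8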

To collect and statistically analyze users' natural utterances to a machine, the authors installed the spoken information guidance agent system "Takemaru-kun" in a public facility and collected and organized about 170,000 utterances (177,789 inputs) over 19 months of operation up to May 2004. This paper reports the current system configuration, an analysis of the collected data, and the results of experiments on rejecting noise and unwanted input. About 30% of all input was non-speech such as noise. Of the speech input, 81% was valid utterances; the rest was invalid utterances that cannot be answered, such as background conversation, meaningless vocalizations, unclear and unintelligible speech, utterance fragments, and level overflow. Rejection of these invalid inputs based on input length and GMMs was evaluated. In an experiment on one month of data (8,248 inputs), 99% of non-speech inputs such as noise, breath, coughs, and laughter were rejected, and shouts and distant background conversation could also be rejected to some extent. Utterance fragments and out-of-domain utterances, on the other hand, were difficult to discriminate from acoustic features alone.
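
The abstract names two cues for rejection, input length and GMM-based identification, without giving details. The sketch below shows one plausible way to combine them, assuming diagonal-covariance GMMs for a "speech" class and a "noise" class and a simple frame-count window. The class set, the thresholds, the toy two-dimensional features, and every name in the code are assumptions for illustration only, not the configuration used in the paper.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at feature vector x."""
    return sum(-0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
               for xi, m, v in zip(x, mean, var))


def gmm_log_likelihood(frames, gmm):
    """Average per-frame log-likelihood; gmm is a list of (weight, mean, var)."""
    total = 0.0
    for x in frames:
        comps = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
        top = max(comps)   # log-sum-exp for numerical stability
        total += top + math.log(sum(math.exp(c - top) for c in comps))
    return total / len(frames)


def accept(frames, speech_gmm, noise_gmm, min_frames=30, max_frames=1000):
    """Reject inputs that are too short or too long, then compare the two GMM scores."""
    if not (min_frames <= len(frames) <= max_frames):
        return False
    return gmm_log_likelihood(frames, speech_gmm) > gmm_log_likelihood(frames, noise_gmm)


# Toy single-component models in a 2-dimensional feature space (hypothetical values).
speech_gmm = [(1.0, [1.0, 1.0], [1.0, 1.0])]
noise_gmm = [(1.0, [-1.0, -1.0], [1.0, 1.0])]

print(accept([[1.0, 0.9]] * 50, speech_gmm, noise_gmm))    # speech-like, normal length
print(accept([[-1.0, -1.1]] * 50, speech_gmm, noise_gmm))  # noise-like
print(accept([[1.0, 1.0]] * 5, speech_gmm, noise_gmm))     # too short
```

A purely acoustic test of this kind is consistent with the reported limitation: utterance fragments and out-of-domain utterances sound like ordinary speech, so a speech-versus-noise comparison has nothing to separate them on.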

https://ci.nii.ac.jp/naid/110002950571