Insertion of 'Aizuchi' Using Prosodic Information (韻律情報を用いた相槌の挿入)

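Authors: 岡登洋平, 加藤佳司, 山本幹雄, 板橋秀一
Publisher: Information Processing Society of Japan (一般社団法人情報処理学会)
Journal: IPSJ Journal (情報処理学会論文誌) (ISSN: 18827764)
Volume, issue, pages, and publication date: vol. 40, no. 2, pp. 469-478, 1999-02-15
Cited by: 13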

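In dialog between a machine and a user, the user would find it easier to speak if the machine could give 'Aizuchi' (back-channel responses) as a human does. This study attempts system-driven 'Aizuchi' insertion for cases in which an 'Aizuchi' is given almost simultaneously with the appearance of a pause between the speaker's utterances. For the system to respond at the appropriate time, it must decide whether to insert an 'Aizuchi' before the pause itself can be detected. This paper therefore examines a method that makes the insertion decision predictively, using prosodic information extracted from the speaker's utterance. First, we analyzed the speech of dialogs on a telephone-shopping task and showed that the listener's 'Aizuchi' are given at points in the speaker's utterances that have distinctive prosodic features. Next, in an experiment in which human subjects listened to dialogs with the 'Aizuchi' speech removed and judged where an 'Aizuchi' should occur, 76% of the 'Aizuchi' that actually occurred were also detected by the subjects, and more subjects chose to give an 'Aizuchi' when the utterance was long. We then carried out a dialog analysis and a perception experiment on 'Aizuchi' timing. The results showed that an 'Aizuchi' must be given within 0.3 seconds of the start of a pause in the utterance. We therefore propose a method for detecting 'Aizuchi' timing by recognizing prosodic patterns with templates, and ran insertion-decision and timing-detection experiments while varying the prediction time used for the decision. The timing accuracy was 84% at a prediction time of 0.1 seconds and 72% at a prediction time of 0.4 seconds. When humans evaluated the system responses obtained with a prediction time of 0.1 seconds, 74% of the extracted points were judged to be natural points for a response.

A user's degree of comfort in a man-machine spoken dialog environment is likely to improve if spoken dialog systems can provide correct 'Aizuchi' responses to the user's utterances. This hypothesis was evaluated using a dialog corpus that relates to telephone shopping tasks and contains 'Aizuchi' responses near the end of a speaker's utterance. The evaluation also requires a dialog system capable of detecting 'Aizuchi' timing before the end of the utterance. To this end, a method is proposed which uses prosodic information to guide correct 'Aizuchi' responses. A preliminary prosodic analysis of the utterances confirmed that an 'Aizuchi' indeed relates to the duration, speaking rate and minimum F0 of an utterance. Next, using dialogs from which 'Aizuchi' responses were previously removed, an experiment was carried out to spontaneously prompt such responses from human subjects. Results show that subjects were able to match about 80% of the 'Aizuchi' responses contained in the original dialogs, and that many subjects tended to do so during long utterances. Then, a dialog analysis was performed to investigate 'Aizuchi' timing, the results of which indicate that the system should give an 'Aizuchi' within 0.3 seconds of the end of the speaker's utterance. By comparison, in an 'Aizuchi'-prompting experiment based on prosodic pattern recognition, the system achieved a timing accuracy of 84% with 0.1-second prediction of the end of the utterance and 72% with 0.4-second prediction. Finally, human perceptual evaluation of the timing of system detection yielded an accuracy of 74%, which lends support to the naturalness of the 'Aizuchi' responses given by the system.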
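
The 0.3-second timing constraint stated in the abstract can be illustrated with a small scoring sketch. This is not the authors' evaluation procedure: the function names `is_timely` and `timing_accuracy`, the input format (times in seconds), and the rule that only responses at or after a pause start count as hits are all assumptions made for illustration.

```python
from bisect import bisect_right

# From the abstract: a response should come within 0.3 s of the pause start.
MAX_DELAY_S = 0.3


def is_timely(aizuchi_time, pause_starts, max_delay=MAX_DELAY_S):
    """Return True if aizuchi_time lies within max_delay seconds after
    the most recent pause start.

    pause_starts must be sorted in ascending order (seconds).
    Assumption: a response given before any pause has started is a miss.
    """
    i = bisect_right(pause_starts, aizuchi_time)
    if i == 0:
        return False  # no pause has started yet
    return aizuchi_time - pause_starts[i - 1] <= max_delay


def timing_accuracy(aizuchi_times, pause_starts, max_delay=MAX_DELAY_S):
    """Fraction of responses that satisfy is_timely (hypothetical metric)."""
    if not aizuchi_times:
        return 0.0
    pauses = sorted(pause_starts)
    hits = sum(is_timely(t, pauses, max_delay) for t in aizuchi_times)
    return hits / len(aizuchi_times)


if __name__ == "__main__":
    pauses = [1.0, 3.5, 6.0]      # hypothetical pause start times
    responses = [1.2, 3.9, 6.25]  # hypothetical system response times
    print(timing_accuracy(responses, pauses))
```

In this made-up example the second response arrives about 0.4 seconds after its pause and is counted as late, so two of the three responses are timely. The figures reported in the abstract (84% and 72%) come from the authors' own experiments and cannot be reproduced from this sketch.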

https://ci.nii.ac.jp/naid/110002764800

Mentions

Hatena Bookmark (1 users, 2 posts)

Twitter (1 users, 1 posts, 0 favorites)

How about this paper? Insertion of 'Aizuchi' Using Prosodic Information (<Special Issue> Human Interface and Interaction) (岡登洋平 et al.), 1999 http://t.co/5aVjjG2ZNk

Collected URL list

https://ci.nii.ac.jp/naid/110002764800 (2)