著者
木村 優志 澤田 心大 入部 百合絵 桂田 浩一 新田 恒雄
出版者
一般社団法人 電気学会
雑誌
電気学会論文誌C(電子・情報・システム部門誌) (ISSN:03854221)
巻号頁・発行日
vol.132, no.9, pp.1473-1480, 2012-09-01 (Released:2012-09-01)
参考文献数
21

In this paper, we propose a task estimation method based on multiple subspaces extracted from multi-modal information of image objects in visual scenes and spoken words in dialog appeared in the same task. The multiple subspaces are obtained by using latent semantic analysis (LSA). In the proposed method, a task vector composed of spoken words and the frequencies of image-object appearances are extracted first, and then similarities among the input task vector and reference sub-spaces of different tasks are compared. Experiments are conducted on the identification of game tasks. Experimental results show that the proposed method with multi-modal information outperforms the method in which only single modality of image or spoken dialog is applied. Moreover, the proposed method achieved accurate performance even if less spoken dialog is applied.