Authors
杉山 貴昭 船越 孝太郎 中野 幹生 駒谷 和範
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会論文誌 (ISSN:13460714)
Volume, issue, pages / publication date
vol.31, no.3, pp.C-FB2_1-9, 2016-05-01 (Released:2016-05-25)
Number of references
18
Number of citing works
1

When a robot interacts with users in public spaces, it receives various sounds, such as surrounding noises and users' voices. Furthermore, the robot needs to interact with multiple people at the same time. If the robot incorrectly determines whether it should respond to these sounds, it will erroneously respond to surrounding noises or ignore user utterances directed to it. In this paper, we present a machine learning-based method to estimate response obligation, i.e., whether the robot should respond to an input sound. We address a problem setting that is closer to interactions in public spaces than those assumed in previous studies. While previous studies assume that the only input sounds are utterances directed to one of the interlocutors, we deal not only with those utterances but also with noises and monologues. To deal with various sounds, our method uses the results of input sound classification and user behaviors both during an input sound interval and after the interval. In particular, the user behaviors after the interval are introduced as a key factor for improving the estimation accuracy of response obligation; one example is the tendency of a user to stand still after talking to the robot. We demonstrate that the new features significantly improve the estimation performance. We also investigate the performance with various combinations of features and show that the results of input sound classification and the user behaviors after the interval are helpful for the estimation.
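The estimation described in this abstract can be pictured as binary classification over features drawn from the sound classifier and from user behavior during and after the sound interval. The sketch below is only an illustration under stated assumptions: the abstract does not name the learner or the exact features, so the three feature fields, the toy data, and the use of a hand-written logistic regression are all hypothetical stand-ins.

```python
import math
from dataclasses import dataclass


@dataclass
class SoundEvent:
    # Hypothetical features for illustration; not the paper's feature set.
    p_speech: float       # sound-classifier score that the input is speech
    facing_during: float  # share of the sound interval the user faced the robot
    still_after: float    # share of the post-interval window the user stood still
    should_respond: int   # label: 1 = robot should respond, 0 = it should not


def features(event):
    # The leading 1.0 is the bias term.
    return [1.0, event.p_speech, event.facing_during, event.still_after]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def score(weights, event):
    return sigmoid(sum(w * x for w, x in zip(weights, features(event))))


def train(events, epochs=2000, lr=0.5):
    """Fit logistic-regression weights by batch gradient descent."""
    weights = [0.0] * 4
    for _ in range(epochs):
        grad = [0.0] * 4
        for event in events:
            err = score(weights, event) - event.should_respond
            for i, x in enumerate(features(event)):
                grad[i] += err * x
        weights = [w - lr * g / len(events) for w, g in zip(weights, grad)]
    return weights


def predict(weights, event):
    return score(weights, event) >= 0.5


# Made-up toy data: utterances to the robot, noise, and a monologue.
EVENTS = [
    SoundEvent(0.9, 0.8, 0.9, 1),
    SoundEvent(0.8, 0.9, 0.7, 1),
    SoundEvent(0.7, 0.6, 0.8, 1),
    SoundEvent(0.2, 0.1, 0.1, 0),  # surrounding noise
    SoundEvent(0.9, 0.2, 0.1, 0),  # monologue: speech, but the user moves on
    SoundEvent(0.1, 0.5, 0.2, 0),
]

WEIGHTS = train(EVENTS)
```

In this toy setting the monologue has a high speech score but a low post-interval stillness value, which is the kind of case where behavior after the interval could separate it from an utterance addressed to the robot.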
Authors
杉山 貴昭 駒谷 和範 佐藤 理史
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会論文誌 (ISSN:13460714)
Volume, issue, pages / publication date
vol.29, no.1, pp.32-40, 2014-01-05 (Released:2014-01-07)
Number of references
16
Number of citing works
2 1

We have tackled a novel problem of predicting when a user is likely to begin speaking to a humanoid robot. The generality of the prediction model should be examined so that it can be applied to various users. In this paper, we show the following two points through empirical evaluations. First, our proposed model does not depend on the specific participants whose data were used in our previous experiment. Second, the model can handle variations caused by individuality and instruction. We collect a data set to which 25 human participants give labels indicating whether or not they would be likely to begin speaking to the robot. We then train a new model with the collected data and verify its performance by cross validation and open tests. We also investigate how the degree to which each participant felt it possible to begin speaking relates to a model parameter and to the instruction given to them. This suggests that our model can handle such variations.
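One common way to check that a model does not depend on specific participants is to hold out all data from one participant at a time. The abstract does not say how its cross validation folds were formed, so the sketch below is an assumption rather than the authors' procedure; the `(participant_id, features, label)` sample layout and the function name are hypothetical.

```python
from collections import defaultdict


def leave_one_participant_out(samples):
    """Yield (held_out_id, train, test), holding out each participant once.

    samples: list of (participant_id, features, label) tuples
             (a hypothetical layout chosen for this sketch).
    """
    by_participant = defaultdict(list)
    for sample in samples:
        by_participant[sample[0]].append(sample)

    for held_out in sorted(by_participant):
        test = by_participant[held_out]
        train = [
            sample
            for pid, group in by_participant.items()
            if pid != held_out
            for sample in group
        ]
        yield held_out, train, test


# Made-up samples from three participants.
SAMPLES = [
    ("p1", [0.2], 0), ("p1", [0.9], 1),
    ("p2", [0.4], 0), ("p2", [0.8], 1),
    ("p3", [0.7], 1),
]
FOLDS = list(leave_one_participant_out(SAMPLES))
```

Because no participant appears in both the training and test portions of a fold, the score on each fold reflects performance on a person the model has not seen.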