Authors
岡留 有哉 阿多 健史郎 石黒 浩 中村 泰
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会論文誌 (ISSN:13460714)
Volume, issue, pages, and publication date
vol.37, no.6, pp.B-M43_1-13, 2022-11-01 (Released:2022-11-01)
Number of references
30

A communication agent that can interact mutually with humans has long been anticipated. Realizing such an agent requires real-time situation recognition and motion generation. Human-human interaction data are used to develop the recognition and generation models. However, labeling these data is costly, so the amount of labeled data tends to be small. One approach to this small-dataset problem is to obtain pre-trained weights through self-supervised learning. In this research, we propose estimating the amount of time shift introduced by a “lag operation” as a self-supervised learning task. During an interaction between two people, the data observed from each person are mutually dependent rather than isolated, and using the observations from both people allows an estimation model to reduce uncertainty in situation detection. Exploiting these properties of interaction data, we shift the time index of one person's data, which breaks the entrainment between the two data streams. We call this a “lag operation” and define estimating the amount of time shift as the pre-training task. We apply this pre-training to a prediction experiment that estimates near-future laughter during a conversation. The results show that the accuracy of laughter prediction improves by 1.3 points, indicating that the lag operation is effective for predicting changes in the interaction situation.
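
The lag operation described in the abstract can be illustrated with a minimal sketch. The abstract does not specify the window length, the range of candidate lags, the feature format, or the model, so everything below is an assumption made for illustration: the function name `lag_operation`, its parameters `window` and `max_lag`, and the choice to treat the lag as a class index are all hypothetical.

```python
import numpy as np


def lag_operation(person_a, person_b, window, max_lag, rng):
    """Cut one training sample for a lag-estimation pretext task.

    person_a, person_b: arrays of shape (T, D) recorded on a shared time axis.
    Returns (window of A, time-shifted window of B, lag class label).
    All names and the labeling scheme are illustrative assumptions.
    """
    total = min(len(person_a), len(person_b))
    if total < window + 2 * max_lag:
        raise ValueError("sequence too short for this window and max_lag")

    # Keep a margin of max_lag frames on both sides so every shift stays in range.
    start = int(rng.integers(max_lag, total - window - max_lag + 1))
    # Sample the shift applied to person B's time index (0 means still aligned).
    lag = int(rng.integers(-max_lag, max_lag + 1))

    a = person_a[start:start + window]
    b = person_b[start + lag:start + lag + window]

    # Map the lag from [-max_lag, max_lag] to a class index in [0, 2 * max_lag].
    label = lag + max_lag
    return a, b, label


# Toy data: each frame stores its own time index, which makes the shift visible.
rng = np.random.default_rng(0)
series_a = np.arange(100, dtype=float)[:, None]
series_b = np.arange(100, dtype=float)[:, None]
a, b, label = lag_operation(series_a, series_b, window=10, max_lag=5, rng=rng)
```

In a setup like this, a network would be pre-trained to predict `label` from the pair `(a, b)`, and its weights would then initialize a downstream model such as the laughter predictor. Whether the original work frames lag estimation as classification or regression is not stated in the abstract.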