Authors
岸波 華彦 糸山 克寿 西田 健次 中臺 一博
Publisher
The Robotics Society of Japan
Journal
Journal of the Robotics Society of Japan (ISSN: 0289-1824)
Volume / Issue / Pages / Date
vol.39, no.3, pp.271-274, 2021 (Released:2021-04-28)
Number of references
11

In recent years, many kinds of sensors have been studied for environment recognition, and they are used in AR/VR applications and in SLAM. Although ultrasonic signals with high directivity and high resolution are often used, they have problems such as ultrasonic exposure and grating noise at the rising edge. In this paper, we propose a new active sensing method based on audible sound that is robust to environmental noise, achieved by combining weighted likelihood functions and standing waves. Compared to ultrasonic signals, audible sound tends to spread out, which leads to misaligned distance estimates and a loss of map consistency over time. We therefore derive an effective azimuth angle from the directional characteristics of the speaker and compute the likelihood of the presence or absence of obstacles with an observation model. In addition, we introduce occupancy grid mapping to produce the map that best explains the estimated distances. We performed real-world two-dimensional environment recognition experiments in which the proposed method detected and mapped surrounding obstacles, and the results demonstrated its effectiveness.
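The mapping step described in the abstract can be sketched in code. The following is a minimal, hypothetical Python sketch of a directivity-weighted inverse sensor model feeding an occupancy grid in log-odds form; the function name, the Gaussian directivity weight, and all parameter values are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def log_odds_update(grid, pose, dist_est, beam_az, half_width,
                    cell=0.05, l_occ=0.85, l_free=-0.4):
    """One directivity-weighted ping update in log-odds form.

    grid      : 2-D array of log-odds occupancy values
    pose      : (x, y) sensor position in metres
    dist_est  : estimated obstacle distance for this ping [m]
    beam_az   : speaker heading [rad]
    half_width: effective half azimuth angle [rad]; in the paper this is
                derived from the speaker's measured directivity
    """
    x0, y0 = pose
    for az in np.linspace(beam_az - half_width, beam_az + half_width, 15):
        # Down-weight rays far from the beam axis; the Gaussian shape is
        # an illustrative stand-in for the paper's weighted likelihood.
        w = np.exp(-0.5 * ((az - beam_az) / half_width) ** 2)
        for r in np.arange(0.0, dist_est + cell, cell):
            i = int((y0 + r * np.sin(az)) / cell)
            j = int((x0 + r * np.cos(az)) / cell)
            if not (0 <= i < grid.shape[0] and 0 <= j < grid.shape[1]):
                break
            # Cells short of the estimated range are evidence for "free";
            # the cell at the range is evidence for "occupied".
            grid[i, j] += w * (l_occ if r >= dist_est else l_free)

# Example: a 10 m x 10 m grid at 5 cm resolution, one ping facing east.
grid = np.zeros((200, 200))
log_odds_update(grid, (5.0, 5.0), dist_est=1.2, beam_az=0.0,
                half_width=np.deg2rad(20))
prob = 1.0 / (1.0 + np.exp(-grid))  # back to occupancy probabilities
```

Accumulating updates in log-odds keeps each ping's contribution additive, so the grid converges to the map that best explains the set of estimated distances.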
Authors
岸波 華彦 糸山 克寿 西田 健次 中臺 一博
Publisher
The Robotics Society of Japan
Journal
Journal of the Robotics Society of Japan (ISSN: 0289-1824)
Volume / Issue / Pages / Date
vol.40, no.4, pp.351-354, 2022 (Released:2022-05-20)
Number of references
12

This paper addresses the reconstruction of visual scenes based on echolocation, aiming to develop auditory scene understanding for robots and systems. Although scene understanding with cameras and LIDAR has been studied extensively, it is sensitive to changes in lighting conditions and has difficulty detecting optically invisible materials. Ultrasonic sensors are widely used, but their use is limited to distance estimation, and they carry an unavoidable risk of ultrasonic exposure because most of the emitted power lies in inaudible frequency ranges. To solve these problems, we propose a framework for echolocation-based scene reconstruction (ELSR). ELSR reconstructs a visual scene from transmitted/received audible sound, exploiting a Generative Adversarial Network (GAN) to learn the translation from input sound to a visual scene. Since GANs were originally designed for image input, we carefully considered the differences between image and sound input and propose cross-correlation- and trigonometric-function-based audio input features. The proposed framework is implemented on top of pix2pix, a conditional GAN, and a new dataset for ELSR consisting of 10,800 pairs of input sounds and depth images recorded at 28 indoor locations was created. Experimental results on this dataset showed the effectiveness of the proposed ELSR framework and audio features.
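To illustrate the feature construction, the sketch below computes a frequency-domain cross-correlation between the transmitted and received signals and encodes its phase with trigonometric functions; the function name elsr_features, the STFT parameters, and the exact cos/sin encoding are assumptions for illustration and may differ from the paper's features.

```python
import numpy as np

def elsr_features(tx, rx, n_fft=512, hop=128):
    """Sketch of cross-correlation + trigonometric audio features.

    tx, rx : transmitted / received waveforms, assumed time-aligned and
             of equal length (1-D float arrays)
    Returns a (3, frames, n_fft // 2 + 1) array usable as a multi-channel
    image-like input for a pix2pix-style network.
    """
    def stft(x):
        win = np.hanning(n_fft)
        frames = np.asarray([x[i:i + n_fft] * win
                             for i in range(0, len(x) - n_fft + 1, hop)])
        return np.fft.rfft(frames, axis=1)

    # Frequency-domain cross-correlation (cross-spectrum) of rx against tx.
    cross = stft(rx) * np.conj(stft(tx))
    mag = np.log1p(np.abs(cross))   # correlation strength per time-frequency bin
    phase = np.angle(cross)         # carries time-of-flight / geometry cues
    # Encoding the phase as cos/sin keeps the input continuous and avoids
    # the 2*pi wrap-around that a raw phase channel would suffer from.
    return np.stack([mag, np.cos(phase), np.sin(phase)])

# Example with a synthetic 0.25 s up-chirp at 16 kHz and a delayed echo.
fs = 16000
t = np.arange(int(0.25 * fs)) / fs
tx = np.sin(2 * np.pi * (500 + 4000 * t) * t)
rx = np.roll(tx, 80) + 0.05 * np.random.randn(len(tx))
feats = elsr_features(tx, rx)
print(feats.shape)  # (3, num_frames, 257)
```

Stacking magnitude, cos(phase), and sin(phase) yields a fixed-size tensor that a conditional GAN can consume in place of an image.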