山田 和明 大倉 和博
一般社団法人 日本機械学会
日本機械学会論文集C編 (ISSN:18848354)
vol.78, no.792, pp.2950-2961, 2012 (Released:2012-08-25)

Reinforcement learning approaches have attracted attention as a technique for constructing, through trial and error, the mapping function between the sensors and motors of an autonomous robot. Traditional reinforcement learning approaches use a look-up table to express the mapping function between a gridded state space and a gridded action space. However, the grid size of the state space significantly affects learning performance. To overcome this problem, many researchers have proposed algorithms that use neural networks to express the mapping function between a continuous state space and actions. In this case, however, the designer must appropriately set the number of hidden (middle-layer) neurons and the initial values of the weight parameters in order to improve the approximation accuracy of the neural networks. This paper proposes a new method for Q-learning with neural networks that automatically sets the number of hidden neurons and the initial values of the weight parameters on the basis of the number of dimensions of the sensor space. The proposed method is demonstrated on a navigation problem for an autonomous mobile robot, and is evaluated by comparison with Q-learning using RBF networks and with Q-learning using neural networks whose parameters are set by a designer.
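The look-up-table baseline mentioned in this abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the single sensor range, the four grid cells, the two actions, and the helper names `discretize` and `q_update` are all assumptions chosen for illustration. It shows only the standard tabular Q-learning update applied to a gridded sensor reading.

```python
def discretize(x, low, high, bins):
    """Map a continuous sensor reading x in [low, high] to a grid cell index."""
    if x <= low:
        return 0
    if x >= high:
        return bins - 1
    return int((x - low) / (high - low) * bins)


def q_update(table, state, action, reward, next_state, alpha=0.1, gamma=0.9):
    """Apply one standard Q-learning update to a look-up table (list of lists)."""
    best_next = max(table[next_state])
    td_error = reward + gamma * best_next - table[state][action]
    table[state][action] += alpha * td_error
    return table[state][action]


# Hypothetical setup: one sensor in [0, 1], 4 grid cells, 2 actions.
BINS, ACTIONS = 4, 2
q_table = [[0.0] * ACTIONS for _ in range(BINS)]

s = discretize(0.30, 0.0, 1.0, BINS)       # falls in grid cell 1
s_next = discretize(0.80, 0.0, 1.0, BINS)  # falls in grid cell 3
q_update(q_table, s, 0, reward=1.0, next_state=s_next)
```

Changing `BINS` changes which readings share a table row, which is one way to see why the grid size influences what the learner can distinguish.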
酒井邦嘉 (text), 山田和明 (illustrations)
山田 和明 高野 慧
The Society of Instrument and Control Engineers
計測自動制御学会論文集 (ISSN:04534654)
vol.49, no.1, pp.39-47, 2013 (Released:2013-02-08)

This paper proposes a new reinforcement learning approach for acquiring conflict avoidance behaviors in multi-agent systems. Multi-agent systems can establish order autonomously through interaction among autonomous agents, and multi-agent approaches are therefore expected to yield systems that are flexible and robust to environmental changes. However, it is difficult for designers to embed appropriate conflict avoidance behaviors in advance, because complex dynamics emerge from the interaction among many agents. We apply the proposed method to the narrow road problem, in which many agents must pass one another on a narrow road, and verify its effectiveness. In the narrow road problem, the optimal strategy is for one agent to go forward and the other to give way. However, it is difficult for agents to decide which strategy to select because they cannot predict other agents' behaviors beforehand. Using Q-learning that can adjust discount rates, the proposed method lets agents differentiate into those that prefer to go forward and those that prefer to give way. Conflict problems in multi-agent systems are thus solved through autonomous functional differentiation of many learning agents. Experimental results show that the agents differentiated into two types and acquired stable conflict avoidance behaviors with higher probability than with conventional Q-learning.
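The abstract does not state the rule by which discount rates are adjusted, so the following minimal sketch shows only the part that is standard: a Q-learning agent whose discount rate `gamma` is a per-agent quantity. The class name `QAgent`, the state labels, and the numeric values are hypothetical and are not taken from the paper.

```python
from collections import defaultdict


class QAgent:
    """Tabular Q-learning agent with its own discount rate (illustrative only)."""

    def __init__(self, n_actions, gamma, alpha=0.1):
        self.gamma = gamma  # per-agent discount rate
        self.alpha = alpha  # learning rate
        self.q = defaultdict(lambda: [0.0] * n_actions)

    def update(self, state, action, reward, next_state):
        # Standard Q-learning target, using this agent's own discount rate.
        target = reward + self.gamma * max(self.q[next_state])
        self.q[state][action] += self.alpha * (target - self.q[state][action])


# Two agents that differ only in discount rate (values are assumptions).
far_sighted = QAgent(n_actions=2, gamma=0.9)
near_sighted = QAgent(n_actions=2, gamma=0.1)

for agent in (far_sighted, near_sighted):
    agent.q["goal"][0] = 1.0  # pretend the next state is already valued at 1
    agent.update("road", 1, reward=0.0, next_state="goal")
```

After the same experience, the agent with the larger discount rate assigns more value to the action whose payoff is delayed. This is the kind of difference in preference that adjusting discount rates can produce between otherwise identical learners.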
山田 和明 保田 俊行 大倉 和博
一般社団法人 日本機械学会
日本機械学会論文集 (ISSN:21879761)
vol.84, no.862, pp.17-00288, 2018 (Released:2018-06-25)

The field of multi-robot systems (MRSs), which deals with groups of autonomous robots, has recently attracted much research interest in robotics. MRSs are expected to achieve tasks that are difficult for an individual robot to accomplish. In MRSs, reinforcement learning (RL) is one of the promising approaches to the distributed control of each robot. RL allows participating robots to learn a mapping from their states to their actions from the rewards or payoffs obtained through interaction with their environment. Theoretically, the environment of an MRS is non-stationary, so the rewards or payoffs that learning robots receive depend not only on their own actions but also on the actions of other robots. From this point of view, an RL method named Bayesian-discrimination-function-based Reinforcement Learning (BRL), which segments the state and action spaces simultaneously and autonomously to extend adaptability to dynamic environments, has been proposed. To improve the learning performance of BRL, this paper proposes a technique for selecting between two state spaces: a parametric model useful for exploration and a non-parametric model useful for exploitation. The proposed technique is evaluated through computer simulations of a cooperative carrying task with six autonomous mobile robots.
山田 和明 大倉 和博 上田 完次
公益社団法人 計測自動制御学会
計測自動制御学会論文集 (ISSN:04534654)
vol.39, no.3, pp.266-275, 2003-03-31 (Released:2009-03-27)

We describe a distributed approach to controlling autonomous arm robots. The robots must acquire cooperative behaviors in order to lift an object smoothly. Each arm robot has its own reinforcement learning unit for decision-making. In investigating this task, we are primarily interested in how to design a reinforcement learning control system for a multi-agent system. The reinforcement learning algorithm applied here uses a Bayesian discrimination method to segment the continuous state and action spaces simultaneously, thereby generating a set of effective rules. The proposed approach is examined empirically with two real arm robots, and the basic dynamics of the reinforcement learning process are also analyzed.
豊川 好司 山田 和明 高安 一郎 坪松 戒三
Japanese Society of Animal Science
日本畜産学会報 (ISSN:1346907X)
vol.49, no.8, pp.572-577, 1978

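Rice straw soaked in a 10% sucrose solution, or rice straw ground with a husk machine, was offered ad libitum to sheep so as to raise rice straw dry matter intake, and rumen retention time and digestive tract fill were measured in order to clarify the cause of the low intake of rice straw. Three sheep were used, two of them from the previous report 1) and one a replacement. The basal diet was the same wheat bran as in the previous report 1), fed in the same amount, and the feeding conditions were also the same. The control treatment was untreated rice straw chopped to about 5 cm. The treated groups were a sucrose treatment, in which the same straw was soaked in a 10% sucrose solution, and a ground treatment, in which the same straw was mechanically compressed and ground. The results were as follows. (1) Rice straw dry matter intake was 28% higher in the sucrose treatment and 18% higher in the ground treatment than in the control. Total dry matter intake per unit of W^0.75 was 16% greater in the sucrose treatment and 11% greater in the ground treatment. (2) For the digestibility of the whole diet including wheat bran, in the ground treatment crude fiber was significantly (P<0.05) higher than in the other two treatments and cellulose was significantly higher than in the control. In the sucrose treatment NFE was significantly (P<0.05) higher than in the other two treatments, and dry matter and digestible energy were significantly higher than in the control. (3) For the amounts of the whole diet digested, no significant differences were found in the fiber components, but NFE, digestible energy, and dry matter in the sucrose treatment showed significant differences as above. (4) The retention time of rice straw in the reticulo-rumen was almost the same in the control and ground treatments, and somewhat longer in the sucrose treatment. (5) Total digestive tract fill was 16% higher in the sucrose treatment and 10% higher in the ground treatment than in the control, but the differences were not significant. (6) Thus, although rumen retention time in the control differed little from that in the ground treatment, total digestive tract fill was lowest in the control. This indicates that the suppressed intake of control rice straw is not attributable to retention of the straw in the reticulo-rumen.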
白井 良成 中小路 久美代 山田 和明
情報処理学会研究報告ヒューマンコンピュータインタラクション(HCI) (ISSN:09196072)
vol.2004, no.14, pp.17-24, 2004-02-06

The goal of our research is to develop an interaction design framework for the use of interaction histories of objects. Ubiquitous computing, pervasive in both space and time, enables each of a large variety of objects (including humans) to keep track of its interactions with other objects over a very long period of time. Little research has been done on how to design representations and operations for using such interaction histories of a large number of heterogeneous objects. This paper focuses on summarization and browsing techniques for interaction histories. We present a taxonomy of purposes for browsing interaction histories, and discuss which time-axis-based summarization and representation techniques would be appropriate for each type of purpose. To illustrate our point of view, we introduce Optical Stain, a system we have been developing that keeps track of posters on a physical bulletin board and gives visual feedback on the board in the form of trajectories of past posters. By analyzing our experience of using Optical Stain and reflecting on which aspects of the system should be developed further, we discuss open issues and future directions for this research.
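As a rough illustration of summarizing an interaction history along a time axis, the following sketch counts events per time bucket at two granularities. It is not the Optical Stain implementation: the event format, the object names, the timestamps, and the function `summarize` are all assumptions made for this example.

```python
from collections import Counter
from datetime import datetime


def summarize(events, fmt):
    """Count interaction events per time bucket; fmt is a strftime pattern."""
    return Counter(timestamp.strftime(fmt) for timestamp, _obj in events)


# Hypothetical interaction history: (timestamp, object identifier) pairs.
history = [
    (datetime(2004, 2, 6, 9, 0), "poster-A"),
    (datetime(2004, 2, 6, 15, 30), "poster-B"),
    (datetime(2004, 2, 7, 10, 0), "poster-A"),
]

by_day = summarize(history, "%Y-%m-%d")  # fine-grained view
by_month = summarize(history, "%Y-%m")   # coarse-grained view
```

The same history yields different summaries depending on the chosen time granularity, which is one simple way a browsing purpose could select its temporal representation.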