Author
原 聡 (Satoshi Hara)
Publisher
The Japanese Society for Artificial Intelligence
Journal
人工知能 (ISSN: 2188-2266)
Volume, issue, pages, and date
vol.33, no.3, pp.366-369, 2018-05-01 (Released: 2020-09-29)
Citation count
2
Authors
荒井 幸代 (Sachiyo Arai), 宮崎 和光 (Kazuteru Miyazaki), 小林 重信 (Shigenobu Kobayashi)
Publisher
The Japanese Society for Artificial Intelligence
Journal
人工知能 (ISSN: 2188-2266)
Volume, issue, pages, and date
vol.13, no.4, pp.609-618, 1998-07-01 (Released: 2020-09-29)
Citation count
1

Most multi-agent systems have been developed in the field of Distributed Artificial Intelligence (DAI), whose schemes rely on substantial prior knowledge of the agents' world or on pre-organized relationships among the agents. However, such knowledge is not always available. Multi-agent reinforcement learning, on the other hand, is worth considering as a way to realize cooperative behavior among agents with little prior knowledge. There are two main problems to be considered in multi-agent reinforcement learning. One is the uncertainty of state transitions caused by the concurrent learning of the agents; the other is the perceptual aliasing problem, which generally arises in such a world. Robustness and flexibility against these two problems are therefore essential for multi-agent reinforcement learning. In this paper, we evaluate Q-learning and Profit Sharing as methods for multi-agent reinforcement learning through several experiments. We take up the Pursuit Problem as an instance of a multi-agent world. In the experiments, we do not assume any pre-defined relationship among agents or any control knowledge for cooperation. Learning agents do not share sensations, episodes, or policies; each agent learns from its own episodes independently of the others. The results show that cooperative behaviors clearly emerge among Profit Sharing hunters, who are not disturbed by the concurrent learning of the other agents, even when the prey has a deliberate escape strategy against the hunters. Moreover, they behave rationally in the perceptually aliased areas. Q-learning hunters, by contrast, cannot form any useful policy in such a world. Through these experiments, we conclude that Profit Sharing has good properties for multi-agent reinforcement learning because of its robustness to changes in the other agents' policies and to the limitations of the agents' sensing abilities.
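The contrast the abstract draws between the two learners comes down to their update rules. Below is a minimal sketch in Python; the step size, discount factor, geometric credit-decay constant, and the toy states and actions are illustrative assumptions, not values from the paper. Q-learning bootstraps from an estimate of the next state's value, which presumes stationary transition dynamics, while Profit Sharing only distributes the reward actually obtained at the end of an episode backward along the trace, so it makes no such assumption.

```python
from collections import defaultdict

ALPHA = 0.1   # Q-learning step size (illustrative assumption)
GAMMA = 0.9   # Q-learning discount factor (illustrative assumption)
DECAY = 0.3   # Profit Sharing geometric credit decay (illustrative assumption)

def q_learning_update(Q, s, a, r, s_next, actions):
    """One-step Q-learning backup. The max over the next state's actions
    bootstraps from Q itself, which implicitly assumes the transition
    dynamics are stationary; concurrently learning agents violate this."""
    best_next = max(Q[(s_next, a2)] for a2 in actions)
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])

def profit_sharing_update(W, episode, reward):
    """Profit Sharing credit assignment. When an episode ends with a
    reward, every (state, action) pair on the trace is reinforced with
    credit that decays geometrically toward the start of the episode.
    No bootstrapping, hence no stationarity assumption."""
    credit = reward
    for s, a in reversed(episode):
        W[(s, a)] += credit
        credit *= DECAY

# Tiny usage example on hypothetical states and actions.
Q = defaultdict(float)
q_learning_update(Q, s="s0", a="north", r=0.0, s_next="s1",
                  actions=["north", "south", "east", "west"])

W = defaultdict(float)
episode = [("s0", "north"), ("s1", "east"), ("s2", "east")]
profit_sharing_update(W, episode, reward=1.0)  # e.g. prey captured at s2
```

Because Profit Sharing never consults the estimated value of a successor state, neither other agents changing their policies mid-learning nor several true states collapsing into one observation (perceptual aliasing) corrupts its credit assignment the way they corrupt Q-learning's bootstrapped targets; this is the robustness the experiments report. The simple geometric decay above is one common choice of reinforcement function; the authors' related work derives conditions such a function must satisfy for the learned behavior to be rational.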