Authors
宮崎 和光, 山村 雅幸, 小林 重信
Publisher
一般社団法人 人工知能学会 (The Japanese Society for Artificial Intelligence)
Journal
人工知能 (ISSN:21882266)
Volume, issue, pages, and publication date
vol.12, no.1, pp.78-89, 1997-01-01 (Released:2020-09-29)

Reinforcement learning is a kind of machine learning that aims to adapt an agent to a given environment using rewards as a clue. Profit sharing (PS) can acquire rewards efficiently in the initial learning phase, but it cannot always learn an optimum policy, that is, one that maximizes the reward per action. Q-learning is guaranteed to obtain an optimum policy, but it needs numerous trials to learn it. On Markov decision processes (MDPs), if a correct environment model has been identified, an optimum policy can be derived by applying the Policy Iteration Algorithm (PIA). The k-Certainty Exploration Method has been proposed as an efficient method for identifying MDPs. We consider that an ideal reinforcement learning system should obtain some rewards even in the initial learning phase and obtain more rewards as the identification of the environment proceeds. In this paper, we propose a unified learning system, MarcoPolo, which considers both getting rewards by PS or PIA and identifying the environment by the k-Certainty Exploration Method. MarcoPolo can realize any tradeoff between exploitation and exploration throughout the whole learning process. By applying MarcoPolo to an example, its basic performance is shown. Moreover, by applying it to Sutton's maze problem and a modified version of it, its feasibility on more realistic domains is shown.
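To make the tradeoff the abstract describes concrete, the following Python sketch combines the two ingredients it names: profit-sharing credit assignment, which reinforces every rule on a rewarded episode (here with geometrically decreasing credit, one common choice of reinforcement function), and k-certainty-style exploration, which prefers (state, action) pairs visited fewer than k times. The class name, the `explore_ratio` switch, and the decay value are illustrative assumptions, and the PIA step on the identified model is omitted; this is a minimal sketch, not the authors' implementation.

```python
import random
from collections import defaultdict

class MarcoPoloSketch:
    """Illustrative sketch of MarcoPolo's tradeoff: k-certainty-style
    exploration of under-visited (state, action) pairs versus exploitation
    of rules reinforced by profit sharing. Names and the switching rule
    are assumptions, not the authors' implementation."""

    def __init__(self, actions, k=3, explore_ratio=0.5, decay=0.5):
        self.actions = list(actions)
        self.k = k                          # certainty threshold per (state, action)
        self.explore_ratio = explore_ratio  # exploration vs. exploitation knob
        self.decay = decay                  # geometric credit-assignment rate for PS
        self.count = defaultdict(int)       # visit counts, for k-certainty
        self.weight = defaultdict(float)    # rule weights learned by profit sharing

    def choose(self, state):
        uncertain = [a for a in self.actions if self.count[(state, a)] < self.k]
        if uncertain and random.random() < self.explore_ratio:
            return random.choice(uncertain)  # explore: push pairs toward k-certainty
        # exploit: act greedily on the profit-sharing weights
        return max(self.actions, key=lambda a: self.weight[(state, a)])

    def update(self, episode, reward):
        """Profit sharing: on receiving a reward, reinforce every rule on the
        episode, with credit decreasing geometrically toward earlier steps."""
        credit = reward
        for state, action in reversed(episode):
            self.count[(state, action)] += 1
            self.weight[(state, action)] += credit
            credit *= self.decay
```

In the paper's terms, raising `explore_ratio` biases the system toward identifying the environment, while lowering it biases the system toward acquiring rewards; MarcoPolo's contribution is arbitrating this tradeoff over the whole learning process.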
Authors
山口 智浩, 野村 勇治, 田中 康祐, 谷内田 正彦
Publisher
一般社団法人 人工知能学会 (The Japanese Society for Artificial Intelligence)
Journal
人工知能 (ISSN:21882266)
Volume, issue, pages, and publication date
vol.12, no.6, pp.870-880, 1997-11-01 (Released:2020-09-29)

The advantage of emergence is that various solutions emerge. However, emerging them incurs a large computation cost, since it requires many iterations of simulation. We therefore try to reduce the computation cost without losing the variety of solutions by introducing an abstraction technique from Artificial Intelligence. This paper presents Isomorphism Based Reinforcement Learning by Isomorphism of Actions, which reduces the learning cost without losing the variety of solutions. Isomorphism is a concept from enumerative combinatorics in mathematics. We first explain the isomorphism of actions, and then the isomorphism of behaviors. Isomorphic behaviors that perform the same task can be obtained by transforming the learning result for the task by "the appropriate permutation". However, a priori knowledge representing "the appropriate permutation" is not always given, so this paper uses a generate-and-test method: it first generates isomorphic learning results by transforming the learning result of reinforcement learning for a task by combinatorial permutations, and then tests them to select two kinds of behaviors: (1) isomorphic behaviors that perform the same task; (2) behaviors that converge to a new task state. Since the acquired learning results are isomorphic to each other, the merits of our method are that the time cost of generating various learning results is small, and the space cost is also small, because only the original learning result and the set of permutations for it need to be stored. For these reasons, this method is significant for learning various behaviors in dynamic environments or multiagent settings.
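As one concrete reading of the generate-and-test procedure the abstract outlines, the Python sketch below generates candidate learning results by permuting the action labels of a learned value table, then tests each candidate by checking whether its greedy behavior still performs the task. The Q-table representation, the greedy readout, and the `performs_task` predicate are assumptions for illustration; the paper's test step also admits behaviors that converge to a new task state.

```python
from itertools import permutations

def isomorphic_results(q_table, actions):
    """Generate step: relabel the actions of an original learning result
    (here a dict mapping (state, action) -> learned value) by every
    permutation of the action set. Note the candidate count grows
    factorially with the number of actions."""
    for perm in permutations(actions):
        relabel = dict(zip(actions, perm))
        yield {(s, relabel[a]): v for (s, a), v in q_table.items()}

def greedy_behavior(q_table, states, actions):
    """Behavior induced by a learning result: the greedy action per state."""
    return {s: max(actions, key=lambda a: q_table.get((s, a), 0.0))
            for s in states}

def test_candidates(q_table, states, actions, performs_task):
    """Test step: keep transformed results whose greedy behavior still
    performs the task. `performs_task` is a hypothetical predicate that
    runs the behavior in the environment and checks the outcome."""
    return [qt for qt in isomorphic_results(q_table, actions)
            if performs_task(greedy_behavior(qt, states, actions))]
```

Because each candidate is just the original table under a relabeling, only the original learning result and the permutation set need storing, which is the small time and space cost the abstract claims.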