- Authors
- 釜谷 博行 (Hiroyuki Kamaya)
- 阿部 健一 (Kenichi Abe)
- Publisher
- The Institute of Electrical Engineers of Japan
- Journal
- IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C) (ISSN:03854221)
- Volume, issue, pages, and publication date
- vol.122, no.7, pp.1186-1193, 2002-07-01 (Released:2008-12-19)
- Number of references
- 22
The most widely used reinforcement learning (RL) algorithms are limited to Markovian environments. To handle larger-scale partially observable Markov decision processes, we propose a new online hierarchical RL algorithm called Switching Q-learning (SQ-learning). The basic idea of SQ-learning is that non-Markovian tasks can be automatically decomposed into subtasks solvable by multiple policies, without any prior information pointing to good subgoals. To perform this decomposition, SQ-learning employs ordered sequences of Q modules, in which each module discovers a local control policy using Sarsa(λ). Furthermore, a hierarchical structure learning automaton finds appropriate subgoal sequences according to the LR-I (linear reward-inaction) algorithm. The results of extensive simulations demonstrate the effectiveness of SQ-learning.
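The abstract describes the mechanism only at a high level. As a rough illustration of how the pieces could fit together, the Python sketch below combines tabular Sarsa(λ) modules with LR-I automata. It is a minimal sketch under several assumptions not taken from the paper: the hierarchical structure learning automaton is flattened to one LR-I automaton per position in the module sequence, the environment is assumed to expose a `step()` method returning a `switch` flag standing in for the paper's module-switching condition, and the success test used to reinforce the automata is hypothetical.

```python
import random
from collections import defaultdict

class SarsaLambdaModule:
    """One Q module: tabular Sarsa(lambda) learning a local control policy."""

    def __init__(self, n_actions, alpha=0.1, gamma=0.99, lam=0.9, eps=0.1):
        self.q = defaultdict(float)   # (observation, action) -> value estimate
        self.e = defaultdict(float)   # eligibility traces
        self.n_actions = n_actions
        self.alpha, self.gamma, self.lam, self.eps = alpha, gamma, lam, eps

    def act(self, obs):
        """Epsilon-greedy action selection from the module's Q-table."""
        if random.random() < self.eps:
            return random.randrange(self.n_actions)
        return max(range(self.n_actions), key=lambda a: self.q[(obs, a)])

    def update(self, obs, act, reward, next_obs, next_act):
        """Standard Sarsa(lambda) update with accumulating traces."""
        delta = (reward + self.gamma * self.q[(next_obs, next_act)]
                 - self.q[(obs, act)])
        self.e[(obs, act)] += 1.0
        for key in list(self.e):
            self.q[key] += self.alpha * delta * self.e[key]
            self.e[key] *= self.gamma * self.lam

    def reset_traces(self):
        self.e.clear()

class LRIAutomaton:
    """Linear reward-inaction (LR-I) automaton over module choices."""

    def __init__(self, n_choices, a=0.05):
        self.p = [1.0 / n_choices] * n_choices   # action probability vector
        self.a = a                               # learning-rate parameter

    def choose(self):
        return random.choices(range(len(self.p)), weights=self.p)[0]

    def reinforce(self, chosen):
        # On reward, shift probability mass toward the chosen option:
        # p_i <- p_i + a*(1 - p_i); p_j <- (1 - a)*p_j for j != i.
        # On penalty the LR-I scheme does nothing ("inaction").
        self.p = [(1.0 - self.a) * p for p in self.p]
        self.p[chosen] += self.a

def run_episode(env, modules, automata, max_steps=200):
    """One episode: the automata fix an ordered module (subgoal) sequence,
    the active module controls the agent, and control passes to the next
    module whenever the environment signals a switch."""
    sequence = [la.choose() for la in automata]
    slot = 0
    module = modules[sequence[slot]]
    obs = env.reset()
    act = module.act(obs)
    total_reward, done = 0.0, False
    for _ in range(max_steps):
        # `switch` is an assumed flag standing in for the paper's
        # switching condition between Q modules.
        next_obs, reward, done, switch = env.step(act)
        next_act = module.act(next_obs)
        module.update(obs, act, reward, next_obs, next_act)
        total_reward += reward
        if switch and slot + 1 < len(sequence):
            module.reset_traces()
            slot += 1
            module = modules[sequence[slot]]
            next_act = module.act(next_obs)
        obs, act = next_obs, next_act
        if done:
            break
    if done and total_reward > 0:   # crude success criterion (assumption)
        for la, choice in zip(automata, sequence):
            la.reinforce(choice)
    return total_reward
```

Resetting the eligibility traces at each switch is one way to keep credit assignment local to the module that produced the experience, matching the idea that each module learns only a local control policy; the paper's actual switching and reinforcement rules may differ.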