A Study on State Grouping and Opportunity Evaluation for Reinforcement Learning Methods (強化学習法のための状態グルーピングとオポチュニチ評価に関する研究)

doi:10.1541/ieejeiss1987.117.9_1300


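Authors: 兪 文偉 (Wenwei Yu), 横井 浩史 (Hiroshi Yokoi), 嘉数 侑昇 (Yukinori Kakazu)
Publisher: The Institute of Electrical Engineers of Japan (一般社団法人電気学会)
Journal: IEEJ Transactions on Electronics, Information and Systems (電気学会論文誌C) (ISSN: 0385-4221)
Volume, issue, pages, publication date: vol.117, no.9, pp.1300-1307, 1997-08-20 (Released: 2008-12-19)
Number of references: 14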

In this paper, we propose a State Grouping scheme to address the problem of scaling Reinforcement Learning algorithms up to real, large-scale applications. The grouping scheme is based on geographical and trial-and-error information, and consists of state generating, state combining, state splitting, and state forgetting procedures, together with a corresponding action selection module and learning module. We also discuss a Labeling Based Evaluation scheme that evaluates the opportunity of each state-action pair, so that better experience can be used to guide exploration of the state space effectively. Incorporating Labeling Based Evaluation and State Grouping into the Reinforcement Learning algorithm yields an approach that generates an organized state space for Reinforcement Learning while also solving the problem at hand. We argue that this ability is necessary for an autonomous agent: such an agent cannot rely on a pre-defined map, but must explore the environment and find the optimal solution autonomously and simultaneously. By solving the large-state-space 3-DOF and 4-link manipulator problem, we show the efficiency of the proposed approach, i.e., the agent can find an optimal or sub-optimal path with less memory and less time.
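
The abstract names the grouping procedures without giving details, so the following is only a minimal sketch of the general idea (Q-learning over groups of raw states), not the authors' algorithm. Everything in it is an assumption made for illustration: the class name `GroupedQTable`, the fixed grid `cell` used as a stand-in for "geographical" information, the value-copying split, and the averaging combine. The forgetting procedure and the Labeling Based Evaluation are not modeled.

```python
class GroupedQTable:
    """Hypothetical Q-table keyed by state groups instead of raw states."""

    def __init__(self, n_actions, cell=4, alpha=0.5, gamma=0.9):
        self.n_actions, self.cell = n_actions, cell
        self.alpha, self.gamma = alpha, gamma
        self.group_of = {}   # raw state (x, y) -> group id
        self.by_key = {}     # coarse grid cell -> group id
        self.q = {}          # group id -> list of action values
        self._next_id = 0

    def _new_group(self, values=None):
        gid = self._next_id
        self._next_id += 1
        self.q[gid] = list(values) if values is not None else [0.0] * self.n_actions
        return gid

    def lookup(self, state):
        """Generate: the first visit to a coarse cell creates its group."""
        if state not in self.group_of:
            # Assumption: nearby states (same grid cell) start in one group.
            key = (state[0] // self.cell, state[1] // self.cell)
            if key not in self.by_key:
                self.by_key[key] = self._new_group()
            self.group_of[state] = self.by_key[key]
        return self.group_of[state]

    def split(self, state):
        """Split: give one raw state its own group, inheriting the old values."""
        old = self.lookup(state)
        self.group_of[state] = self._new_group(self.q[old])
        return self.group_of[state]

    def combine(self, a, b):
        """Combine: merge group b into group a, averaging their action values."""
        self.q[a] = [(x + y) / 2 for x, y in zip(self.q[a], self.q[b])]
        for s, g in self.group_of.items():
            if g == b:
                self.group_of[s] = a
        for k, g in self.by_key.items():
            if g == b:
                self.by_key[k] = a
        del self.q[b]

    def update(self, state, action, reward, next_state):
        """One-step Q-learning backup applied at the group level."""
        g, g2 = self.lookup(state), self.lookup(next_state)
        target = reward + self.gamma * max(self.q[g2])
        self.q[g][action] += self.alpha * (target - self.q[g][action])


if __name__ == "__main__":
    table = GroupedQTable(n_actions=2)
    table.update((0, 0), 1, 1.0, (4, 0))   # learn in one group
    print(table.q[table.lookup((3, 3))])   # (3, 3) shares that group's values
```

Because values are stored per group, memory grows with the number of groups rather than the number of raw states, which is the kind of saving the abstract refers to. When and how to split or combine is exactly what this sketch leaves open.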
