Authors
中村 覚 佐久間 淳 小林 重信 小野 功
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会全国大会論文集 第22回全国大会(2008)
Volume/Issue/Pages/Date
pp.83, 2008 (Released:2009-07-31)

In this paper, we formulate the strategy-acquisition problem of a market maker (MM) in a quote-driven market as a multi-objective optimization problem in terms of the MM's profit and execution rate, and propose a method that optimizes strategies with a multi-objective genetic algorithm.
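As a concrete reading of the two-objective formulation above, the sketch below shows a Pareto-dominance test over (profit, execution rate) pairs, the kind of comparison a multi-objective GA uses when ranking strategies. The objective tuple and function name are illustrative assumptions, not the paper's notation.

```python
def dominates(a, b):
    """Pareto dominance for two maximization objectives,
    here assumed to be (profit, execution_rate)."""
    return all(x >= y for x, y in zip(a, b)) and any(x > y for x, y in zip(a, b))

# Same execution rate but more profit: the first strategy dominates.
assert dominates((120.0, 0.8), (100.0, 0.8))
# A trade-off between the objectives: neither strategy dominates.
assert not dominates((120.0, 0.7), (100.0, 0.8))
```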
Authors
佐藤 浩 小野 功 小林 重信
Publisher
社団法人人工知能学会
Journal
人工知能学会誌 (ISSN:09128085)
Volume/Issue/Pages/Date
vol.12, no.5, pp.734-744, 1997-09-01
Cited by
211

When Genetic Algorithms (GAs) are applied to optimization problems, characteristic preservation in designing coding/crossover and diversity maintenance in designing generation alternation are important. Generation alternation models are independent of problems, while coding/crossover depends on problems. We discuss generation alternation models in this paper. Simple GA is one of the well-known generation alternation models; however, it has two problems. One is early convergence in the first stage of search and the other is evolutionary stagnation in the last stage. Many improvements and new models have been presented to overcome these problems. In this paper, we propose a new generation alternation model called minimal generation gap (MGG) which has all the advantages of conventional models. As generation alternation models use only fitness information, alternation of generations can be regarded as a transformation of fitness distributions. We propose a new method of assessing generation alternation models: we measure the ability to avoid early convergence and to suppress evolutionary stagnation by the dynamics of the best value and the variance of the fitness distributions. From the results of several experiments, we found that MGG is the most desirable model, avoiding early convergence and suppressing evolutionary stagnation. We also show the efficiency of MGG by applying it to benchmarks in two different domains: function optimization and traveling salesman problems. In both domains, MGG showed higher performance than the other conventional models, especially under small population sizes.
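For readers unfamiliar with MGG, the following minimal sketch shows one generation step as commonly described: two parents are drawn at random, a family of children is produced by crossover, and only the two parent slots in the population are replaced, by the family's best member and by a fitness-proportional (roulette) pick. Maximization with non-negative fitness is assumed; `crossover` and `fitness` are placeholders.

```python
import random

def mgg_step(population, crossover, fitness, n_children=10):
    """One generation of Minimal Generation Gap (MGG), as a sketch."""
    i, j = random.sample(range(len(population)), 2)
    parents = [population[i], population[j]]
    children = [crossover(parents[0], parents[1]) for _ in range(n_children)]
    family = parents + children
    scores = [fitness(x) for x in family]
    # Elite member of the family survives deterministically.
    best = family[max(range(len(family)), key=lambda k: scores[k])]
    # Second survivor is chosen by roulette (fitness-proportional) selection.
    if sum(scores) > 0:
        roulette = random.choices(family, weights=scores, k=1)[0]
    else:
        roulette = random.choice(family)
    population[i], population[j] = best, roulette
    return population
```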
Authors
永田 裕一 小林 重信
Publisher
社団法人人工知能学会
Journal
人工知能学会誌 (ISSN:09128085)
Volume/Issue/Pages/Date
vol.14, no.5, pp.848-859, 1999-09-01
Cited by
21

In this paper, we propose a new crossover named Edge Assembly Crossover (EAX) for the Traveling Salesman Problem. EAX constructs intermediate individuals by assembling only edges of the parents under the assignment relaxation, one of the relaxations of the TSP, and then modifies them into complete tours that satisfy the original TSP constraint. EAX has two features. One is that EAX is excellent at characteristic preservation, in the sense of preserving edges from parents to children. The other is that EAX can generate a wide variety of children from a pair of parents. We show by experiments that EAX generates new solution candidates under a suitable trade-off between characteristic preservation and variety of children, which brings better performance than other crossovers. Finally, we show that EAX realizes global neighborhood search by comparing EAX with traditional local search operators for the TSP.
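Full EAX assembles AB-cycles under the assignment relaxation, which is beyond a short snippet; the sketch below only illustrates the underlying idea of edge preservation, greedily building a child tour from the union of the parents' edge sets and falling back to a random unvisited city when no parent edge is usable. It is a simplified stand-in, not the paper's operator.

```python
import random

def edge_preserving_crossover(parent_a, parent_b):
    """Greedy edge-assembly-style recombination for TSP (simplified sketch).

    Tours are lists of city indices; both parents visit the same cities.
    """
    def edges(tour):
        n = len(tour)
        adj = {c: set() for c in tour}
        for k, c in enumerate(tour):
            adj[c].add(tour[(k + 1) % n])
            adj[c].add(tour[(k - 1) % n])
        return adj

    adj_a, adj_b = edges(parent_a), edges(parent_b)
    current = random.choice(parent_a)
    child, visited = [current], {current}
    while len(child) < len(parent_a):
        # Prefer an unused neighbour of the current city in either parent.
        candidates = [c for c in (adj_a[current] | adj_b[current]) if c not in visited]
        if candidates:
            current = random.choice(candidates)
        else:
            current = random.choice([c for c in parent_a if c not in visited])
        child.append(current)
        visited.add(current)
    return child
```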
Authors
佐藤 浩 小野 功 小林 重信
Publisher
一般社団法人 人工知能学会
Journal
人工知能 (ISSN:21882266)
Volume/Issue/Pages/Date
vol.12, no.5, pp.734-744, 1997-09-01 (Released:2020-09-29)
Cited by
5

When Genetic Algorithms (GAs) are applied to optimization problems, characteristic preservation in designing coding/crossover and diversity maintenance in designing generation alternation are important. Generation alternation models are independent of problems, while coding/crossover depends on problems. We discuss generation alternation models in this paper. Simple GA is one of the well-known generation alternation models; however, it has two problems. One is early convergence in the first stage of search and the other is evolutionary stagnation in the last stage. Many improvements and new models have been presented to overcome these problems. In this paper, we propose a new generation alternation model called minimal generation gap (MGG) which has all the advantages of conventional models. As generation alternation models use only fitness information, alternation of generations can be regarded as a transformation of fitness distributions. We propose a new method of assessing generation alternation models: we measure the ability to avoid early convergence and to suppress evolutionary stagnation by the dynamics of the best value and the variance of the fitness distributions. From the results of several experiments, we found that MGG is the most desirable model, avoiding early convergence and suppressing evolutionary stagnation. We also show the efficiency of MGG by applying it to benchmarks in two different domains: function optimization and traveling salesman problems. In both domains, MGG showed higher performance than the other conventional models, especially under small population sizes.
Authors
神谷 昭基 小野 功 小林 重信
Publisher
一般社団法人 電気学会
Journal
電気学会論文誌C(電子・情報・システム部門誌) (ISSN:03854221)
Volume/Issue/Pages/Date
vol.117, no.7, pp.829-836, 1997-06-20 (Released:2008-12-19)
References
13

Start-up scheduling is aimed at minimizing the start-up time while limiting turbine rotor stresses to an acceptable level. This scheduling problem has a wide search space. In order to improve search efficiency and robustness and to establish an adaptive search model, we propose to integrate evolutionary computation, based on Genetic Algorithms (GA), with reinforcement learning. Our proposal includes two strategies: a multi-boundary-based enforcement operator and a multi-elitist plan. By setting a second boundary, located just outside the existing boundary containing the feasible schedules, we extend our proposed enforcement operator and the conventional elitist plan into the multi-boundary-based enforcement operator and the multi-elitist plan. These two strategies work together to focus the search along the boundary, around which the optimal schedule is supposed to exist, so as to increase search efficiency as well as robustness. During a search process, GA guides the reinforcement learning to concentrate its learning on promising areas instead of the entire space. In return, reinforcement learning can generate a good schedule in the earlier stage of the search process. We obtain encouraging test results. In this paper, we propose the GA-based search model with these strategies and discuss the test results.
Authors
山村 雅幸 小野 貴久 小林 重信
Publisher
社団法人人工知能学会
Journal
人工知能学会誌 (ISSN:09128085)
Volume/Issue/Pages/Date
vol.7, no.6, pp.1049-1059, 1992-11-01
Cited by
105

Genetic Algorithms (GA) are a learning paradigm that models natural evolution mechanisms. The framework of GA corresponds directly to optimization problems, which are classified into functional optimization and combinatorial optimization and have been studied in different manners. GA can be applied to both types of problems and, moreover, to their combinations. Over generations, GA discovers and accumulates building blocks in the form of schemata, and finds the global solution as their combination. It is said that GA can find the global solution rapidly if the population holds sufficient variety; however, this expectation has not been confirmed rigorously. Indeed, some problems have been pointed out, such as the early convergence problem in functional optimization and the encode/decode-crossover problem in combinatorial optimization. In this paper, we give a solution to the encode/decode-crossover problem for traveling salesman problems (TSP) with a character-preserving GA. In section 2, we define the encode/decode-crossover problem. The encode/decode problem is to define a correspondence between GA space and problem space; the crossover problem is to define a crossover method in GA space. They are closely related to the performance of GA. We point out some problems with conventional approaches for TSP, and propose three criteria for better encoding/decoding, namely completeness, soundness and non-redundancy, as well as a criterion for better crossover: character preservingness. In section 3, we propose a character-preserving GA. In TSP, good subtours are worth preserving for descendants. We propose a subtour exchange crossover that breaks subtours as little as possible, and a compression method to improve efficiency. In section 4, we design an experiment to confirm the usefulness of our character-preserving GA. We use a double-circled TSP in which the same number of cities is placed on each of two concentric circles. There are two kinds of local solutions, "C"-type and "O"-type, and the ratio between the outer and inner radius determines which is the optimal solution. We vary the radius ratio and observe how often optimal solutions are obtained. As a result, the character-preserving GA finds optimal solutions effectively.
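The subtour exchange idea can be sketched as follows: take the city set of a random segment of one parent and reorder those cities according to the other parent's visiting order, so that subtours contributed by both parents survive in the child. This is a simplified illustration under an assumed tour representation (a list of city indices), not the exact operator or compression method from the paper.

```python
import random

def subtour_exchange_crossover(parent_a, parent_b):
    """Simplified sketch of a subtour exchange crossover for TSP."""
    n = len(parent_a)
    i = random.randrange(n)
    j = random.randrange(i + 1, n + 1)
    segment = set(parent_a[i:j])
    # Reorder the segment's cities to follow parent_b's visiting order.
    reordered = [c for c in parent_b if c in segment]
    return parent_a[:i] + reordered + parent_a[j:]
```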
Authors
宮崎 和光 木村 元 小林 重信
Publisher
社団法人人工知能学会
Journal
人工知能学会誌 (ISSN:09128085)
Volume/Issue/Pages/Date
vol.14, no.5, pp.800-807, 1999-09-01
Cited by
42

1.1 Reinforcement learning from an engineering viewpoint
Reinforcement learning is a machine learning framework that pursues an action-decision policy adapted to the environment, using a special input called reward as its clue. Reinforcement learning has two important characteristics: 1) it is reward-driven learning, and 2) it presupposes no prior knowledge of the environment. This means that simply by reflecting the goal, "what we want done" (what), in the reward, the learning system is made to acquire "how to realize it" (how to). A reinforcement learning system may thus discover solutions beyond what humans have conceived. In addition, when part of the environment is known in advance, such knowledge can be incorporated; in this case the knowledge base may be incomplete or even contain some errors. Reinforcement learning also has high affinity with existing techniques such as neural networks and fuzzy systems, and it can follow gradual changes in the environment. For these reasons, reinforcement learning is a very attractive framework from the viewpoint of engineering applications.
Authors
荒井 幸代 宮崎 和光 小林 重信
Publisher
一般社団法人 人工知能学会
Journal
人工知能 (ISSN:21882266)
Volume/Issue/Pages/Date
vol.13, no.4, pp.609-618, 1998-07-01 (Released:2020-09-29)
Cited by
1

Most multi-agent systems have been developed in the field of Distributed Artificial Intelligence (DAI), whose schemes are based on plenty of pre-knowledge of the agents' world or on organized relationships among the agents. However, this kind of knowledge is not always available. On the other hand, multi-agent reinforcement learning is worth considering for realizing cooperative behavior among agents with little pre-knowledge. There are two main problems to be considered in multi-agent reinforcement learning. One is the uncertainty of state transitions, which is due to the concurrent learning of the agents. The other is the perceptual aliasing problem, which generally arises in such a world. Therefore, robustness and flexibility toward these two problems are essential for multi-agent reinforcement learning. In this paper, we evaluate Q-learning and Profit Sharing as methods for multi-agent reinforcement learning through some experiments. We take up the Pursuit Problem as an instance of a multi-agent world. In the experiments, we do not assume the existence of any pre-defined relationship among agents or any control knowledge for cooperation. Learning agents do not share sensations, episodes, or policies; each agent learns through its own episodes independently of the others. The results show that cooperative behaviors emerge clearly among Profit Sharing hunters, who are not disturbed by concurrent learning, even when the prey has a certain escape strategy against the hunters. Moreover, they behave rationally in the perceptual aliasing areas. On the other hand, Q-learning hunters cannot form any policy in such a world. Through these experiments, we conclude that Profit Sharing has good properties for multi-agent reinforcement learning because of its robustness against changes in other agents' policies and the limitation of agents' sensing abilities.
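As a reference point for how Profit Sharing differs from Q-learning here, the sketch below shows an episodic Profit Sharing update with a geometrically decaying credit function, one common choice for the reinforcement function; the rule-table layout and the decay value are assumptions for illustration.

```python
def profit_sharing_update(weights, episode, reward, decay=0.3):
    """Sketch of a Profit Sharing update on one episode.

    `episode` is the list of (observation, action) rules fired, oldest
    first; when a reward arrives, every rule on the episode is
    reinforced, with credit decaying geometrically toward older steps.
    """
    credit = reward
    for obs, act in reversed(episode):
        weights[(obs, act)] = weights.get((obs, act), 0.0) + credit
        credit *= decay
    return weights
```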
Authors
宮崎 和光 山村 雅幸 小林 重信
Publisher
一般社団法人 人工知能学会
Journal
人工知能 (ISSN:21882266)
Volume/Issue/Pages/Date
vol.12, no.1, pp.78-89, 1997-01-01 (Released:2020-09-29)

Reinforcement learning is a kind of machine learning. It aims to adapt an agent to a given environment using rewards as a clue. Profit sharing (PS) can get rewards efficiently in an initial learning phase, but it cannot always learn an optimal policy that maximizes the reward per action. Though Q-learning is guaranteed to obtain an optimal policy, it needs numerous trials to learn one. On Markov decision processes (MDPs), if a correct environment model is identified, we can derive an optimal policy by applying the Policy Iteration Algorithm (PIA). As an efficient method for identifying MDPs, the k-Certainty Exploration Method has been proposed. We consider that an ideal reinforcement learning system should get some rewards even in the initial learning phase and get more rewards as the identification of the environment proceeds. In this paper, we propose a unified learning system, MarcoPolo, which considers both getting rewards by PS or PIA and identifying the environment by the k-Certainty Exploration Method. MarcoPolo can realize any trade-off between exploitation and exploration through the whole learning process. By applying MarcoPolo to an example, its basic performance is shown. Moreover, by applying it to Sutton's maze problem and a modified version of it, its feasibility in more realistic domains is shown.
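Since MarcoPolo derives policies with the Policy Iteration Algorithm once a model is identified, a compact policy iteration sketch is given below; the model arrays P and R are assumed inputs, e.g. estimates produced by the k-Certainty Exploration Method.

```python
import numpy as np

def policy_iteration(P, R, gamma=0.95):
    """Policy Iteration for a known MDP (sketch).

    P[a] is an (n_states, n_states) transition matrix for action a,
    R[a] an (n_states,) expected immediate reward vector. Returns a
    deterministic optimal policy and its value function.
    """
    n_actions, n_states = len(P), P[0].shape[0]
    policy = np.zeros(n_states, dtype=int)
    while True:
        # Policy evaluation: solve (I - gamma * P_pi) v = r_pi exactly.
        P_pi = np.array([P[policy[s]][s] for s in range(n_states)])
        r_pi = np.array([R[policy[s]][s] for s in range(n_states)])
        v = np.linalg.solve(np.eye(n_states) - gamma * P_pi, r_pi)
        # Policy improvement: act greedily with respect to v.
        q = np.array([R[a] + gamma * P[a] @ v for a in range(n_actions)])
        new_policy = q.argmax(axis=0)
        if np.array_equal(new_policy, policy):
            return policy, v
        policy = new_policy
```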
Authors
池田 心 小林 重信
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会論文誌 (ISSN:13460714)
Volume/Issue/Pages/Date
vol.17, no.3, pp.239-246, 2002 (Released:2002-04-25)
References
19
Cited by
4 8

Genetic Algorithms (GAs) are effective approximation algorithms which focus on a "hopeful area" in the searching process. However, on harder problems it is often very difficult to maintain a favorable trade-off between exploitation and exploration: all individuals often leave the big-valley including the global optimum and concentrate on another big-valley including a local optimum. In this paper, we define such a situation in conventional GAs as the "UV-phenomenon", and identify UV-structures as hard landscape structures that cause the UV-phenomenon. We introduce a test function which has explicit UV-structures and show the UV-phenomenon caused by them. Next, we analyze the Fletcher and Powell function to confirm our hypothesis. Finally, we propose a novel framework of GAs which can cope with UV-structures.
Authors
上村 健人 木下 峻一 永田 裕一 小林 重信 小野 功
Publisher
進化計算学会
Journal
進化計算学会論文誌 (ISSN:21857385)
Volume/Issue/Pages/Date
vol.4, no.1, pp.1-12, 2013 (Released:2013-03-02)
References
18
Cited by
1

This paper proposes a new framework of real-coded genetic algorithms (RCGAs) for multi-funnel function optimization. The RCGA is one of the most powerful function optimization methods. Most conventional RCGAs work effectively on single-funnel functions that consist of a single big-valley. However, it is reported that they show poor performance, or sometimes fail to find the optimum, on multi-funnel functions that consist of multiple big-valleys. In order to remedy this deterioration, the Innately Split Model (ISM) has been proposed as a framework of RCGAs. ISM initializes an RCGA in a small region and repeats a search with the RCGA while changing the position of the region randomly. ISM outperforms conventional RCGAs on multi-funnel functions, but it has two problems concerning search efficiency and the difficulty of setting its parameters. Our proposed method, the Big-valley Explorer (BE), is a framework of RCGAs like ISM with two novel mechanisms to overcome these problems: the big-valley estimation mechanism and the adaptive initialization mechanism. Once the RCGA finishes a search, the big-valley estimation mechanism estimates the big-valley that the RCGA has already explored and removes the region from the search space to prevent the RCGA from searching the same big-valley many times. After that, the adaptive initialization mechanism initializes the RCGA in a wide unexplored region adaptively to find unexplored big-valleys. We evaluate BE through numerical experiments with both single-funnel and multi-funnel benchmark functions.
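ISM itself, which BE improves on, is simple enough to sketch: run an RCGA repeatedly, each time initializing its population inside a small, randomly placed box, and keep the best result. The `run_rcga` callable, the box shape, and the parameter values below are assumptions for illustration; minimization is assumed.

```python
import random

def ism_style_search(run_rcga, dim, bounds, region_width=0.1, n_restarts=20):
    """Sketch of the ISM restart scheme.

    `run_rcga(lo, hi)` runs one RCGA with its initial population
    sampled uniformly in the per-dimension box [lo, hi] and returns
    (best_x, best_f). `bounds` is the (lower, upper) search domain.
    """
    lo_g, hi_g = bounds
    best_x, best_f = None, float("inf")
    for _ in range(n_restarts):
        # Place a small initialization region at random in the domain.
        center = [random.uniform(lo_g, hi_g) for _ in range(dim)]
        half = region_width * (hi_g - lo_g) / 2
        lo = [c - half for c in center]
        hi = [c + half for c in center]
        x, f = run_rcga(lo, hi)
        if f < best_f:
            best_x, best_f = x, f
    return best_x, best_f
```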
Authors
福島 信純 永田 裕一 小林 重信 小野 功
Publisher
進化計算学会
Journal
進化計算学会論文誌 (ISSN:21857385)
Volume/Issue/Pages/Date
vol.4, no.2, pp.57-73, 2013 (Released:2013-09-11)
References
19

Natural evolution strategies (NESs) are a family of iterative methods for black-box function optimization. Instead of directly minimizing an objective function, NESs minimize the expectation of the objective function value over an arbitrary parametric probability distribution. In each iteration, NESs update the parameters of the distribution by using an estimated natural gradient of the expectation of the objective function value. Exponential NES (xNES) is an effective NES variant that uses the multivariate normal distribution as the probability distribution. Since the shape of a normal distribution can take the form of a rotated ellipse in the solution space, xNES shows relatively good performance on ill-conditioned and non-separable objective functions. However, we believe that xNES has two problems that cause performance degradation. The first problem is that the spread of the normal distribution tends to shrink excessively even if the distribution does not cover a (local) optimal point, which causes premature convergence. The second problem is that the learning rates for the parameters of the distribution are not appropriate: they depend only on the dimension of the objective function, although they should be designed depending on all the factors that influence the precision of natural gradient estimation. Moreover, they are set to small values to prevent premature convergence, which results in too slow convergence even when the distribution covers the optimal point. In order to remedy these problems of xNES, we propose a new NES method named the distance-weighted exponential natural evolution strategy (DX-NES). On several benchmark functions, we confirmed that DX-NES outperforms xNES and that DX-NES shows better performance than CMA-ES on almost all of the functions.
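For contrast with DX-NES, a minimal xNES loop is sketched below, following the commonly published update: sample from N(mu, sigma^2 BB^T), form rank-based utilities, and update mu, the step size sigma, and the shape matrix B along the estimated natural gradient via exponential maps. Learning rates and population size follow the usual defaults; this is a sketch, not the authors' implementation.

```python
import numpy as np
from scipy.linalg import expm

def xnes(f, x0, sigma0=1.0, n_iters=500):
    """Minimal exponential NES (xNES) for minimization (sketch)."""
    d = len(x0)
    mu, sigma, B = np.array(x0, float), sigma0, np.eye(d)
    lam = 4 + int(3 * np.log(d))
    eta_mu = 1.0
    eta_sigma = eta_B = (9 + 3 * np.log(d)) / (5 * d * np.sqrt(d))
    # Rank-based utilities: best sample gets the largest weight, sum is 0.
    raw = np.maximum(0.0, np.log(lam / 2 + 1) - np.log(np.arange(1, lam + 1)))
    u = raw / raw.sum() - 1.0 / lam
    for _ in range(n_iters):
        s = np.random.randn(lam, d)                # standard normal samples
        x = mu + sigma * s @ B.T                   # candidates in solution space
        order = np.argsort([f(xi) for xi in x])    # ascending: best first
        s = s[order]
        g_delta = u @ s                            # natural gradient for mu
        g_M = sum(uk * (np.outer(sk, sk) - np.eye(d)) for uk, sk in zip(u, s))
        g_sigma = np.trace(g_M) / d
        mu = mu + eta_mu * sigma * B @ g_delta
        sigma *= np.exp(eta_sigma / 2 * g_sigma)
        B = B @ expm(eta_B / 2 * (g_M - g_sigma * np.eye(d)))
    return mu
```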
Authors
小林 重信
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会論文誌 (ISSN:13460714)
Volume/Issue/Pages/Date
vol.24, no.1, pp.147-162, 2009 (Released:2009-01-06)
References
37
Cited by
43 63

Real-coded genetic algorithms (RCGA) are expected to efficiently solve real-parameter optimization problems with multimodality, parameter dependency, and ill-scaledness. Multi-parental crossovers such as the simplex crossover (SPX) and UNDX-m, as extensions of the unimodal normal distribution crossover (UNDX), show relatively good performance for RCGA. The minimal generation gap (MGG) is widely used as a generation alternation model for RCGA. However, the MGG is not suited to multi-parental crossovers, and the SPX and UNDX-m have their own drawbacks. Therefore, RCGAs composed of them cannot be applied to high-dimensional problems, because their hidden faults appear. This paper presents a new and robust framework for RCGA. First, we propose a generation alternation model called JGG (just generation gap) suited to multi-parental crossovers. The JGG replaces the parents with children completely every generation. To solve the asymmetry and bias of the children distribution generated by the SPX and UNDX-m, an enhanced SPX (e-SPX) and an enhanced UNDX (e-UNDX) are proposed. Moreover, we propose a crossover called REX(φ,n+k) as a generalization of the e-UNDX, where φ and n+k denote a probability distribution and the number of parents, respectively. A concept of the globally descent direction (GDD) is introduced to handle situations where the population does not cover any optimum; the GDD can be used under the big-valley structure. We then propose REXstar as an extension of REX(φ,n+k) that can generate children toward the GDD efficiently. Several experiments show the excellent performance and robustness of REXstar. Finally, future work is discussed.
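A compact sketch of the JGG model combined with REX is given below: parents are sampled without replacement, children are generated around the parents' center of mass, and the parents' slots are completely replaced by the best children. The ξ-variance 1/n_parents is one recommended setting and is an assumption here, as are the function names; minimization is assumed.

```python
import numpy as np

def jgg_rex_step(population, f, n_parents, n_children, rng=None):
    """One generation of JGG with REX (sketch, minimization).

    REX(N, n_parents): child = <x> + sum_i xi_i (x_i - <x>),
    with xi_i ~ N(0, 1/n_parents) and <x> the parents' mean.
    """
    if rng is None:
        rng = np.random.default_rng()
    n_pop, _ = population.shape
    idx = rng.choice(n_pop, size=n_parents, replace=False)
    parents = population[idx]
    center = parents.mean(axis=0)
    xi = rng.normal(0.0, np.sqrt(1.0 / n_parents), size=(n_children, n_parents))
    children = center + xi @ (parents - center)
    order = np.argsort([f(c) for c in children])
    population[idx] = children[order[:n_parents]]  # complete replacement of parents
    return population
```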
Authors
宮崎 和光 坪井 創吾 小林 重信
Publisher
一般社団法人 人工知能学会
Journal
人工知能学会論文誌 (ISSN:13460714)
Volume/Issue/Pages/Date
vol.16, no.2, pp.185-192, 2001 (Released:2002-02-28)
References
10
Cited by
1 7

Reinforcement learning is a kind of machine learning. It aims to adapt an agent to a given environment using rewards as a clue. In general, the purpose of a reinforcement learning system is to acquire an optimal policy that maximizes the expected reward per action. However, this is not always important in every environment. Especially if we apply a reinforcement learning system to engineering environments, we expect the agent to avoid all penalties. In Markov Decision Processes, a pair of a sensory input and an action is called a rule. We call a rule a penalty rule if and only if it incurs a penalty or can transit to a penalty state where it does not contribute to getting any reward. After suppressing all penalty rules, we aim to make a rational policy whose expected reward per action is larger than zero. In this paper, we propose a penalty-suppressing algorithm that can suppress all penalties and obtain rewards constantly. By applying the algorithm to tic-tac-toe, its effectiveness is shown.
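The core of penalty suppression, marking rules from which a penalty is unavoidable, can be sketched as a fixed-point computation over an observed transition relation. The data layout and the marking condition below are simplifying assumptions for illustration, not the paper's exact definitions.

```python
def find_penalty_rules(transitions, penalty_rules):
    """Mark penalty rules by fixed-point iteration (sketch).

    `transitions[(s, a)]` is the set of states observed after taking
    action a in state s; `penalty_rules` initially holds the rules
    that directly incurred a penalty.
    """
    actions_in = {}
    for (s, a) in transitions:
        actions_in.setdefault(s, set()).add(a)
    marked = set(penalty_rules)
    changed = True
    while changed:
        changed = False
        for (s, a), successors in transitions.items():
            if (s, a) in marked:
                continue
            # Mark (s, a) if it can reach a state where every rule is marked,
            # i.e. a state from which the penalty cannot be avoided.
            if any(all((s2, a2) in marked for a2 in actions_in.get(s2, ()))
                   for s2 in successors):
                marked.add((s, a))
                changed = True
    return marked
```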
Authors
小林 重信
Publisher
社団法人人工知能学会
Journal
人工知能学会誌 (ISSN:09128085)
Volume/Issue/Pages/Date
vol.18, no.4, pp.439-451, 2003-07-01
Cited by
7
Authors
木村 元 山村 雅幸 小林 重信
Publisher
社団法人人工知能学会
Journal
人工知能学会誌 (ISSN:09128085)
Volume/Issue/Pages/Date
vol.11, no.5, pp.761-768, 1996-09-01
Cited by
60

Many conventional works in reinforcement learning are limited to Markov decision processes (MDPs). However, real-world decision tasks are essentially non-Markovian. In this paper, we consider reinforcement learning in partially observable MDPs (POMDPs), a class of non-Markovian decision problems. Under the POMDP assumption, the environment is an MDP, but the agent has restricted access to state information; instead, it receives observations containing some information about the states of the MDP. We focus on a learning algorithm for memory-less stochastic policies that map the immediate observation of the agent to actions: memory-less approaches are suited for on-line and real-time adaptive systems that have limited memory and computational resources. We obtain the following mathematical results. First, the agent can improve its policy to maximize the immediate reward by stochastic gradient ascent without estimating any state or immediate reward. Second, it can improve the policy to maximize the discounted reward in an initial state by stochastic gradient ascent without estimating any state, immediate reward, or discounted reward. These advantages are remarkably effective in POMDPs, because no states, immediate rewards, or discounted rewards need to be estimated explicitly. Making use of these results, we present an incremental policy improvement algorithm to maximize the average reward in POMDPs. We confirm the rational behavior of the proposed algorithm in a simple experiment.
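The flavor of the result, policy improvement by stochastic gradient ascent without any state or value estimation, can be sketched with a REINFORCE-style update of a softmax policy over the immediate observation. The environment interface and step sizes are assumptions; the paper's actual algorithm is incremental and uses eligibility traces rather than this episodic Monte-Carlo form.

```python
import numpy as np

def episode_update(theta, env, alpha=0.01, gamma=0.95):
    """One episode of REINFORCE-style improvement of a memory-less
    stochastic policy for a POMDP (sketch; interfaces are assumed).

    theta[o, a] parameterizes a softmax policy over the immediate
    observation o only; no state, model, or value estimation is used.
    env.reset() -> obs; env.step(a) -> (obs, reward, done).
    """
    obs, done = env.reset(), False
    grads, rewards = [], []
    while not done:
        probs = np.exp(theta[obs] - theta[obs].max())
        probs /= probs.sum()
        a = np.random.choice(len(probs), p=probs)
        g = np.zeros_like(theta)          # gradient of log pi(a | obs)
        g[obs] -= probs
        g[obs, a] += 1.0
        grads.append(g)
        obs, r, done = env.step(a)
        rewards.append(r)
    # Discounted-return REINFORCE: sum_t gamma^t * G_t * grad log pi_t.
    G, update = 0.0, np.zeros_like(theta)
    for t in reversed(range(len(rewards))):
        G = rewards[t] + gamma * G
        update += (gamma ** t) * G * grads[t]
    theta += alpha * update
    return theta
```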
Authors
小林 重信
Publisher
社団法人人工知能学会
Journal
人工知能学会誌 (ISSN:09128085)
Volume/Issue/Pages/Date
vol.24, no.1, pp.128-143, 2009-01-01
Cited by
10 63

Real-coded genetic algorithms (RCGA) are expected to efficiently solve real-parameter optimization problems with multimodality, parameter dependency, and ill-scaledness. Multi-parental crossovers such as the simplex crossover (SPX) and UNDX-m, as extensions of the unimodal normal distribution crossover (UNDX), show relatively good performance for RCGA. The minimal generation gap (MGG) is widely used as a generation alternation model for RCGA. However, the MGG is not suited to multi-parental crossovers, and the SPX and UNDX-m have their own drawbacks. Therefore, RCGAs composed of them cannot be applied to high-dimensional problems, because their hidden faults appear. This paper presents a new and robust framework for RCGA. First, we propose a generation alternation model called JGG (just generation gap) suited to multi-parental crossovers. The JGG replaces the parents with children completely every generation. To solve the asymmetry and bias of the children distribution generated by the SPX and UNDX-m, an enhanced SPX (e-SPX) and an enhanced UNDX (e-UNDX) are proposed. Moreover, we propose a crossover called REX(φ,n+k) as a generalization of the e-UNDX, where φ and n+k denote a probability distribution and the number of parents, respectively. A concept of the globally descent direction (GDD) is introduced to handle situations where the population does not cover any optimum; the GDD can be used under the big-valley structure. We then propose REXstar as an extension of REX(φ,n+k) that can generate children toward the GDD efficiently. Several experiments show the excellent performance and robustness of REXstar. Finally, future work is discussed.