Publication list: 宮崎和光 (author)

学位に付記する専攻分野の名称とディプロマ・ポリシーの整合性に関する研究 (A study on the consistency between the names of major fields appended to degrees and diploma policies)

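Authors: 宮崎和光, 吉田望, 森利枝
Publisher: 一般社団法人電気学会 (The Institute of Electrical Engineers of Japan)
Journal: 電気学会論文誌C(電子・情報・システム部門誌) (IEEJ Transactions on Electronics, Information and Systems) (ISSN:03854221)
Volume, issue, pages, date: vol.142, no.2, pp.117-128, 2022-02-01 (Released:2022-02-01)
References: 32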

In 2017, it became mandatory for universities in Japan to disclose their policies on degree granting (hereafter Diploma Policy, DP), which state the standards for conferring degrees. Meanwhile, since 1991, the nomenclature of the major fields that appear on diplomas has been the responsibility of individual universities rather than of national regulation. Given that both are deemed to represent the learning outcomes the graduate has obtained, this study examines whether the former reasonably evokes the latter. To do so, we compared how well humans and a deep-learning system based on a character-level CNN could match randomly presented DPs to major fields. The human experiment, conducted with enough participants to obtain statistically significant results, found a certain number of DPs that the majority of participants failed to match with their major fields. We then analyzed these DPs and showed that the deep-learning system achieves a high success rate in picking out the DPs that poorly evoke their major fields.

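専門科目名のリストを利用した学位授与事業のための科目分類支援システムの評価 (Evaluation of a course classification support system for the degree-awarding program using lists of specialized course names) [Open Access]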

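Authors: 宮崎和光, 井田正明, 芳鐘冬樹, 野澤孝之, 喜多一
Publisher: 独立行政法人大学評価・学位授与機構 (National Institution for Academic Degrees and University Evaluation)
Journal: 大学評価・学位研究 = RESEARCH ON ACADEMIC DEGREES AND UNIVERSITY EVALUATION (ISSN:18800343)
Volume, issue, pages, date: no.6, pp.27-42, 2007-12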

http://id.nii.ac.jp/1107/00000067/

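Profit Sharing に基づく強化学習の理論と応用 (特集: 計算学習理論の進展と応用可能性) (Theory and applications of reinforcement learning based on Profit Sharing; special issue: advances in computational learning theory and its potential applications)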

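Authors: 宮崎和光, 木村元, 小林重信
Publisher: 社団法人人工知能学会 (The Japanese Society for Artificial Intelligence)
Journal: 人工知能学会誌 (Journal of the Japanese Society for Artificial Intelligence) (ISSN:09128085)
Volume, issue, pages, date: vol.14, no.5, pp.800-807, 1999-09-01
Cited by: 42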

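1.1 Reinforcement learning from an engineering perspective. Reinforcement learning is a machine learning system that, using a special input called a reward as a clue, pursues decision-making strategies adapted to its environment. Reinforcement learning has two important features: 1) it is reward-driven, and 2) it does not presuppose a priori knowledge of the environment. This means that simply by reflecting the goal ("what" we want done) in the reward, the learning system can be made to acquire the means of achieving it ("how to"). A reinforcement learning system may discover solutions beyond those a human would devise. In addition, when part of the environment is known in advance, that knowledge can be incorporated; in this case the knowledge base may be incomplete or contain some errors. Reinforcement learning also combines well with existing techniques such as neural networks and fuzzy systems. Furthermore, it can track gradual changes in the environment. For these reasons, reinforcement learning is a very attractive framework from the standpoint of engineering applications.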

https://ci.nii.ac.jp/naid/110002808748

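マルチエージェント強化学習の方法論 : Q-LearningとProfit Sharingによる接近 (A methodology for multi-agent reinforcement learning: approaches via Q-Learning and Profit Sharing) [Open Access]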

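Authors: 荒井幸代, 宮崎和光, 小林重信
Publisher: 一般社団法人人工知能学会 (The Japanese Society for Artificial Intelligence)
Journal: 人工知能 (Journal of the Japanese Society for Artificial Intelligence) (ISSN:21882266)
Volume, issue, pages, date: vol.13, no.4, pp.609-618, 1998-07-01 (Released:2020-09-29)
Cited by: 1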

Most multi-agent systems have been developed in the field of Distributed Artificial Intelligence (DAI), whose schemes rely on extensive prior knowledge of the agents' world or on predefined organizational relationships among the agents. However, this kind of knowledge is not always available. Multi-agent reinforcement learning is therefore worth considering as a way to realize cooperative behavior among agents with little prior knowledge. Two main problems must be considered in multi-agent reinforcement learning. One is the uncertainty of state transitions caused by the concurrent learning of the agents; the other is the perceptual aliasing problem that generally arises in such worlds. Robustness and flexibility with respect to these two problems are therefore essential for multi-agent reinforcement learning. In this paper, we evaluate Q-learning and Profit Sharing as methods for multi-agent reinforcement learning through experiments on the Pursuit Problem, taken as an example of a multi-agent world. In the experiments, we do not assume any predefined relationship among agents or any control knowledge for cooperation. Learning agents do not share sensations, episodes, or policies; each agent learns from its own episodes, independently of the others. The results show that cooperative behavior clearly emerges among the Profit Sharing hunters, which are not affected by concurrent learning, even when the prey has a definite way of escaping from the hunters. Moreover, these hunters behave rationally in areas subject to perceptual aliasing. Q-learning hunters, on the other hand, cannot form any policy in such a world. From these experiments, we conclude that Profit Sharing has properties well suited to multi-agent reinforcement learning because of its robustness to changes in other agents' policies and to limitations in the agents' sensing abilities.
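The abstract contrasts two update styles. As a rough, non-authoritative illustration, the sketch below shows a Profit Sharing-style update, which distributes a reward back over every rule (state, action) of an episode once the reward arrives, next to a one-step Q-learning backup. The function names, the geometric `decay` of 0.25 used as the credit assignment function, and the learning parameters are hypothetical choices for illustration only; they are not taken from the paper and are not derived from any rationality condition.

```python
from collections import defaultdict


def profit_sharing_update(weights, episode, reward, decay=0.25):
    """Profit Sharing-style reinforcement of a whole episode (illustrative sketch).

    `episode` lists (state, action) rules in the order they were taken.
    The last rule receives `reward`; each earlier rule receives the previous
    credit multiplied by `decay` (a hypothetical geometric credit function).
    Assumes detour loops were already removed, so each rule appears once.
    """
    credit = reward
    for state, action in reversed(episode):
        weights[(state, action)] += credit
        credit *= decay
    return weights


def q_learning_update(q, state, action, reward, next_state, actions,
                      alpha=0.1, gamma=0.9):
    """One-step Q-learning backup, shown for comparison."""
    best_next = max(q[(next_state, a)] for a in actions)
    q[(state, action)] += alpha * (reward + gamma * best_next - q[(state, action)])
    return q


w = defaultdict(float)
profit_sharing_update(w, [("s0", "right"), ("s1", "up")], reward=1.0)
# w[("s1", "up")] is 1.0 and w[("s0", "right")] is 0.25
```

The design contrast is that the Profit Sharing sketch reinforces the whole rule sequence at once without bootstrapping from value estimates of other states, whereas the Q-learning backup depends on the current estimate of the next state.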

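MarcoPolo : 報酬獲得と環境同定のトレードオフを考慮した強化学習システム (MarcoPolo: a reinforcement learning system that considers the trade-off between reward acquisition and environment identification) [Open Access]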

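Authors: 宮崎和光, 山村雅幸, 小林重信
Publisher: 一般社団法人人工知能学会 (The Japanese Society for Artificial Intelligence)
Journal: 人工知能 (Journal of the Japanese Society for Artificial Intelligence) (ISSN:21882266)
Volume, issue, pages, date: vol.12, no.1, pp.78-89, 1997-01-01 (Released:2020-09-29)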

Reinforcement learning is a kind of machine learning that aims to adapt an agent to a given environment using rewards as a clue. Profit Sharing (PS) can obtain rewards efficiently in the initial learning phase. However, it cannot always learn an optimum policy that maximizes the reward per action. Although Q-learning is guaranteed to obtain an optimum policy, it needs numerous trials to learn it. In Markov decision processes (MDPs), if a correct environment model has been identified, an optimum policy can be derived by applying the Policy Iteration Algorithm (PIA). The k-Certainty Exploration Method has been proposed as an efficient method for identifying MDPs. We consider that an ideal reinforcement learning system should obtain some rewards even in the initial learning phase and obtain more rewards as identification of the environment proceeds. In this paper, we propose a unified learning system, MarcoPolo, which considers both obtaining rewards by PS or PIA and identifying the environment by the k-Certainty Exploration Method. MarcoPolo can realize any trade-off between exploitation and exploration throughout the learning process. Its basic performance is shown by applying it to an example, and its feasibility in more realistic domains is shown by applying it to Sutton's maze problem and a modified version of it.
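To make the exploration side of this trade-off more concrete, here is a heavily simplified sketch in the spirit of k-certainty exploration: a rule (state, action) is treated as "k-certain" once it has been tried at least k times, the agent prefers rules that are not yet k-certain, and transition counts give a maximum-likelihood model that a planner such as PIA could later use. The class name `KCertaintyExplorer`, the `fallback` callback, and the random tie-breaking are hypothetical; the published method also plans routes toward states that still contain uncertain rules, which this sketch omits entirely.

```python
import random
from collections import defaultdict


class KCertaintyExplorer:
    """Simplified k-certainty-style exploration (illustrative sketch only)."""

    def __init__(self, actions, k=3, seed=0):
        self.actions = list(actions)
        self.k = k
        self.counts = defaultdict(int)  # times each rule (s, a) was tried
        # (s, a) -> next_state -> number of observed transitions
        self.transitions = defaultdict(lambda: defaultdict(int))
        self.rng = random.Random(seed)

    def uncertain_actions(self, state):
        """Actions whose rule in `state` has been tried fewer than k times."""
        return [a for a in self.actions if self.counts[(state, a)] < self.k]

    def choose(self, state, fallback):
        """Pick an uncertain action if any exists, else defer to fallback(state)."""
        candidates = self.uncertain_actions(state)
        if candidates:
            return self.rng.choice(candidates)
        return fallback(state)  # e.g. an exploiting policy (hypothetical hook)

    def observe(self, state, action, next_state):
        self.counts[(state, action)] += 1
        self.transitions[(state, action)][next_state] += 1

    def transition_estimate(self, state, action):
        """Maximum-likelihood estimate of P(next_state | state, action)."""
        seen = self.transitions[(state, action)]
        total = sum(seen.values())
        return {s2: n / total for s2, n in seen.items()} if total else {}
```

In this sketch, switching between exploration and exploitation is delegated to `fallback`, so an exploiting policy (for example one derived from the estimated model) is only consulted once every rule in the current state has been tried k times.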

テキストマイニング応用の進展 -学位授与事業におけるシラバス分類- (Advances in text mining applications: syllabus classification in the degree-awarding program)

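Authors: 宮崎和光, 井田正明
Publisher: 日本知能情報ファジィ学会 (Japan Society for Fuzzy Theory and Intelligent Informatics)
Journal: 知能と情報 (Journal of Japan Society for Fuzzy Theory and Intelligent Informatics) (ISSN:13477986)
Volume, issue, pages, date: vol.26, no.2, pp.42-50, 2014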

シラバスデータのクラスタリングに基づく教育コース分析システムの構築 (Construction of an educational course analysis system based on clustering of syllabus data) [Open Access]

Authors: 野澤孝之, 井田正明, 芳鐘冬樹, 宮崎和光, 喜多一
Journal: 第66回全国大会講演論文集 (Proceedings of the 66th National Convention)
Volume, issue, pages, date: vol.2004, no.1, pp.377, 2004-03-09

http://id.nii.ac.jp/1001/00170396/

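電子化シラバスに基づく学位授与のための科目分類支援システムの検討 (A study of a course classification support system for degree awarding based on electronic syllabi) [Open Access]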

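Authors: 宮崎和光, 井田正明, 芳鐘冬樹, 野澤孝之, 喜多一
Publisher: 大学評価・学位授与機構 (National Institution for Academic Degrees and University Evaluation)
Journal: 学位研究 = RESEARCH IN ACADEMIC DEGREES (ISSN:09196099)
Volume, issue, pages, date: no.18, pp.133-150, 2004-03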

http://id.nii.ac.jp/1107/00000225/

罰を回避する合理的政策の学習 (Learning rational policies that avoid penalties) [Open Access]

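Authors: 宮崎和光, 坪井創吾, 小林重信
Publisher: 一般社団法人人工知能学会 (The Japanese Society for Artificial Intelligence)
Journal: 人工知能学会論文誌 (Transactions of the Japanese Society for Artificial Intelligence) (ISSN:13460714)
Volume, issue, pages, date: vol.16, no.2, pp.185-192, 2001 (Released:2002-02-28)
References: 10
Cited by: 1 7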

Reinforcement learning is a kind of machine learning that aims to adapt an agent to a given environment using rewards as a clue. In general, the purpose of a reinforcement learning system is to acquire an optimum policy that maximizes the expected reward per action. However, this is not always what matters most in every environment. In particular, when reinforcement learning is applied to engineering environments, we expect the agent to avoid all penalties. In Markov Decision Processes, a pair of a sensory input and an action is called a rule. We call a rule a penalty rule if and only if it receives a penalty or it can transit to a penalty state, in which it does not contribute to obtaining any reward. After suppressing all penalty rules, we aim to obtain a rational policy whose expected reward per action is larger than zero. In this paper, we propose a penalty-suppressing algorithm that can suppress any penalty and obtain rewards constantly. Its effectiveness is shown by applying it to tic-tac-toe.
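The penalty-rule definition above is recursive: a rule becomes a penalty rule if it may lead into a penalty state, and a state's status in turn depends on its rules. A minimal fixed-point sketch over a known transition model makes that recursion concrete. It assumes, as an interpretation not stated in this abstract, that a state counts as a penalty state when every rule available in it is a penalty rule; the function name `find_penalty_rules` and the dictionary-based model format are hypothetical, and this is not the paper's algorithm, which learns without being handed the model.

```python
def find_penalty_rules(rules, penalized):
    """Mark penalty rules by fixed-point iteration on a known model (sketch).

    rules: dict mapping state -> dict mapping action -> set of possible next states.
    penalized: set of (state, action) rules that directly receive a penalty.
    Assumption: a state is a penalty state when all of its rules are penalty
    rules; a rule is a penalty rule if it is directly penalized or may transit
    to a penalty state.
    """
    penalty_rules = set(penalized)
    changed = True
    while changed:
        changed = False
        # Recompute penalty states from the current set of penalty rules.
        penalty_states = {
            s for s, acts in rules.items()
            if acts and all((s, a) in penalty_rules for a in acts)
        }
        for s, acts in rules.items():
            for a, next_states in acts.items():
                if (s, a) not in penalty_rules and next_states & penalty_states:
                    penalty_rules.add((s, a))
                    changed = True
    return penalty_rules


model = {
    "s0": {"left": {"s1"}, "right": {"goal"}},
    "s1": {"stay": {"s1"}},
    "goal": {},
}
marked = find_penalty_rules(model, penalized={("s1", "stay")})
# marked contains ("s1", "stay") and ("s0", "left"); suppressing them leaves "right" in s0
```

Once the marked rules are suppressed, any remaining rule in a state is, under these assumptions, one that cannot drift into an unavoidable penalty.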
