Authors
Makoto Shibata, Megumi Uchida, Setsuki Tsukagoshi, Koichi Yamaguchi, Aya Yamaguchi, Natsumi Furuta, Kouki Makioka, Toshitaka Maeno, Yukio Fujita, Masahiko Kurabayashi, Yoshio Ikeda
Publisher
The Japanese Society of Internal Medicine
Journal
Internal Medicine (ISSN:09182918)
Volume, Pages, and Publication Date
vol.54, no.23, pp.3057-3060, 2015 (Released:2015-12-01)
Number of References
13
Number of Citations
5 9

A 64-year-old Japanese woman presented with a three-month history of progressive numbness and weakness of the lower extremities. A neurological examination and nerve conduction study indicated sensorimotor polyneuropathy. Since the serum anti-Hu antibody titer was markedly elevated, paraneoplastic neurological syndrome was strongly suspected. A thoracoscopic biopsy of the hilar lymph nodes, in which 18F-fluorodeoxyglucose uptake was markedly increased, revealed pathological findings consistent with small-cell lung cancer (SCLC). Subsequently, the patient presented with generalized tonic-clonic seizures, and cerebral MRI showed reversible multifocal brain lesions, considered to reflect paraneoplastic encephalopathy. After two courses of chemotherapy for SCLC, the brain lesions completely disappeared.
Authors
Praveen Singh Thakur, Masaru Sogabe, Katsuyoshi Sakamoto, Koichi Yamaguchi, Dinesh Bahadur Malla, Shinji Yokogawa, Tomah Sogabe
Publisher
The Japanese Society for Artificial Intelligence
Journal
Proceedings of the 32nd Annual Conference of the Japanese Society for Artificial Intelligence, 2018
Volume, Pages, and Publication Date
2018-04-12

In this paper, to achieve stable learning and faster convergence in continuous-action reinforcement learning tasks, we propose an alternative way of updating the actor (policy) in the Deep Deterministic Policy Gradient (DDPG) algorithm. In the proposed Hybrid-DDPG (H-DDPG for short), at one time step the actor is updated as in DDPG, and at the next time step the policy parameters are moved according to the TD-error of the critic. In one of five trial runs on the RoboschoolInvertedPendulumSwingup-v1 environment, the reward obtained at the early stage of training with H-DDPG was higher than with DDPG. In the hybrid update, the policy gradients are weighted by the TD-error. This 1) yields a higher reward than DDPG and 2) pushes the policy parameters in a direction that makes actions with higher reward more likely to occur than other actions. This implies that if the policy discovers good rewards early in exploration, it may converge quickly; otherwise it may not. In the remaining trial runs, however, H-DDPG performed the same as DDPG.
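The alternating actor update can be sketched as follows. This is a minimal illustration assuming PyTorch; the network shapes, optimizer settings, replay-batch layout, and the use of the absolute TD-error as the per-sample weight are assumptions made for illustration (the abstract only states that the policy gradients are weighted by the TD-error), not the authors' implementation.

import copy
import torch
import torch.nn as nn

state_dim, action_dim, gamma = 8, 2, 0.99  # illustrative sizes, not from the paper

actor = nn.Sequential(nn.Linear(state_dim, 64), nn.ReLU(),
                      nn.Linear(64, action_dim), nn.Tanh())
critic = nn.Sequential(nn.Linear(state_dim + action_dim, 64), nn.ReLU(),
                       nn.Linear(64, 1))
target_actor, target_critic = copy.deepcopy(actor), copy.deepcopy(critic)
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-3)

def hybrid_update(batch, step):
    # batch: tensors sampled from a replay buffer
    # s, s2: (N, state_dim); a: (N, action_dim); r, done: (N, 1)
    s, a, r, s2, done = batch

    # Critic update (standard DDPG): minimize the squared TD-error.
    with torch.no_grad():
        q_next = target_critic(torch.cat([s2, target_actor(s2)], dim=1))
        q_target = r + gamma * (1.0 - done) * q_next
    q = critic(torch.cat([s, a], dim=1))
    td_error = q_target - q
    critic_opt.zero_grad()
    td_error.pow(2).mean().backward()
    critic_opt.step()

    # Actor update: alternate between the standard DDPG policy gradient and
    # the hybrid step in which that gradient is weighted by the TD-error.
    q_of_policy = critic(torch.cat([s, actor(s)], dim=1))
    if step % 2 == 0:
        # DDPG step: ascend the critic's value of the actor's action.
        actor_loss = -q_of_policy.mean()
    else:
        # Hybrid step: weight each sample by |TD-error| (an assumed choice),
        # so transitions that surprised the critic move the policy more.
        weight = td_error.detach().abs()
        actor_loss = -(weight * q_of_policy).mean()
    actor_opt.zero_grad()
    actor_loss.backward()
    actor_opt.step()

Under this sketch, samples on which the critic is still inaccurate contribute larger policy steps during the hybrid update, which is consistent with the observation above that lucky early exploration can speed up convergence while otherwise behavior stays close to plain DDPG.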