Praveen Singh Thakur Masaru Sogabe Katsuyoshi Sakamoto Koichi Yamaguchi Dinesh Bahadur Malla Shinji Yokogawa Tomah Sogabe

In this paper, for stable learning and faster convergence in continuous-action reinforcement learning tasks, we propose an alternative way of updating the actor (policy) in the Deep Deterministic Policy Gradient (DDPG) algorithm. In the proposed Hybrid-DDPG (H-DDPG), the actor is updated as in DDPG at one time step, and at another time step the policy parameters are moved based on the TD-error of the critic. In one of five trial runs on the RoboschoolInvertedPendulumSwingup-v1 environment, the reward obtained in the early stage of training was higher with H-DDPG than with DDPG. In the hybrid update, the policy gradients are weighted by the TD-error. This 1) yields a higher reward than DDPG and 2) pushes the policy parameters in a direction that makes actions with higher reward more likely to occur than others. This implies that if the policy encounters good rewards during early exploration, it may converge quickly; otherwise, convergence may be slow. In the remaining trial runs, however, H-DDPG performed the same as DDPG.
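
The alternating actor update described above can be illustrated with a minimal sketch. The abstract does not give the exact update rule, so everything here is an assumption made for illustration: a linear actor `mu(s) = theta @ s`, a critic that is linear in state and action, strict even/odd alternation between the two steps, and a TD-error-weighted step that moves `mu(s)` toward the action actually taken. All function names are hypothetical and are not taken from the paper.

```python
import numpy as np

def td_error(w, s, a, r, s_next, a_next, gamma):
    # One-step TD-error of a linear critic Q(s, a) = w @ [s, a] (assumed form).
    q = w @ np.append(s, a)
    q_next = w @ np.append(s_next, a_next)
    return r + gamma * q_next - q

def ddpg_actor_step(theta, dq_da, s, lr):
    # Deterministic policy gradient for mu(s) = theta @ s:
    # dQ/da * dmu/dtheta, where dmu/dtheta = s.
    return theta + lr * dq_da * s

def td_weighted_actor_step(theta, delta, s, a, lr):
    # ASSUMPTION: the policy gradient is scaled by the TD-error delta and
    # points from mu(s) toward the action a that was actually taken.
    return theta + lr * delta * (a - theta @ s) * s

def hybrid_actor_update(step, theta, w, s, a, r, s_next, gamma=0.99, lr=0.01):
    # ASSUMPTION: even steps use the DDPG update, odd steps the TD-weighted one.
    if step % 2 == 0:
        # For a critic linear in the action, dQ/da is the last weight.
        return ddpg_actor_step(theta, w[-1], s, lr)
    a_next = theta @ s_next
    delta = td_error(w, s, a, r, s_next, a_next, gamma)
    return td_weighted_actor_step(theta, delta, s, a, lr)

# Toy transition (made-up numbers, for illustration only).
theta = np.array([0.5, -0.2])
w = np.array([0.1, 0.2, 0.3])
s, a, r, s_next = np.array([1.0, 2.0]), 1.0, 1.0, np.array([0.0, 1.0])

theta_even = hybrid_actor_update(0, theta, w, s, a, r, s_next)
theta_odd = hybrid_actor_update(1, theta, w, s, a, r, s_next)
```

In this sketch a positive TD-error pulls the policy output toward the explored action and a negative one pushes it away, which is one way to read the claim that actions with higher reward become more likely. A real implementation would use neural networks, a replay buffer, and target networks as in DDPG.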
Rei Hobara Shinya Yoshimoto Shuji Hasegawa Katsuyoshi Sakamoto
The Japan Society of Vacuum and Surface Science
e-Journal of Surface Science and Nanotechnology (ISSN:13480391)
vol.5, pp.94-98, 2007-04-30 (Released:2007-04-30)

We present a method to prepare tungsten tips for use in multi-tip scanning tunneling microscopes. The development was motivated by the need for very long, conical tips with a controlled cone angle. The method combines a “drop-off” method with dynamic electrochemical etching, in which the tip is continuously and slowly drawn up from the electrolyte during etching. Its reproducibility was confirmed by scanning electron microscopy. The tip shapes produced by the dynamic and static methods are compared. [DOI: 10.1380/ejssnt.2007.94]