Authors
Shahrzad Mahboubi, Indrapriyadarsini S, Hiroshi Ninomiya, Hideki Asai
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
Nonlinear Theory and Its Applications, IEICE (ISSN:21854106)
Volume, Issue, Pages, and Publication Date
vol.12, no.3, pp.554-574, 2021 (Released:2021-07-01)
Number of References
33
Number of Citations
1 4

This paper describes a momentum acceleration technique for quasi-Newton (QN) based neural network training and verifies its performance and computational complexity. Recently, Nesterov's accelerated quasi-Newton method (NAQ), which incorporates Nesterov's accelerated gradient into QN, was introduced, and its momentum term was shown to be effective in reducing both the number of iterations and the total training time. However, NAQ requires the gradient to be calculated twice in each iteration, which increases the computation time of a training loop compared with conventional QN. The proposed technique improves on NAQ by approximating Nesterov's accelerated gradient as a linear combination of the current and previous gradients. As a result, the gradient is calculated only once per iteration, as in QN. The performance of the proposed algorithm is evaluated against conventional algorithms on neural network training for two types of problems: highly nonlinear function approximation problems and classification problems. The results show a significant reduction in computation time without loss of solution quality compared with conventional training algorithms.
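
As a rough illustration of the one-gradient-per-iteration idea, the following minimal NumPy sketch applies a BFGS-style update in which the accelerated gradient is approximated by a linear combination of the current and previous gradients, assumed here to take the NAG-style form (1 + mu)*g_k - mu*g_(k-1). The toy least-squares objective, the function names, and the hyperparameter values are illustrative assumptions and are not taken from the paper; the exact coefficients and secant pairs used there may differ.

    import numpy as np

    # Toy least-squares objective standing in for a training loss (illustrative only).
    rng = np.random.default_rng(0)
    A = rng.standard_normal((20, 5))
    b = rng.standard_normal(20)

    def loss(w):
        r = A @ w - b
        return 0.5 * r @ r

    def grad(w):
        return A.T @ (A @ w - b)

    def momentum_qn(grad_fn, w, mu=0.8, lr=0.01, iters=300):
        # One-gradient-per-iteration momentum-accelerated quasi-Newton sketch:
        # the Nesterov-style gradient at (w + mu*v) is replaced by the linear
        # combination (1 + mu)*g_k - mu*g_{k-1}, so grad_fn is called once per loop.
        n = w.size
        H = np.eye(n)               # inverse-Hessian approximation
        v = np.zeros(n)             # momentum (previous update) vector
        g_prev = grad_fn(w)         # gradient at the starting point
        nag_prev = g_prev.copy()
        for _ in range(iters):
            g = grad_fn(w)                        # the only gradient call in the loop
            nag = (1.0 + mu) * g - mu * g_prev    # approximated accelerated gradient
            # BFGS-style update of H from the previous step and the change in the
            # approximated gradients (skipped when the curvature is not positive).
            s, y = v, nag - nag_prev
            if s @ y > 1e-10:
                rho = 1.0 / (s @ y)
                I = np.eye(n)
                H = (I - rho * np.outer(s, y)) @ H @ (I - rho * np.outer(y, s)) \
                    + rho * np.outer(s, s)
            d = -H @ nag                          # quasi-Newton search direction
            v = mu * v + lr * d                   # momentum-accelerated update
            w = w + v
            g_prev, nag_prev = g, nag
        return w

    w0 = np.zeros(5)
    print("initial loss:", loss(w0))
    print("final loss:  ", loss(momentum_qn(grad, w0)))

With this structure, the per-iteration cost is dominated by a single gradient evaluation plus the rank-two update of H, which is the saving the approximation targets relative to NAQ's two gradient evaluations per iteration.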
Authors
Indrapriyadarsini Sendilkkumaar, Shahrzad Mahboubi, Hiroshi Ninomiya, Hideki Asai
Publisher
The Institute of Electronics, Information and Communication Engineers
Journal
Nonlinear Theory and Its Applications, IEICE (ISSN:21854106)
Volume, Issue, Pages, and Publication Date
vol.11, no.4, pp.409-421, 2020 (Released:2020-10-01)
Number of References
30

Recurrent Neural Networks (RNNs) are powerful sequence models that are particularly difficult to train. This paper proposes an adaptive stochastic Nesterov's accelerated quasi-Newton (aSNAQ) method for training RNNs. Several algorithms have been proposed for training RNNs; however, owing to their high computational complexity, very few methods use second-order curvature information despite its ability to improve convergence. The proposed method is an accelerated second-order method that attempts to incorporate curvature information while maintaining a low per-iteration cost. Furthermore, direction normalization is introduced to address the vanishing and/or exploding gradient problem that is prominent in training RNNs. The performance of the proposed method is evaluated in TensorFlow on benchmark sequence modeling problems. The results show that the proposed aSNAQ method is effective in training RNNs with a low per-iteration cost and improved performance compared with the second-order adaQN and the first-order Adagrad and Adam methods.
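
The direction-normalization idea can be sketched as follows; this is not the paper's implementation. In the sketch, a limited-memory two-loop recursion supplies an approximate quasi-Newton direction from stored curvature pairs, and that direction is rescaled to unit norm before the momentum update, so that exploding or vanishing gradients can neither blow up nor stall the parameter step. The function names, the look-ahead gradient, and the hyperparameter values are illustrative assumptions, and the bookkeeping that gathers the curvature pairs is omitted.

    import numpy as np

    def two_loop_direction(grad, s_hist, y_hist):
        # Standard L-BFGS two-loop recursion: returns an approximation of
        # -H @ grad from stored curvature pairs (s, y) without forming H.
        # Assumes every stored pair satisfies y @ s > 0.
        q = grad.copy()
        alphas = []
        for s, y in zip(reversed(s_hist), reversed(y_hist)):
            a = (s @ q) / (y @ s)
            alphas.append(a)
            q -= a * y
        if s_hist:
            s, y = s_hist[-1], y_hist[-1]
            q *= (s @ y) / (y @ y)                # initial Hessian scaling
        for s, y, a in zip(s_hist, y_hist, reversed(alphas)):
            b = (y @ q) / (y @ s)
            q += (a - b) * s
        return -q

    def normalize_direction(d, eps=1e-8):
        # Rescale the search direction to unit norm so that an exploding gradient
        # cannot blow up the step and a vanishing gradient cannot stall it.
        return d / max(np.linalg.norm(d), eps)

    def asnaq_like_step(w, v, stochastic_grad, batch, s_hist, y_hist, mu=0.9, lr=0.05):
        # One hypothetical stochastic training step: minibatch gradient at the
        # Nesterov look-ahead point, limited-memory quasi-Newton direction,
        # direction normalization, then the momentum update.
        # How the curvature pairs in s_hist/y_hist are accumulated is omitted here.
        g = stochastic_grad(w + mu * v, batch)
        d = normalize_direction(two_loop_direction(g, s_hist, y_hist))
        v_new = mu * v + lr * d
        return w + v_new, v_new

Because the two-loop recursion only touches a short history of vector pairs, each step costs a few vector operations on top of the minibatch gradient, which is what keeps the per-iteration cost low while still injecting curvature information.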