ANALYSIS OF THE ERROR FUNCTION IN CASE OF APPLYING THE AMSGrad OPTIMIZATION ALGORITHM

Serhiy Sveleba, I. Katerynchuk, I. Kuno, O. Semotyuk, Ya. Shmygelsky, S. Velgosh, A. Kopach, V. Stakhura

Abstract


In this paper, the AMSGrad stochastic optimization method was studied by means of the logistic function, which describes the period-doubling process, and of the Fourier spectra of the error function.
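
The period-doubling behavior referred to here can be illustrated with the standard logistic map. The following is a minimal sketch, assuming the usual form x_{n+1} = r·x_n·(1 − x_n) and illustrative parameter values; it is not the exact model configuration used in the paper.

```python
# Minimal sketch of the logistic map x_{n+1} = r * x_n * (1 - x_n),
# whose period-doubling route to chaos serves as the reference model.
# Parameter values and iteration counts here are illustrative assumptions.
import numpy as np

def logistic_trajectory(r, x0=0.3, n_iter=200):
    """Iterate the logistic map and return the whole trajectory."""
    xs = np.empty(n_iter)
    x = x0
    for i in range(n_iter):
        x = r * x * (1.0 - x)
        xs[i] = x
    return xs

# r = 3.2 gives a period-2 cycle, r = 3.5 a period-4 cycle, r = 3.9 chaotic behavior
for r in (3.2, 3.5, 3.9):
    tail = logistic_trajectory(r)[-8:]
    print(f"r = {r}: last iterates {np.round(tail, 4)}")
```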

The gradient descent optimization algorithm with the AMSGrad update rule was implemented for a simple two-dimensional objective function that squares the input value in each dimension, with the admissible inputs bounded to the range from -1.0 to 1.0. The program for minimizing the error function was implemented in the Python programming language.
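
The following is a minimal sketch of such an AMSGrad run in Python, assuming the objective f(x, y) = x² + y² and illustrative hyperparameter values; the exact settings used in the paper may differ.

```python
# A minimal sketch of AMSGrad gradient descent on the two-dimensional
# objective f(x, y) = x^2 + y^2 with inputs bounded to [-1.0, 1.0].
# Hyperparameter values (alpha, beta1, beta2, eps) and the starting point
# are illustrative assumptions, not the exact settings of the paper.
import numpy as np

def objective(point):
    # f(x, y) = x^2 + y^2: squares each input and sums the result
    return float(np.sum(point ** 2))

def gradient(point):
    # df/dx = 2x, df/dy = 2y
    return 2.0 * point

def amsgrad(n_iter=100, alpha=0.02, beta1=0.9, beta2=0.999, eps=1e-8):
    rng = np.random.default_rng(1)
    x = rng.uniform(-1.0, 1.0, size=2)   # random start inside the admissible range
    m = np.zeros(2)                      # first-moment (mean of gradients) estimate
    v = np.zeros(2)                      # second-moment estimate
    v_hat = np.zeros(2)                  # running maximum of v: the AMSGrad modification
    errors = []
    for _ in range(n_iter):
        g = gradient(x)
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)     # never let the denominator shrink (unlike Adam)
        x = x - alpha * m / (np.sqrt(v_hat) + eps)
        x = np.clip(x, -1.0, 1.0)        # keep the inputs within [-1.0, 1.0]
        errors.append(objective(x))
    return x, errors

solution, errors = amsgrad()
print("solution:", solution, "final error:", errors[-1])
```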

The influence of the hyperparameter values beta1 and beta2 and of the learning rate on the optimization of the training of this system was analyzed, and branching diagrams as functions of these parameters were constructed.
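
A branching diagram of this kind can be sketched by sweeping one hyperparameter and recording the last few error values for each setting. The sweep range, starting point, and all other settings below are illustrative assumptions.

```python
# A minimal sketch of a branching diagram over the learning rate: for each
# value, run AMSGrad on f(x, y) = x^2 + y^2 and plot the last 20 error values.
# A single point means the run settled; several distinct points mean the
# iterates keep oscillating (branching). All settings are illustrative.
import numpy as np
import matplotlib.pyplot as plt

def amsgrad_errors(alpha, n_iter=300, beta1=0.9, beta2=0.999, eps=1e-8):
    x = np.array([0.8, -0.6])                      # fixed start inside [-1, 1]
    m, v, v_hat = np.zeros(2), np.zeros(2), np.zeros(2)
    errors = []
    for _ in range(n_iter):
        g = 2.0 * x
        m = beta1 * m + (1.0 - beta1) * g
        v = beta2 * v + (1.0 - beta2) * g ** 2
        v_hat = np.maximum(v_hat, v)
        x = np.clip(x - alpha * m / (np.sqrt(v_hat) + eps), -1.0, 1.0)
        errors.append(float(np.sum(x ** 2)))
    return errors

for alpha in np.linspace(0.01, 1.0, 200):
    tail = amsgrad_errors(alpha)[-20:]             # keep only the settled tail
    plt.plot([alpha] * len(tail), tail, ",k")
plt.xlabel("learning rate")
plt.ylabel("error in the last iterations")
plt.show()
```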

We found that the overtraining process is accompanied by a change in the rate of change of the objective error function, and that the Fourier spectra are characterized by the appearance of harmonics. The instability of the learning process caused by overtraining is shown to be observed on a small set of input data when the value of beta2 is close to 1 and beta1 = 0.9. We also found that overtraining is accompanied by a passage through the global minimum of the objective function, while the transition to chaos is accompanied by multiple passages through it.
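
The spectral check mentioned above can be sketched as follows: run AMSGrad with a deliberately large learning rate, subtract the mean from the resulting error series, and inspect its Fourier spectrum for isolated peaks (harmonics). The learning-rate value and run length below are illustrative assumptions.

```python
# A minimal sketch of the Fourier-spectrum check on the error series of an
# AMSGrad run with a large learning rate; settings are illustrative only.
import numpy as np

x = np.array([0.8, -0.6])
m, v, v_hat = np.zeros(2), np.zeros(2), np.zeros(2)
alpha, beta1, beta2, eps = 0.9, 0.9, 0.999, 1e-8
errors = []
for _ in range(512):                               # power-of-two length for the FFT
    g = 2.0 * x
    m = beta1 * m + (1.0 - beta1) * g
    v = beta2 * v + (1.0 - beta2) * g ** 2
    v_hat = np.maximum(v_hat, v)
    x = np.clip(x - alpha * m / (np.sqrt(v_hat) + eps), -1.0, 1.0)
    errors.append(float(np.sum(x ** 2)))

signal = np.asarray(errors) - np.mean(errors)      # remove the constant component
spectrum = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(signal.size, d=1.0)        # frequency in cycles per iteration
for k in np.argsort(spectrum)[-5:][::-1]:          # five strongest spectral lines
    print(f"frequency {freqs[k]:.3f}  amplitude {spectrum[k]:.2f}")
```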

Keywords: optimization methods, error function, AMSGrad, learning rate, branching diagrams.


DOI: http://dx.doi.org/10.30970/eli.23.6
