Neural Network Training with Levenberg-Marquardt and Adaptable Weight Compression

Abstract

Difficult neural network training tasks often fail to converge due to what is known as the flat spot problem, where the gradients of hidden neurons in the network diminish in value, rendering the weight update process ineffective. Whereas a first-order algorithm can address this issue by learning parameters that normalize neuron activations, second-order algorithms cannot afford additional parameters, given that they already require a large Jacobian matrix calculation. This paper proposes Levenberg-Marquardt with Weight Compression (LM-WC), which combats the flat spot problem by compressing neuron weights to push neuron activations out of the saturated region and closer to the linear region. The presented algorithm requires no additional learned parameters and contains an adaptable compression parameter, which is adjusted to avoid training failure and increase the probability of neural network convergence. Several experiments are presented and discussed to demonstrate the success of LM-WC against standard Levenberg-Marquardt (LM) and LM with random restarts on benchmark datasets for varying network architectures. Our results suggest that the LM-WC algorithm can improve training success by ten times or more compared with these other methods.
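
The abstract describes weight compression only at a high level, and the paper's exact compression rule is not reproduced here. As a rough, hedged illustration of the general idea, the Python sketch below (the function name `compress_weights`, the parameter `gamma`, and the simple rescaling rule are assumptions for illustration, not the authors' formula) shrinks a hidden layer's incoming weights when its tanh net inputs saturate, which restores a non-negligible gradient at the flat spot:

```python
import numpy as np

def compress_weights(W, x, gamma=1.5):
    """Illustrative weight-compression step (not the paper's exact rule).

    Rescales a hidden layer's incoming weights so that the largest net
    input magnitude is pulled back to `gamma`, keeping tanh activations
    out of the saturated region and closer to the linear region.

    W     : (n_hidden, n_inputs) incoming weight matrix (bias folded in).
    x     : (n_samples, n_inputs) training inputs.
    gamma : hypothetical compression parameter; the paper adapts its
            analogue during training, here it is simply fixed.
    """
    net = x @ W.T                     # net inputs of the hidden layer
    peak = np.max(np.abs(net))        # how far the layer is into saturation
    if peak > gamma:                  # compress only if saturated
        W = W * (gamma / peak)        # shrink weights toward the linear region
    return W

# Toy usage: a saturated tanh layer before vs. after compression.
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=(100, 4))
W = rng.normal(0.0, 10.0, size=(8, 4))            # large weights -> saturation
print("min tanh derivative before:", np.min(1 - np.tanh(x @ W.T) ** 2))
W = compress_weights(W, x)
print("min tanh derivative after :", np.min(1 - np.tanh(x @ W.T) ** 2))
```

In the sketch, the minimum tanh derivative is near zero before compression (the flat spot) and clearly nonzero afterward, which is the effect the abstract attributes to LM-WC between Levenberg-Marquardt updates.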

Publication
IEEE Transactions on Neural Networks and Learning Systems