
Exploding Gradient Problem

The exploding gradient problem usually occurs in deep networks when the initial weights are too large, and it generally becomes more pronounced as the number of layers increases.

During backpropagation, the gradient at each layer is multiplied by the derivative of that layer's activation function (together with its weights). If these per-layer factors are consistently greater than 1, the gradient update grows exponentially as the number of layers increases, which is gradient explosion; if they are consistently less than 1, the update decays exponentially with depth, which is gradient vanishing.
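
To make the multiplicative effect concrete, here is a simplified chain-rule sketch (assuming a feed-forward chain with per-layer weights $w_k$, pre-activations $z_k$, and activation $\sigma$; the notation is illustrative, not taken from the original text):

$$
\frac{\partial L}{\partial w_1} \;\approx\; \frac{\partial L}{\partial a_n}\,\prod_{k=2}^{n} \sigma'(z_k)\, w_k
$$

If each factor $|\sigma'(z_k)\,w_k|$ is around 1.5, a 30-layer product is roughly $1.5^{30} \approx 1.9 \times 10^{5}$ (explosion); if each factor is around 0.5, it is roughly $0.5^{30} \approx 9.3 \times 10^{-10}$ (vanishing).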

The main causes of gradient explosion and gradient vanishing are that the network is too deep and the weight updates are unstable; at root, both stem from the multiplicative effect in gradient backpropagation. For the vanishing gradient problem, one remedy is to replace the Sigmoid activation function with ReLU. In addition, the LSTM architecture mitigates the vanishing gradient problem in RNNs.
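
A minimal numerical sketch of this point (plain NumPy; the layer count, weight value, and pre-activation are assumed purely for illustration): because the Sigmoid derivative is at most 0.25, repeatedly multiplying by it shrinks the gradient, while the ReLU derivative is 1 for positive inputs and leaves it intact.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # at most 0.25 (reached at z = 0)

def relu_grad(z):
    return float(z > 0)           # 1 for positive inputs, 0 otherwise

# Assumed toy setting: 30 layers, per-layer weight 1.0, pre-activation 0.5
n_layers, w, z = 30, 1.0, 0.5

grad_sigmoid = np.prod([sigmoid_grad(z) * w for _ in range(n_layers)])
grad_relu    = np.prod([relu_grad(z) * w for _ in range(n_layers)])

print(f"sigmoid chain: {grad_sigmoid:.3e}")  # on the order of 1e-19, vanishing
print(f"relu chain:    {grad_relu:.3e}")     # 1.0, gradient preserved
```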

Solutions to Exploding Gradients

  • Pre-training followed by fine-tuning
  • Gradient clipping and weight regularization (see the sketch after this list)
  • Using different activation functions (e.g., ReLU instead of Sigmoid)
  • Using batch normalization (BatchNorm)
  • Using residual connections
  • Using LSTM networks (in recurrent architectures)
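
To make the gradient-clipping item concrete, here is a minimal PyTorch training-step sketch (the model, data, and max_norm value are placeholder choices for illustration, not prescribed by the original text):

```python
import torch
import torch.nn as nn

# Assumed toy model and data, only to make the clipping call concrete
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
x, y = torch.randn(8, 16), torch.randn(8, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()

# Rescale all gradients so their global L2 norm does not exceed max_norm,
# bounding the size of the weight update even if the raw gradient explodes.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```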
