
Exploding Gradient Problem

The exploding gradient problem usually occurs in deep networks when the initial weights are too large, and it generally becomes more pronounced as the number of layers increases.

During backpropagation, the gradient at each layer is multiplied by the derivative of that layer's activation function (together with its weights). If these per-layer factors are consistently greater than 1, the gradient update grows exponentially as the number of layers increases, which is gradient explosion; if they are consistently less than 1, the update decays exponentially with depth, which is gradient vanishing.
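
To make the multiplicative effect concrete, here is a simplified chain-rule sketch (assuming a feed-forward chain with per-layer weights $w_k$, pre-activations $z_k$, and activation $\sigma$; the notation is illustrative, not taken from the original text):

$$
\frac{\partial L}{\partial w_1} \;\approx\; \frac{\partial L}{\partial a_n}\,\prod_{k=2}^{n} \sigma'(z_k)\, w_k
$$

If each factor $|\sigma'(z_k)\,w_k|$ is around 1.5, a 30-layer product is roughly $1.5^{30} \approx 1.9 \times 10^{5}$ (explosion); if each factor is around 0.5, it is roughly $0.5^{30} \approx 9.3 \times 10^{-10}$ (vanishing).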

The main causes of gradient explosion and gradient vanishing are that the network is too deep and the weight updates are unstable; at root, both stem from the multiplicative effect in gradient backpropagation. For the vanishing gradient problem, one remedy is to replace the Sigmoid activation function with ReLU. In addition, the LSTM architecture mitigates the vanishing gradient problem in RNNs.
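
A minimal numerical sketch of this point (plain NumPy; the layer count, weight value, and pre-activation are assumed purely for illustration): because the Sigmoid derivative is at most 0.25, repeatedly multiplying by it shrinks the gradient, while the ReLU derivative is 1 for positive inputs and leaves it intact.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def sigmoid_grad(z):
    s = sigmoid(z)
    return s * (1.0 - s)          # at most 0.25 (reached at z = 0)

def relu_grad(z):
    return float(z > 0)           # 1 for positive inputs, 0 otherwise

# Assumed toy setting: 30 layers, per-layer weight 1.0, pre-activation 0.5
n_layers, w, z = 30, 1.0, 0.5

grad_sigmoid = np.prod([sigmoid_grad(z) * w for _ in range(n_layers)])
grad_relu    = np.prod([relu_grad(z) * w for _ in range(n_layers)])

print(f"sigmoid chain: {grad_sigmoid:.3e}")  # on the order of 1e-19, vanishing
print(f"relu chain:    {grad_relu:.3e}")     # 1.0, gradient preserved
```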

Solutions to Exploding Gradients

  • Pre-training followed by fine-tuning
  • Gradient clipping and weight regularization (see the sketch after this list)
  • Using different activation functions (e.g., ReLU instead of Sigmoid)
  • Using batch normalization (BatchNorm)
  • Using residual connections
  • Using LSTM networks (in recurrent architectures)
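
To make the gradient-clipping item concrete, here is a minimal PyTorch training-step sketch (the model, data, and max_norm value are placeholder choices for illustration, not prescribed by the original text):

```python
import torch
import torch.nn as nn

# Assumed toy model and data, only to make the clipping call concrete
model = nn.Sequential(nn.Linear(16, 32), nn.ReLU(), nn.Linear(32, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
criterion = nn.MSELoss()
x, y = torch.randn(8, 16), torch.randn(8, 1)

optimizer.zero_grad()
loss = criterion(model(x), y)
loss.backward()

# Rescale all gradients so their global L2 norm does not exceed max_norm,
# bounding the size of the weight update even if the raw gradient explodes.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```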
