
Exploding Gradient Problem


The exploding gradient problem usually occurs in deep networks when the weight initialization values are too large, and it generally becomes more pronounced as the number of network layers increases.

During backpropagation, the derivatives of the activation function are multiplied together layer by layer. If each per-layer factor is greater than 1, the gradient reaching the early layers grows exponentially with depth and gradients explode; if each factor is less than 1, the gradient decays exponentially with depth and gradients vanish.
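A minimal numerical sketch of this multiplicative effect (plain Python; the per-layer factors 1.5 and 0.5 are illustrative assumptions, not values from any particular network):

```python
# Multiply an identical per-layer gradient factor across an increasing number of layers:
# a factor > 1 blows up exponentially, a factor < 1 shrinks toward zero.

def accumulated_gradient(per_layer_factor: float, num_layers: int) -> float:
    """Product of identical per-layer derivative factors over num_layers layers."""
    grad = 1.0
    for _ in range(num_layers):
        grad *= per_layer_factor
    return grad

for layers in (10, 50, 100):
    print(f"{layers:3d} layers: factor 1.5 -> {accumulated_gradient(1.5, layers):.3e}, "
          f"factor 0.5 -> {accumulated_gradient(0.5, layers):.3e}")

# Factor 1.5 explodes (about 5.8e+01 at 10 layers, about 4.1e+17 at 100 layers),
# while factor 0.5 vanishes (about 9.8e-04 at 10 layers, about 7.9e-31 at 100 layers).
```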

The main causes of exploding and vanishing gradients are a network that is too deep and unstable weight updates; essentially, both arise from the multiplicative effect of backpropagating gradients through many layers. For the vanishing gradient problem, one remedy is to replace the Sigmoid activation function with ReLU; in RNNs, the LSTM architecture also mitigates vanishing gradients.
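A small illustration of why this substitution helps (plain Python; the inputs and the layer count of 20 are illustrative assumptions): the Sigmoid derivative never exceeds 0.25, so its repeated product shrinks rapidly with depth, while the ReLU derivative is exactly 1 for positive inputs.

```python
import math

def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))

def sigmoid_grad(x: float) -> float:
    s = sigmoid(x)
    return s * (1.0 - s)          # maximum value is 0.25, reached at x = 0

def relu_grad(x: float) -> float:
    return 1.0 if x > 0 else 0.0  # exactly 1 on the positive side, no shrinkage

layers = 20
print("sigmoid:", sigmoid_grad(0.0) ** layers)   # 0.25**20 ~ 9.1e-13, vanishes
print("relu:   ", relu_grad(1.0) ** layers)      # 1.0**20 = 1.0, preserved
```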

Solutions to Exploding Gradients

  • Pre-training followed by fine-tuning
  • Gradient clipping and weight regularization (a clipping sketch follows this list)
  • Using different activation functions (e.g. ReLU instead of Sigmoid)
  • Using batch normalization (BatchNorm)
  • Using residual connections
  • Using LSTM networks
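Of these, gradient clipping is the most direct remedy for exploding gradients. Below is a hedged sketch of one common form, gradient norm clipping with PyTorch's torch.nn.utils.clip_grad_norm_ (the model, data, learning rate, and max_norm=1.0 are illustrative assumptions, not fixed recommendations):

```python
import torch
import torch.nn as nn

model = nn.Sequential(nn.Linear(10, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
loss_fn = nn.MSELoss()

inputs, targets = torch.randn(32, 10), torch.randn(32, 1)

optimizer.zero_grad()
loss = loss_fn(model(inputs), targets)
loss.backward()

# Rescale all gradients so their combined norm does not exceed max_norm,
# bounding the size of the weight update so it cannot explode.
torch.nn.utils.clip_grad_norm_(model.parameters(), max_norm=1.0)

optimizer.step()
```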

