
Backpropagation (BP)

Definition of Backpropagation

Backpropagation, short for "backward propagation of errors", is a method commonly used in conjunction with an optimization method to train artificial neural networks. It computes the gradient of the loss function with respect to every weight in the network.

This gradient is fed to the optimization method, which uses it to update the weights so as to minimize the loss function.

Backpropagation requires a known target output for each training input in order to compute the gradient of the loss function. It is a generalization of the delta rule to multi-layer feedforward networks, using the chain rule to compute the gradient layer by layer. Backpropagation requires that the activation function of each artificial neuron (or "node") be differentiable.
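As a concrete illustration of how the chain rule is applied (the notation below is the conventional delta-rule notation, added here for illustration rather than defined in the original text), consider a squared-error loss E = ½ Σ_k (t_k − o_k)² and a weight w_ij connecting node i to node j:

$$
\frac{\partial E}{\partial w_{ij}} = \delta_j\, o_i,
\qquad
\delta_j =
\begin{cases}
(o_j - t_j)\, f'(\mathrm{net}_j) & \text{if } j \text{ is an output node},\\
\left(\sum_k \delta_k\, w_{jk}\right) f'(\mathrm{net}_j) & \text{if } j \text{ is a hidden node},
\end{cases}
$$

where o_i is the activation of node i, net_j is the weighted input to node j, t_j is the target value, and f is the (differentiable) activation function. The hidden-node case is where the chain rule lets the error terms of one layer be computed iteratively from those of the layer after it.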

Phases of Backpropagation

The backpropagation algorithm consists of two main phases: propagation and weight update.

Phase 1: Propagation

The propagation phase in each iteration consists of two steps:

(Forward propagation step) Feed the training input forward through the network to obtain the activations, i.e. the network's response to the stimulus;

(Back propagation step) Subtract the network's response from the target output corresponding to the training input to obtain the output-layer error, then propagate it backwards to obtain the error terms of the hidden layers.
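A minimal sketch of what this propagation phase can look like in code, assuming a toy network with one hidden layer and sigmoid activations (the sizes, names, and values below are illustrative assumptions, not part of the original text):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Toy network: 2 inputs, 3 hidden nodes, 1 output (illustrative sizes).
rng = np.random.default_rng(0)
W1 = rng.normal(size=(2, 3))   # input -> hidden weights
W2 = rng.normal(size=(3, 1))   # hidden -> output weights

x = np.array([0.0, 1.0])       # one training input
t = np.array([1.0])            # its known target output

# Step 1: forward propagation -- feed the input through the network
# to obtain the activations (the network's response).
h = sigmoid(x @ W1)            # hidden-layer activations
y = sigmoid(h @ W2)            # output-layer activation

# Step 2: back propagation -- compare the response with the target and
# push the error backwards to obtain the error terms (deltas) of the
# output layer and the hidden layer.  For a sigmoid, f'(net) = o * (1 - o).
delta_out = (y - t) * y * (1.0 - y)
delta_hidden = (delta_out @ W2.T) * h * (1.0 - h)
```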

Phase 2: Weight Update

For each synaptic weight, the update is performed as follows:

Multiply the weight's input activation by the error term of the node it feeds into to obtain the gradient of the weight;

Multiply this gradient by a proportion (the learning rate), negate it, and add it to the weight.

This proportion (the learning rate) affects the speed and quality of the training process. Because the gradient points in the direction in which the error increases, it must be negated when updating the weights so that the update reduces the error.
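Written out with the notation from the definition section above, and with the proportion (learning rate) denoted η, one common way to state this update is:

$$
\Delta w_{ij} = -\eta\, \frac{\partial E}{\partial w_{ij}} = -\eta\, \delta_j\, o_i,
\qquad
w_{ij} \leftarrow w_{ij} + \Delta w_{ij}.
$$

The minus sign is the negation described above; a larger η speeds up training but can make it unstable.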

Phases 1 and 2 are repeated until the network's response to the inputs falls within a satisfactory, predetermined target range.
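Putting the two phases into such a loop, the following is a minimal, self-contained NumPy sketch on the XOR problem; the network size, learning rate, iteration limit, and stopping threshold are all illustrative assumptions, and convergence is not guaranteed for every random initialization (see the limitations below):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# XOR training set: a known target output for every input.
X = np.array([[0., 0.], [0., 1.], [1., 0.], [1., 1.]])
T = np.array([[0.], [1.], [1.], [0.]])

rng = np.random.default_rng(42)
W1, b1 = rng.normal(size=(2, 4)), np.zeros(4)   # input  -> hidden
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output
eta = 0.5                                       # learning rate

for epoch in range(20000):
    # Phase 1: propagation (forward pass, then error back-propagation).
    H = sigmoid(X @ W1 + b1)
    Y = sigmoid(H @ W2 + b2)
    delta_out = (Y - T) * Y * (1.0 - Y)
    delta_hidden = (delta_out @ W2.T) * H * (1.0 - H)

    # Phase 2: weight update (input activation times error term gives the
    # gradient, which is scaled by the learning rate and negated).
    W2 -= eta * H.T @ delta_out
    b2 -= eta * delta_out.sum(axis=0)
    W1 -= eta * X.T @ delta_hidden
    b1 -= eta * delta_hidden.sum(axis=0)

    # Stop once the response is within a predetermined target range.
    mse = float(np.mean((T - Y) ** 2))
    if mse < 1e-3:
        break

print(f"stopped after {epoch + 1} iterations, mean squared error {mse:.4f}")
```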

Limitations of Backpropagation

The result may converge to a local extremum; only when there is a single minimum is the "hill climbing" strategy of gradient descent guaranteed to work;

Gradient descent may find a local minimum rather than the global minimum;

Convergence of backpropagation learning is slow;

Convergence of backpropagation learning is not guaranteed;

Backpropagation learning does not require normalization of the input vectors; however, normalization can improve performance.
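For the last point, one common form of normalization is to standardize each input feature to zero mean and unit variance before training; a minimal sketch, assuming NumPy and placeholder data:

```python
import numpy as np

# Placeholder training inputs: each row is one input vector.
X = np.array([[10.0, 0.2],
              [12.0, 0.1],
              [ 8.0, 0.4]])

# Standardize every feature (column) to zero mean and unit variance.
# The small epsilon guards against division by zero for constant features.
mean = X.mean(axis=0)
std = X.std(axis=0)
X_normalized = (X - mean) / (std + 1e-8)
```

The same mean and standard deviation should then be applied to any new inputs presented to the trained network.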