HyperAI超神经

The Actor-Critic algorithm, also called the AC algorithm, is a reinforcement learning algorithm that combines a policy network (the actor) with a value function (the critic). It uses the rewards and penalties that follow its actions to learn the probability of taking each action in each state.
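One common form of the update, the one-step actor-critic, is sketched below. The notation is assumed here rather than taken from this entry: policy parameters θ, critic parameters w, discount factor γ, and step sizes α_θ and α_w. The critic's temporal-difference (TD) error δ_t scores the action just taken, and both networks are updated with it:

```latex
\begin{aligned}
\delta_t &= r_{t+1} + \gamma V_w(s_{t+1}) - V_w(s_t) \\
w &\leftarrow w + \alpha_w \, \delta_t \, \nabla_w V_w(s_t) \\
\theta &\leftarrow \theta + \alpha_\theta \, \delta_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)
\end{aligned}
```

A positive δ_t means the outcome was better than the critic expected, so the actor raises the probability of a_t in s_t. A negative δ_t lowers it.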

The actor-critic algorithm uses two neural networks: the actor, which represents the policy, and the critic, which estimates the value function. Both sets of parameters are updated at every step as the agent moves through consecutive states, so successive updates are correlated. Compared with a traditional policy network trained on its own, it learns more efficiently and performs better. However, the critic's estimates make it prone to bias, and it usually converges only to a local optimum.
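A minimal sketch of the step-by-step update loop follows. It assumes a hypothetical five-state corridor environment (start in state 0, reward 1 on reaching state 4). For brevity, it replaces the two neural networks with lookup tables: a softmax preference table serves as the actor and a state-value table serves as the critic. The names `step`, `softmax_probs`, and `train`, and all hyperparameter values, are illustrative assumptions, not part of any library.

```python
import math
import random

# Hypothetical toy environment (an assumption for illustration):
# a 5-state corridor, start in state 0, goal in state 4.
N_STATES = 5
ACTIONS = (0, 1)  # 0 = move left, 1 = move right


def step(state, action):
    """Return (next_state, reward, done); reward is 1 only on reaching the goal."""
    next_state = max(0, state - 1) if action == 0 else state + 1
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def softmax_probs(prefs):
    """Actor: turn per-action preferences into a stochastic policy pi(a|s)."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def train(episodes=500, gamma=0.9, alpha_actor=0.1, alpha_critic=0.1, seed=0):
    """One-step actor-critic with lookup tables standing in for the two networks."""
    rng = random.Random(seed)
    theta = [[0.0, 0.0] for _ in range(N_STATES)]  # actor: action preferences per state
    v = [0.0] * N_STATES                           # critic: state-value estimates
    for _ in range(episodes):
        state = 0
        for _ in range(100):  # cap episode length
            probs = softmax_probs(theta[state])
            action = rng.choices(ACTIONS, weights=probs)[0]
            next_state, reward, done = step(state, action)
            # Critic: TD error, treating the value of the terminal state as 0.
            target = reward if done else reward + gamma * v[next_state]
            delta = target - v[state]
            v[state] += alpha_critic * delta
            # Actor: d log pi(a|s) / d theta[s][b] = 1[a == b] - pi(b|s) for a softmax.
            for b in ACTIONS:
                grad = (1.0 if b == action else 0.0) - probs[b]
                theta[state][b] += alpha_actor * delta * grad
            if done:
                break
            state = next_state
    return theta, v


if __name__ == "__main__":
    theta, v = train()
    for s in range(N_STATES - 1):
        p_right = softmax_probs(theta[s])[1]
        print(f"state {s}: P(right) = {p_right:.2f}, V = {v[s]:.2f}")
```

The same TD error drives both updates in every step, which is why the actor's and critic's parameter changes are correlated from one step to the next. Swapping the tables for neural networks changes only how V(s) and the preferences are computed and differentiated.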

Advantages of the AC Algorithm

Better convergence properties
Works better in high-dimensional and continuous action spaces
Can learn stochastic policies
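To illustrate the continuous-action case, one common choice (assumed here, not specified in this entry) is a Gaussian actor with mean μ_θ(s) and a fixed standard deviation σ. Its score function has a closed form, and the actor step scales it by the critic's TD error δ_t = r_{t+1} + γV_w(s_{t+1}) − V_w(s_t):

```latex
\begin{aligned}
\pi_\theta(a \mid s) &= \frac{1}{\sigma\sqrt{2\pi}} \exp\!\left(-\frac{(a - \mu_\theta(s))^2}{2\sigma^2}\right) \\
\nabla_\theta \log \pi_\theta(a \mid s) &= \frac{a - \mu_\theta(s)}{\sigma^2} \, \nabla_\theta \mu_\theta(s) \\
\theta &\leftarrow \theta + \alpha_\theta \, \delta_t \, \frac{a_t - \mu_\theta(s_t)}{\sigma^2} \, \nabla_\theta \mu_\theta(s_t)
\end{aligned}
```

The actor outputs the parameters of a distribution rather than scoring every action. This is why it can handle continuous actions and why its policy is stochastic by construction.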

Disadvantages of the AC Algorithm

Usually converges to a local rather than a global optimum
Policy evaluation is inefficient and prone to high bias
