HyperAI

Actor-critic Algorithm

The actor-critic algorithm, also called the AC algorithm, is a reinforcement learning algorithm that combines a policy network (the actor) with a value function (the critic). The actor gives the probability of taking each action in each state, and the critic uses the reward and punishment signals from the outcomes to judge those actions so that the probabilities can be adjusted.
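One common way to write this down is the one-step temporal-difference form sketched here. This variant is an assumption, since the entry does not name a specific one, and all symbols are introduced for illustration: a policy $\pi_\theta(a \mid s)$ with parameters $\theta$, a value estimate $V_w(s)$ with parameters $w$, reward $r_{t+1}$, discount factor $\gamma$, and step sizes $\alpha_\theta$ and $\alpha_w$.

```latex
\begin{aligned}
\delta_t &= r_{t+1} + \gamma V_w(s_{t+1}) - V_w(s_t) \\
w &\leftarrow w + \alpha_w \, \delta_t \, \nabla_w V_w(s_t) \\
\theta &\leftarrow \theta + \alpha_\theta \, \delta_t \, \nabla_\theta \log \pi_\theta(a_t \mid s_t)
\end{aligned}
```

A positive $\delta_t$ means the outcome was better than the critic expected, so the probability of the chosen action is raised; a negative $\delta_t$ lowers it.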

The actor-critic algorithm uses two neural networks, one for the actor and one for the critic. Their parameters are updated at every step on consecutive states, so successive parameter updates are correlated with one another. Compared with a traditional policy network trained alone, it learns more efficiently and performs better, but it is prone to bias and typically reaches only a locally optimal solution.
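The interplay between the two components can be illustrated with a minimal sketch. To keep it short and runnable, it replaces the two neural networks with lookup tables and uses a one-step temporal-difference update. The chain environment, the function names (`step`, `train_actor_critic`), and all hyperparameters are hypothetical choices made for illustration, not part of the original entry.

```python
import numpy as np


def softmax(x):
    """Numerically stable softmax over a 1-D array of action preferences."""
    e = np.exp(x - np.max(x))
    return e / e.sum()


def step(state, action, n_states):
    """Hypothetical chain environment: action 1 moves right, action 0 moves left.

    Reaching the last state ends the episode with reward 1; all other
    transitions give reward 0.
    """
    if action == 1:
        next_state = min(state + 1, n_states - 1)
    else:
        next_state = max(state - 1, 0)
    done = next_state == n_states - 1
    reward = 1.0 if done else 0.0
    return next_state, reward, done


def train_actor_critic(n_states=5, n_actions=2, episodes=500, gamma=0.95,
                       actor_lr=0.1, critic_lr=0.2, max_steps=100, seed=0):
    rng = np.random.default_rng(seed)
    theta = np.zeros((n_states, n_actions))  # actor: action preferences per state
    values = np.zeros(n_states)              # critic: state-value estimates

    for _ in range(episodes):
        state, done, t = 0, False, 0
        while not done and t < max_steps:
            # Actor: sample an action from the current stochastic policy.
            probs = softmax(theta[state])
            action = rng.choice(n_actions, p=probs)
            next_state, reward, done = step(state, action, n_states)

            # Critic: one-step TD error (no bootstrap from a terminal state).
            target = reward + (0.0 if done else gamma * values[next_state])
            td_error = target - values[state]
            values[state] += critic_lr * td_error

            # Actor: gradient of log softmax is one_hot(action) - probs.
            grad_log_pi = -probs
            grad_log_pi[action] += 1.0
            theta[state] += actor_lr * td_error * grad_log_pi

            state = next_state
            t += 1

    policy = np.array([softmax(theta[s]) for s in range(n_states)])
    return policy, values


if __name__ == "__main__":
    policy, values = train_actor_critic()
    print("action probabilities per state (left, right):")
    print(policy.round(2))
    print("critic value estimates:", values.round(2))
```

Because each update uses the state reached by the previous step, consecutive updates are correlated, which is the property described above. The critic's bootstrapped target is also where the bias comes from: the actor is trained against an estimate of the return rather than the return itself.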

Advantages of the AC algorithm

  • Better convergence properties
  • Works better in high-dimensional and continuous action spaces
  • Can learn stochastic policies

Disadvantages of the AC algorithm

  • Usually converges to a local optimum rather than the global one
  • Policy evaluation is inefficient and has high bias