Path Planning Based on Deep Q-Network with Experience Replay and Simulated Annealing Strategy
To address the shortcomings of pure Q-Learning and SARSA in convergence rate and their tendency to become trapped in local optima, this paper combines neural networks with reinforcement learning. Experience replay is used to break the correlation between the collected samples, so that training converges stably. In addition, a simulated annealing method replaces the traditional epsilon-greedy strategy to balance exploration and exploitation in path planning. Experimental results show that ES-DQN is faster than the traditional methods and, at the same time, finds the best path.
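The two mechanisms named above can be illustrated with a minimal Python sketch. It is not the implementation evaluated in this paper. The names `ReplayBuffer` and `sa_select_action` and the Metropolis-style acceptance rule exp(dQ / T) are assumptions chosen for illustration; the exact annealing rule and temperature schedule used by ES-DQN may differ.

```python
import math
import random
from collections import deque


class ReplayBuffer:
    """Fixed-size store of transitions (hypothetical helper, not from the paper)."""

    def __init__(self, capacity):
        # The oldest transitions are discarded once capacity is reached.
        self.buffer = deque(maxlen=capacity)

    def push(self, transition):
        self.buffer.append(transition)

    def sample(self, batch_size, rng=random):
        # Uniform sampling without replacement mixes transitions from
        # different time steps, which weakens their temporal correlation.
        return rng.sample(list(self.buffer), batch_size)

    def __len__(self):
        return len(self.buffer)


def sa_select_action(q_values, temperature, rng=random):
    """Simulated-annealing action selection (assumed Metropolis-style rule).

    A random candidate action is accepted with probability
    exp((Q[candidate] - Q[greedy]) / temperature); otherwise the greedy
    action is returned. A high temperature favors exploration, a low
    temperature favors exploitation.
    """
    greedy = max(range(len(q_values)), key=lambda a: q_values[a])
    candidate = rng.randrange(len(q_values))
    delta = q_values[candidate] - q_values[greedy]  # never positive
    if temperature > 0 and rng.random() < math.exp(delta / temperature):
        return candidate
    return greedy
```

In a training loop, the temperature would typically be lowered after each episode (for example by multiplying it by a constant factor below 1, which is also an assumption here), so that action selection shifts gradually from exploration toward the greedy policy.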