Hierarchical state extraction method for multi-agent environments

Authors

  • ZHANG Yu-jie, CHEN He-chang, YANG Bo

Abstract

Many real-world problems can be naturally modeled as multi-agent reinforcement learning (MARL) systems in which independent agents must jointly learn to perform a cooperative task. Existing MARL methods suffer from policy divergence and redundancy in the shared information. This article describes LCDDPG, a reinforcement learning method that introduces a semi-centralized leader structure and a state extraction structure named Abs. The leader structure coordinates the team strategy by sharing observation information on both the evaluation (critic) side and the policy side, thereby narrowing the policy divergence; the Abs structure assigns a different weight to each dimension of the shared information and hierarchically extracts a high-level representation of it to reduce redundancy. Comparative experiments and ablations show that the leader structure and Abs each improve the effectiveness of the algorithm independently, and that LCDDPG obtains a higher average team reward and a lower reward variance ratio than MADDPG.
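
To make the Abs idea concrete, the following is a minimal sketch of what a per-dimension weighting plus hierarchical encoding of shared observations could look like in PyTorch. The module name, layer sizes, and the softmax gate are assumptions for illustration, not the paper's exact design.

```python
import torch
import torch.nn as nn

class AbsExtractor(nn.Module):
    """Hypothetical sketch of an Abs-style state extractor: weight each
    dimension of the shared observation, then hierarchically compress the
    weighted vector into a compact high-level representation."""

    def __init__(self, shared_dim: int, hidden_dim: int = 64, out_dim: int = 32):
        super().__init__()
        # Gate network: one importance weight per dimension of the shared observation.
        self.gate = nn.Sequential(nn.Linear(shared_dim, shared_dim), nn.Softmax(dim=-1))
        # Hierarchical encoder: stacked layers extract the high-level representation.
        self.encoder = nn.Sequential(
            nn.Linear(shared_dim, hidden_dim), nn.ReLU(),
            nn.Linear(hidden_dim, out_dim), nn.ReLU(),
        )

    def forward(self, shared_obs: torch.Tensor) -> torch.Tensor:
        weights = self.gate(shared_obs)             # per-dimension weights
        return self.encoder(shared_obs * weights)   # weighted, then compressed


# Usage: compress the concatenated observations of 3 agents (8 dims each)
extractor = AbsExtractor(shared_dim=3 * 8)
batch = torch.randn(16, 3 * 8)        # batch of shared observations
high_level_state = extractor(batch)   # shape: (16, 32)
```

In a centralized-critic setting such as MADDPG, a representation like `high_level_state` would replace the raw concatenation of all agents' observations as the critic input, which is the redundancy-reduction role the abstract attributes to Abs.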

Published

2021-01-21

Issue

Section

Articles