Multiagent Inductive Policy Optimization

Published in IEEE Transactions on Neural Networks and Learning Systems, 2025

Recommended citation: Y. Huang and X. Zhao, "Multiagent Inductive Policy Optimization," IEEE Transactions on Neural Networks and Learning Systems, 2025. https://ieeexplore.ieee.org/abstract/document/11153797

Policy optimization methods are promising for tackling high-complexity, multiagent reinforcement learning (RL) tasks. In this article, we derive a general trust region for policy optimization by accounting for the effect of subpolicy combinations among agents in multiagent environments. Based on this trust region, we propose an inductive objective for training the policy function that ensures agents learn monotonically improving policies. Furthermore, we observe that policy updates tend to become very weak shortly before the policy falls into a local optimum. To address this, we add a policy-distance cost to the inductive objective that strengthens the agents' motivation to explore new policies. This strikes a balance during training: the update step size stays within the trust region, preventing excessive updates, while the distance cost keeps the policy from getting stuck in local optima. Simulations on wind farm (WF) control tasks and two multiagent benchmarks demonstrate the high performance of the proposed multiagent inductive policy optimization (MAIPO) method.
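The sketch below illustrates the general idea of combining a trust-region-constrained surrogate with a policy-distance cost. It is a minimal illustration only: the PPO-style clipped surrogate, the per-sample KL estimator, and the coefficient `beta` are assumptions for exposition, not the paper's exact MAIPO objective.

```python
# Minimal sketch: trust-region policy update plus a policy-distance
# exploration cost, in the spirit of the abstract above. The clipped
# surrogate, KL estimator, and `beta` are illustrative assumptions,
# not the paper's exact formulation.
import torch

def maipo_style_loss(logp_new, logp_old, advantages, clip_eps=0.2, beta=0.01):
    """Clipped trust-region surrogate with a policy-distance bonus.

    Minimizing this loss (i) keeps the update step inside the clipped
    trust region and (ii) rewards moving away from the behavior policy,
    discouraging the vanishing updates that precede local optima.
    """
    ratio = torch.exp(logp_new - logp_old)          # importance ratio
    clipped = torch.clamp(ratio, 1.0 - clip_eps, 1.0 + clip_eps)
    surrogate = torch.min(ratio * advantages, clipped * advantages)
    # Approximate KL(old || new); subtracting it turns policy distance
    # into a bonus that strengthens exploration of new policies.
    approx_kl = (logp_old - logp_new).mean()
    return -surrogate.mean() - beta * approx_kl

# Dummy batch for one agent: log-probs under old/new policies, advantages.
logp_old = torch.randn(64)
logp_new = (logp_old + 0.05 * torch.randn(64)).requires_grad_()
advantages = torch.randn(64)

loss = maipo_style_loss(logp_new, logp_old, advantages)
loss.backward()  # gradients flow to the new policy's log-probabilities
```

In a multiagent setting, one such loss would be computed per agent, with the trust region derived from the subpolicy combinations as described in the paper.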

Download paper here

If you are interested in this paper, please cite it as:

@article{huang2025multiagent,
  title={Multiagent Inductive Policy Optimization},
  author={Huang, Yubo and Zhao, Xiaowei},
  journal={IEEE Transactions on Neural Networks and Learning Systems},
  year={2025},
  publisher={IEEE}
}