A Surrogate-Assisted Controller for Expensive Evolutionary Reinforcement Learning

Yuxing Wang, Tiantian Zhang, Yongzhe Chang, Bin Liang, Xueqian Wang, Bo Yuan

Abstract: The integration of Reinforcement Learning (RL) and Evolutionary Algorithms (EAs) aims at simultaneously exploiting the sample efficiency as well as the diversity and robustness of the two paradigms. Recently, hybrid learning frameworks based on this principle have achieved great success in robot control tasks. However, in these methods, policies from the genetic population are evaluated via interactions with the real environments, severely restricting their applicability when such interactions are prohibitively costly. In this work, we propose Surrogate-assisted Controller (SC), a generic module that can be applied on top of existing hybrid frameworks to alleviate the computational burden of expensive fitness evaluation. At the heart of SC is a novel surrogate model based on the critic network in RL, which efficiently leverages historical interaction data generated by the population and makes it possible to estimate the fitness of individuals without environmental interactions. In addition, two model management strategies with the elite protection mechanism are introduced in SC to control the workflow, leading to a fast and stable optimization process. In the empirical studies, we combine SC with two state-of-the-art evolutionary reinforcement learning approaches to highlight its functionality and effectiveness. Experiments on six challenging continuous control benchmarks from the OpenAI Gym platform show that SC can not only significantly reduce the cost of interaction with the environment, but also bring better sample efficiency and dramatically boost the learning progress of the original hybrid framework.
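To make the surrogate idea concrete, here is a minimal sketch of critic-based fitness estimation. It is not the authors' implementation. The abstract says only that the surrogate is built on the RL critic and uses historical interaction data. The averaging rule used here (mean critic value of a policy's actions over stored states) is an assumption, and `surrogate_fitness`, `make_linear_policy`, and `toy_critic` are hypothetical names introduced for illustration.

```python
import numpy as np

def surrogate_fitness(policy, critic, states):
    """Score a policy without touching the environment (assumed rule):
    average the critic's value of the policy's actions over stored states."""
    actions = policy(states)
    return float(np.mean(critic(states, actions)))

def make_linear_policy(weights):
    # Hypothetical individual in the population: a tanh-squashed linear map.
    return lambda states: np.tanh(states @ weights)

def toy_critic(states, actions):
    # Hypothetical stand-in for a learned Q-network: it prefers actions
    # near 0.5 in every dimension and ignores the state entirely.
    return -np.sum((actions - 0.5) ** 2, axis=1)

rng = np.random.default_rng(0)

# Stand-in for states drawn from the shared replay buffer.
replay_states = rng.normal(size=(256, 4))

# A small population of candidate policies (4-dim state, 2-dim action).
population = [make_linear_policy(rng.normal(size=(4, 2))) for _ in range(5)]

# Rank individuals by estimated fitness instead of by real rollouts.
scores = [surrogate_fitness(p, toy_critic, replay_states) for p in population]
best_index = int(np.argmax(scores))
```

In a hybrid framework, scores like these could pre-screen candidates so that only the most promising individuals are evaluated in the real environment. How often to trust the surrogate rather than real rollouts is what the model management strategies described in the abstract govern.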

📂 Download paper here!
👉 Code is available here!

BibTeX

@article{wang2022surrogate,
  title={A surrogate-assisted controller for expensive evolutionary reinforcement learning},
  author={Wang, Yuxing and Zhang, Tiantian and Chang, Yongzhe and Wang, Xueqian and Liang, Bin and Yuan, Bo},
  journal={Information Sciences},
  volume={616},
  pages={539--557},
  year={2022},
  publisher={Elsevier}
}

🎦 Video

A HalfCheetah agent trained by SPDERL-I, with an average performance of 14,000 points over 50 test seeds.

A Hopper agent trained by SPDERL-I, with an average performance of 4,100 points over 50 test seeds.

A Walker agent trained by SPDERL-I, with an average performance of 9,000 points over 50 test seeds.