ModularEvoGym | Yuxing Wang (王昱兴)

ModularEvoGym [1] is based on Evolution Gym [2], a large-scale benchmark for co-optimizing the design and control of voxel-based soft robots (VSRs). We modified the original state representation to define a new modular observation space. The input state of the robot at time step $t$ is $s_{t}^{c}=\lbrace s_{t}^{v},s_{t}^{g}\rbrace$, where $s_{t}^{v}=\lbrace s_{t}^{v_{1}}, s_{t}^{v_{2}},\dots,s_{t}^{v_{N}}\rbrace$ collects the local observations of the $N$ voxels. Here, $s_{t}^{v_{i}}$ holds the local information of voxel $i$: the positions of its four corners relative to the robot’s center of mass, and its material type (e.g., soft voxel, rigid voxel, horizontal actuator, or vertical actuator). $s_{t}^{g}$ contains task-related observations, such as terrain information and goal-relevant information. During simulation, each non-empty voxel senses only its local state. Based on this sensory input, a controller outputs control signals that vary the volume of the actuator voxels. The robot’s morphology remains fixed while it interacts with the environment.
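To make the layout of $s_{t}^{c}$ concrete, here is a minimal NumPy sketch of how such a modular observation could be assembled. It is not the ModularEvoGym API: the function names, the material list and its one-hot encoding, and the use of the corner-point centroid as the center of mass are all assumptions made for illustration.

```python
import numpy as np

# Hypothetical material names, used only for this illustration.
MATERIALS = ("empty", "rigid", "soft", "h_actuator", "v_actuator")

def voxel_observation(corners, material, center_of_mass):
    """Local observation of one voxel: 4 relative corner positions + one-hot material."""
    rel = np.asarray(corners, dtype=float) - np.asarray(center_of_mass, dtype=float)
    one_hot = np.zeros(len(MATERIALS))
    one_hot[MATERIALS.index(material)] = 1.0
    return np.concatenate([rel.ravel(), one_hot])  # 8 + 5 = 13 values

def modular_observation(voxels, global_obs):
    """voxels: list of (corners, material) pairs for the non-empty voxels."""
    all_corners = np.concatenate([np.asarray(c, dtype=float) for c, _ in voxels])
    com = all_corners.mean(axis=0)  # assumption: centroid of all corner points
    s_v = np.stack([voxel_observation(c, m, com) for c, m in voxels])
    return {"s_v": s_v, "s_g": np.asarray(global_obs, dtype=float)}

def unit_square(x, y):
    """Corner positions of an undeformed unit voxel with lower-left corner (x, y)."""
    return [(x, y), (x + 1, y), (x, y + 1), (x + 1, y + 1)]

# Two voxels side by side, plus a placeholder two-value task observation.
obs = modular_observation(
    [(unit_square(0, 0), "soft"), (unit_square(1, 0), "h_actuator")],
    global_obs=[0.0, 1.0],
)
print(obs["s_v"].shape)  # (2, 13)
```

Each row of `s_v` plays the role of one $s_{t}^{v_i}$, so the number of rows changes with the morphology while the row width stays fixed.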

ModularEvoGym opens up several research directions. We modeled the local observations of all voxels as a sequence and adopted the self-attention mechanism [3] to develop a more efficient controller (shown below) that handles incompatible state-action spaces, i.e., robots whose observation and action dimensions differ. This controller can be trained with popular reinforcement learning algorithms (e.g., PPO [4]) to control a variety of robot morphologies simultaneously.
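The sketch below illustrates why self-attention suits this setting: the same parameters process a sequence of any length, so one controller can serve robots with different numbers of voxels. It is a deliberately reduced, single-head NumPy version with random, untrained weights; the class name, layer sizes, and the `tanh` output head are assumptions and do not reproduce the controller described above.

```python
import numpy as np

class TinyAttentionController:
    """Single-head scaled dot-product self-attention over a sequence of voxel observations."""

    def __init__(self, obs_dim, d_model=16, seed=0):
        rng = np.random.default_rng(seed)
        scale = 1.0 / np.sqrt(obs_dim)
        self.w_q = rng.normal(0.0, scale, (obs_dim, d_model))
        self.w_k = rng.normal(0.0, scale, (obs_dim, d_model))
        self.w_v = rng.normal(0.0, scale, (obs_dim, d_model))
        self.w_out = rng.normal(0.0, 1.0 / np.sqrt(d_model), (d_model, 1))

    def __call__(self, s_v):
        """s_v: (N, obs_dim) array. Returns (N,) control signals and (N, N) attention weights."""
        q, k, v = s_v @ self.w_q, s_v @ self.w_k, s_v @ self.w_v
        scores = q @ k.T / np.sqrt(q.shape[-1])
        scores -= scores.max(axis=-1, keepdims=True)   # numerically stable softmax
        weights = np.exp(scores)
        weights /= weights.sum(axis=-1, keepdims=True)
        mixed = weights @ v                            # each voxel aggregates all others
        actions = np.tanh(mixed @ self.w_out).squeeze(-1)
        return actions, weights

# The same parameters handle a 4-voxel and a 9-voxel robot.
controller = TinyAttentionController(obs_dim=13)
rng = np.random.default_rng(1)
small_actions, small_weights = controller(rng.normal(size=(4, 13)))
large_actions, large_weights = controller(rng.normal(size=(9, 13)))
print(small_actions.shape, large_actions.shape)  # (4,) (9,)
```

This sketch emits one signal per voxel; in a real setup only the outputs at actuator voxels would be used, and the weights would be learned (e.g., with PPO) rather than sampled.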

To encode a VSR’s morphology, we provide an alternative to methods such as direct encoding and Compositional Pattern Producing Networks (CPPNs) [5], which rely on access to the whole design space: Neural Cellular Automata (NCA) [6]. As shown below, an NCA takes multiple actions to grow a robot from an initial seed morphology. It encodes complex patterns in a neural network and generates different developmental outcomes while using a smaller set of trainable parameters.
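As a rough illustration of the growth process, the following NumPy sketch applies one shared local update rule to every cell for a few steps, starting from a single seed voxel. Everything specific here is an assumption: the five material codes, the 3×3 neighborhood, the two-layer network, the argmax update, and the random (untrained) weights, so the resulting morphology is arbitrary.

```python
import numpy as np

N_TYPES = 5  # hypothetical codes: 0 empty, 1 rigid, 2 soft, 3 h-actuator, 4 v-actuator

def nca_step(grid, w1, w2):
    """Update every cell from the one-hot materials in its 3x3 neighborhood."""
    h, w = grid.shape
    one_hot = np.eye(N_TYPES)[grid]                       # (h, w, N_TYPES)
    padded = np.pad(one_hot, ((1, 1), (1, 1), (0, 0)))    # zeros outside the design space
    new_grid = np.empty_like(grid)
    for i in range(h):
        for j in range(w):
            patch = padded[i:i + 3, j:j + 3].ravel()      # 9 * N_TYPES inputs
            hidden = np.tanh(patch @ w1)
            new_grid[i, j] = int(np.argmax(hidden @ w2))  # pick the next material
    return new_grid

def grow(size=5, steps=3, hidden=32, seed=0):
    """Grow a design from a single soft voxel; returns the grid after every step."""
    rng = np.random.default_rng(seed)
    w1 = rng.normal(size=(9 * N_TYPES, hidden))           # shared by all cells
    w2 = rng.normal(size=(hidden, N_TYPES))
    grid = np.zeros((size, size), dtype=int)
    grid[size // 2, size // 2] = 2                        # seed morphology
    history = [grid]
    for _ in range(steps):
        grid = nca_step(grid, w1, w2)
        history.append(grid)
    return history

history = grow()
print(len(history), history[-1].shape)  # 4 (5, 5)
```

Because the rule is shared across cells, the parameter count in this sketch (`w1` and `w2`) depends only on the neighborhood and hidden sizes, not on the size of the design space.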

Building on these components, we also presented an efficient curriculum-based co-design (CuCo) method that learns to design and control VSRs through an easy-to-difficult process [7]. The following figures show the developmental process of a walker agent as the design space grows.

$3 \times 3$ design space, $9$ voxels.

$5 \times 5$ design space, $25$ voxels.

$7 \times 7$ design space, $49$ voxels.

$9 \times 9$ design space, $81$ voxels.

$11 \times 11$ design space, $121$ voxels.

References

[1] ModularEvoGym
[2] Evolution Gym: A Large-Scale Benchmark for Evolving Soft Robots
[3] Attention Is All You Need
[4] Proximal Policy Optimization Algorithms
[5] Compositional Pattern Producing Network
[6] Neural Cellular Automata
[7] Curriculum-based Co-design of Morphology and Control of Voxel-based Soft Robots