Human-AI Shared Control via Policy Dissection

Neural Information Processing Systems (NeurIPS) 2022

Quanyi Li1, Zhenghao Peng3, Haibin Wu1, Lan Feng2, Bolei Zhou3
1Centre for Perceptual and Interactive Intelligence, 2ETH Zurich,
3University of California, Los Angeles
Webpage | Code | Video | Paper

Fig. 1 Overview of the proposed method

Inspired by the neuroscience approach of investigating the motor cortex in primates, we develop a simple yet effective frequency-based approach called Policy Dissection that aligns the intermediate representation of a learned neural controller with the kinematic attributes of the agent's behavior. Without modifying the neural controller or retraining the model, the proposed approach converts a given RL-trained policy into a goal-conditioned policy in which specific units can be activated to evoke desired behaviors and accomplish goals. This, in turn, enables human-AI shared control, where a human can steer the trained AI to finish complex tasks.
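The frequency-based alignment can be sketched as follows. This is a minimal illustration, not the paper's implementation: the function names (`dominant_frequency`, `associate_units`) and the synthetic traces are made up here, and the sketch assumes each kinematic attribute is matched to the hidden unit whose activation oscillates at the closest dominant frequency.

```python
import numpy as np

def dominant_frequency(signal, dt=0.02):
    """Return the highest-magnitude frequency of a 1-D signal (DC removed)."""
    spectrum = np.abs(np.fft.rfft(signal - signal.mean()))
    freqs = np.fft.rfftfreq(len(signal), d=dt)
    return freqs[np.argmax(spectrum)]

def associate_units(unit_traces, kinematic_traces, dt=0.02):
    """Map each kinematic attribute to the hidden unit whose activation
    trace has the closest dominant frequency."""
    unit_freqs = {u: dominant_frequency(tr, dt) for u, tr in unit_traces.items()}
    mapping = {}
    for attr, trace in kinematic_traces.items():
        f_attr = dominant_frequency(trace, dt)
        mapping[attr] = min(unit_freqs, key=lambda u: abs(unit_freqs[u] - f_attr))
    return mapping

# Synthetic demo: unit 7 oscillates at the same 1.5 Hz as the yaw signal.
t = np.arange(0, 10, 0.02)
units = {3: np.sin(2 * np.pi * 0.5 * t), 7: np.sin(2 * np.pi * 1.5 * t)}
kin = {"yaw": np.cos(2 * np.pi * 1.5 * t)}
print(associate_units(units, kin))  # {'yaw': 7}
```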

Demo Video
We provide a demo video showing that human-AI shared control systems empowered by Policy Dissection can be built for various tasks, including quadrupedal robot locomotion, autonomous driving, and classic Gym tasks.
Parkour Demo

We trained the bipedal robot Cassie in IsaacGym. Although the robot is trained only to move forward, activating the identified primitives can evoke complex behaviors such as crouching, forward jumping, and back-flipping. In the following parkour demo, we show that, with human instruction, the robot can combine these skills to overcome complex situations. The neural controller with probed primitives and the shared control interface are released and can be accessed at Code.
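Conceptually, "activating an identified primitive" means clamping a hidden unit to a chosen value during the forward pass. The toy policy below is a hypothetical stand-in (random weights, made-up unit index), not the released controller; it only shows how an override at a single unit changes the emitted action.

```python
import numpy as np

rng = np.random.default_rng(0)

class MLPPolicy:
    """Tiny stand-in for an RL-trained controller (random weights for demo)."""
    def __init__(self, obs_dim=8, hidden=16, act_dim=4):
        self.W1 = rng.standard_normal((hidden, obs_dim)) * 0.3
        self.W2 = rng.standard_normal((act_dim, hidden)) * 0.3
        self.overrides = {}          # unit index -> forced activation value

    def act(self, obs):
        h = np.tanh(self.W1 @ obs)
        for unit, value in self.overrides.items():
            h[unit] = value          # clamp the identified primitive unit
        return self.W2 @ h

policy = MLPPolicy()
obs = np.zeros(8)
baseline = policy.act(obs)
policy.overrides[5] = 3.0            # hypothetical "jump" unit
evoked = policy.act(obs)
print(np.allclose(baseline, evoked))  # False: the override changes the action
```

In the shared-control setting, a human instruction would select which unit to clamp and to what value, while the rest of the network keeps running unchanged.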

Comparison with goal-conditioned controller

To quantify the coarseness of the goal-conditioned control enabled by Policy Dissection, we train an explicit goal-conditioned controller in IsaacGym that follows a target yaw. We directly identify the primitives related to yaw rate in this controller with the proposed method, and employ a PID controller to set the output of the yaw-rate unit. This provides a second way to track a target command, neural primitive activation, besides explicitly specifying the goal in the network input.
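A minimal sketch of this closed loop, under assumed gains and a made-up first-order plant (neither is from the paper): the PID output drives the yaw-rate unit's activation, and the measured yaw rate is fed back until it tracks the target.

```python
class PID:
    """Simple PID controller (gains are illustrative, not from the paper)."""
    def __init__(self, kp=1.2, ki=1.0, kd=0.05, dt=0.02):
        self.kp, self.ki, self.kd, self.dt = kp, ki, kd, dt
        self.integral = 0.0
        self.prev_err = 0.0

    def step(self, target, measured):
        err = target - measured
        self.integral += err * self.dt
        deriv = (err - self.prev_err) / self.dt
        self.prev_err = err
        return self.kp * err + self.ki * self.integral + self.kd * deriv

# Toy plant: yaw rate responds first-order to the yaw-rate unit's activation.
pid, yaw_rate = PID(), 0.0
for _ in range(500):
    activation = pid.step(target=0.8, measured=yaw_rate)  # drive the unit
    yaw_rate += 0.1 * (activation - yaw_rate)             # simplified dynamics
print(round(yaw_rate, 2))  # settles close to the 0.8 target
```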

As a result, experiments can fairly compare the two methods and quantify the coarseness of the goal-conditioned control enabled by Policy Dissection. The results show that the goal-conditioned control achieved by our method is comparable to explicit goal-conditioned control.

Note: for primitive-activation command tracking, the explicit target yaw command in the observation is set to 0.

Reference
@article{li2022human,
  title={Human-AI Shared Control via Policy Dissection},
  author={Li, Quanyi and Peng, Zhenghao and Wu, Haibin and Feng, Lan and Zhou, Bolei},
  journal={arXiv preprint arXiv:2206.00152},
  year={2022}
}