Human-AI Shared Control via Policy Dissection

Neural Information Processing Systems (NeurIPS) 2022

Quanyi Li1,4, Zhenghao Peng3, Haibin Wu1, Lan Feng2, Bolei Zhou3
1Centre for Perceptual and Interactive Intelligence, 2ETH Zurich,
3University of California, Los Angeles, 4University of Edinburgh
Webpage | Code | Video | Paper
Method Overview

Fig. 1 Overview of the proposed method

Inspired by the neuroscience approach to investigating the motor cortex in primates¹, we develop a simple yet effective frequency-based approach called Policy Dissection, which aligns the intermediate representation of a learned neural controller with the kinematic attributes of the agent's behavior. Without modifying the neural controller or retraining the model, the approach converts a given RL-trained policy into a goal-conditioned policy in which specific units can be activated to evoke desired behaviors and complete goals. This in turn enables human-AI shared control, where a human can steer the trained AI to finish complex tasks.


Fig. 2 Identifying motor primitives from observational data

We first roll out the trained policy, recording the neural activity of each unit and tracking kinematic attributes such as yaw and velocity. Frequency matching then associates each kinematic attribute with certain units, which we call motor primitives. The curve of each kinematic attribute and the curve of its aligned motor primitive are drawn in the same color. For clarity, we show only one recorded episode and a subset of the units, with the unit curves sorted by amplitude.
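The frequency-matching step can be illustrated with a minimal sketch. It is a simplification under stated assumptions, not the procedure from the paper: it compares whole-episode DFT magnitude spectra with a squared distance, and the names `magnitude_spectrum` and `match_units`, as well as the synthetic signals, are hypothetical.

```python
import cmath
import math


def magnitude_spectrum(signal):
    """Normalized DFT magnitudes of a mean-removed signal (naive O(n^2) DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    mags = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * math.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        mags.append(abs(coeff))
    total = sum(mags)
    # Normalizing makes the comparison insensitive to signal amplitude.
    return [m / total for m in mags] if total > 0 else mags


def match_units(unit_activity, attributes):
    """Map each attribute name to the index of the unit with the closest spectrum."""
    unit_spectra = [magnitude_spectrum(u) for u in unit_activity]
    matches = {}
    for name, series in attributes.items():
        target = magnitude_spectrum(series)
        dists = [
            sum((a - b) ** 2 for a, b in zip(target, spec))
            for spec in unit_spectra
        ]
        matches[name] = min(range(len(dists)), key=dists.__getitem__)
    return matches


# Synthetic stand-in for one recorded episode (assumption: 64 time steps).
T = 64
units = [
    [math.sin(2 * math.pi * 3 * t / T) for t in range(T)],        # 3 cycles
    [math.sin(2 * math.pi * 8 * t / T + 0.5) for t in range(T)],  # 8 cycles
    [0.1 * t for t in range(T)],                                  # slow drift
]
attributes = {
    "yaw": [2.0 * math.sin(2 * math.pi * 8 * t / T) for t in range(T)],
    "velocity": [0.5 * math.cos(2 * math.pi * 3 * t / T) + 1.0 for t in range(T)],
}
matches = match_units(units, attributes)
print(matches)
```

Because the spectra are mean-removed and normalized, a unit can match an attribute even when the two curves differ in offset, scale, or phase, which is the property a frequency-based alignment relies on.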

A behavior can be described as a change in a subset of kinematic attributes, which can be achieved by activating the corresponding set of motor primitives. These building blocks of movement generation are therefore associated with certain behaviors, yielding the stimulation-evoked map. Taking the back-flip as an example, this behavior can be described by increasing (1) height, (2) pitch, and (3) knee force. We can therefore evoke it by activating the motor primitives related to these three kinematic attributes.
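Activating primitives to evoke a behavior can be sketched as overriding selected hidden activations during the forward pass while leaving the weights untouched. The tiny network, its weights, the unit indices, and the activation values in `EVOKED_MAP` below are all hypothetical and chosen only for illustration; they are not taken from the released controller.

```python
import math


def forward(obs, w1, w2, overrides=None):
    """Two-layer tanh MLP; `overrides` maps hidden-unit index -> forced activation."""
    hidden = [math.tanh(sum(w * x for w, x in zip(row, obs))) for row in w1]
    for idx, value in (overrides or {}).items():
        hidden[idx] = value  # stimulate the unit instead of using its computed value
    return [sum(w * h for w, h in zip(row, hidden)) for row in w2]


# Hypothetical stimulation-evoked map: behavior -> {hidden unit index: activation}.
EVOKED_MAP = {
    "back_flip": {0: 1.0, 2: 1.0, 3: -1.0},  # e.g. units matched to height, pitch, knee force
    "crouch": {0: -1.0},
}

# Made-up weights: 3 observations -> 4 hidden units -> 2 actions.
W1 = [
    [0.2, -0.1, 0.4],
    [0.5, 0.3, -0.2],
    [-0.3, 0.1, 0.1],
    [0.1, 0.2, 0.3],
]
W2 = [
    [1.0, 0.0, 1.0, 1.0],
    [0.0, 1.0, 0.0, 0.0],
]
obs = [0.5, -1.0, 0.25]

default_action = forward(obs, W1, W2)
flip_action = forward(obs, W1, W2, EVOKED_MAP["back_flip"])
print(default_action, flip_action)
```

In this sketch only the action component that depends on the stimulated units changes; the other component is computed exactly as in the unmodified policy, which mirrors the idea that no retraining or weight modification is needed.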

1. Electrical stimulation of different areas of the motor cortex can elicit meaningful body movements. See Graziano, Michael S. A., et al. "The cortical control of movement revisited." Neuron 36.3 (2002): 349-362.

Case Study: Parkour

We trained the bipedal robot Cassie in IsaacGym. Although the robot is trained only to move forward, activating the identified primitives can evoke complex behaviors such as crouching, forward jumping, and back-flipping. Please refer to our paper for how these skills are discovered.

In the following parkour video, we show that with human instruction, the robot can combine these skills and overcome complex situations. This neural controller with probed primitives and the shared control interface are all released here.

Ablation Study: Comparison with goal-conditioned controller

To quantify the coarseness of the goal-tracking controller enabled by Policy Dissection, we train an explicit goal-conditioned controller that follows a target yaw in IsaacGym. We then identify the primitives related to yaw rate in this controller with the proposed method and use a PID controller to set the output of the unit related to yaw rate. This provides a second way to track a target command, neural primitive activation, in addition to explicitly specifying the goal in the network input.

The two methods can therefore be compared fairly to quantify the coarseness of the goal-conditioned control enabled by Policy Dissection. As shown in the video, the tracking precision of the goal-conditioned controller obtained with our method is comparable to that of the explicit goal-conditioned control method.

Note: the explicit target yaw command in the observation is set to 0 when tracking commands through primitive activation.
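The PID-driven primitive activation used in this ablation can be sketched as a closed loop in which the yaw error sets the activation of the yaw-rate unit. Everything below is an assumption made for illustration: the gains, the output clamp, and in particular `toy_yaw_rate`, a linear stand-in for the policy and simulator that simply maps a unit activation to a yaw rate.

```python
class PID:
    """Textbook PID controller with a clamped output."""

    def __init__(self, kp, ki, kd, dt, limit):
        self.kp, self.ki, self.kd = kp, ki, kd
        self.dt = dt
        self.limit = limit
        self.integral = 0.0
        self.prev_error = None

    def step(self, error):
        self.integral += error * self.dt
        # Skip the derivative on the first step to avoid a spurious kick.
        if self.prev_error is None:
            derivative = 0.0
        else:
            derivative = (error - self.prev_error) / self.dt
        self.prev_error = error
        raw = self.kp * error + self.ki * self.integral + self.kd * derivative
        return max(-self.limit, min(self.limit, raw))


def toy_yaw_rate(activation):
    """Hypothetical plant: yaw rate (rad/s) grows linearly with the unit's activation."""
    return 1.5 * activation


def track_yaw(target_yaw, steps=400, dt=0.02):
    """Drive yaw toward `target_yaw` by setting the yaw-rate unit's activation."""
    pid = PID(kp=2.0, ki=0.1, kd=0.05, dt=dt, limit=1.0)
    yaw = 0.0
    for _ in range(steps):
        activation = pid.step(target_yaw - yaw)  # value written into the unit
        yaw += toy_yaw_rate(activation) * dt
    return yaw


final_yaw = track_yaw(0.5)
print(final_yaw)
```

In a real setup the activation returned by `pid.step` would overwrite the output of the identified unit inside the network at every control step, and the yaw would come from the simulator rather than from this toy integrator.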

Demo Video
We provide a demo video showing that human-AI shared control systems enabled by Policy Dissection can be built for various tasks, including quadrupedal robot locomotion, autonomous driving, and classic gym tasks.
Reference
If you find this work useful in your project, please consider citing it:
 @inproceedings{
    li2022humanai,
    title={Human-{AI} Shared Control via Policy Dissection},
    author={Quanyi Li and Zhenghao Peng and Haibin Wu and Lan Feng and Bolei Zhou},
    booktitle={Thirty-Sixth Conference on Neural Information Processing Systems},
    year={2022},
    url={https://openreview.net/forum?id=LCOv-GVVDkp}
 }