Learning to Drive by Watching YouTube videos:
Action-Conditioned Contrastive Policy Pretraining
Qihang Zhang1, Zhenghao Peng1, Bolei Zhou2
1The Chinese University of Hong Kong, 2University of California, Los Angeles
Paper at ECCV'22 | Code | Dataset


This work develops a novel action-conditioned policy pretraining method, called ACO, that learns from driving videos on the web. The pretrained network captures the visual features that matter for decision-making, and the resulting representation benefits various downstream tasks.

The method consists of the following steps:

  1. We first collect a large corpus of driving videos with a wide range of weather conditions, from wet to sunny, from all across the world.
  2. We then train an inverse dynamics model with a small amount of labeled data and use it to generate action pseudo labels for each collected frame.
  3. We then develop a contrastive learning method that incorporates the action pseudo labels into representation learning.
The learned representations show strong performance on downstream driving tasks, outperforming ImageNet-pretrained weights and other self-supervised representations.
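To make step 3 more concrete, here is a minimal NumPy sketch of a contrastive loss conditioned on action pseudo labels. It is not the released ACO implementation: the function name `action_conditioned_contrastive_loss`, the scalar (steering-like) action, the `threshold` used to decide which frames count as positives, and the `temperature` are all assumptions made for illustration. The idea shown is only that two frames are treated as a positive pair when their pseudo actions are close, regardless of which video they come from.

```python
import numpy as np


def action_conditioned_contrastive_loss(features, actions,
                                        threshold=0.05, temperature=0.1):
    """Illustrative loss: frames with similar pseudo actions are positives.

    features: (N, D) array of frame embeddings.
    actions:  (N,) array of scalar pseudo actions (hypothetical steering values).
    threshold, temperature: hypothetical hyperparameters, not from the paper.
    """
    features = np.asarray(features, dtype=np.float64)
    actions = np.asarray(actions, dtype=np.float64)
    n = features.shape[0]

    # Cosine similarity between all pairs, scaled by the temperature.
    z = features / np.linalg.norm(features, axis=1, keepdims=True)
    logits = z @ z.T / temperature

    # A sample is never its own positive or negative.
    self_mask = np.eye(n, dtype=bool)
    logits = np.where(self_mask, -np.inf, logits)

    # Positives: other samples whose pseudo action is within the threshold.
    action_gap = np.abs(actions[:, None] - actions[None, :])
    pos_mask = (action_gap < threshold) & ~self_mask

    # Numerically stable log-softmax over every other sample in the batch.
    row_max = logits.max(axis=1, keepdims=True)
    shifted = logits - row_max
    log_prob = shifted - np.log(np.exp(shifted).sum(axis=1, keepdims=True))

    # Average the log-probability of the positives for each anchor.
    pos_count = pos_mask.sum(axis=1)
    valid = pos_count > 0
    if not valid.any():
        return 0.0
    pos_log_prob = np.where(pos_mask, log_prob, 0.0).sum(axis=1)
    return float(-(pos_log_prob[valid] / pos_count[valid]).mean())
```

In this sketch the loss is low when frames with similar pseudo actions already have similar embeddings, and high otherwise, which is the pressure that pushes the encoder toward action-relevant features.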

You can download the YouTube driving dataset by following this link. The list of YouTube videos can be downloaded here.


Learning to Drive by Watching YouTube Videos: Action-Conditioned Contrastive Policy Pretraining.
Qihang Zhang, Zhenghao Peng, and Bolei Zhou.
European Conference on Computer Vision (ECCV), 2022

  title={Learning to Drive by Watching YouTube videos: Action-Conditioned Contrastive Policy Pretraining},
  author={Zhang, Qihang and Peng, Zhenghao and Zhou, Bolei},
  journal={European Conference on Computer Vision (ECCV)},