Towards Autonomous Micromobility through Scalable Urban Simulation
CVPR 2025 Highlight
Wayne Wu* , Honglin He* , Chaoyuan Zhang , Jack He , Seth Z. Zhao , Ran Gong , Quanyi Li , Bolei Zhou
University of California, Los Angeles
TL;DR: A solution to support scalable robot learning in urban spaces. It consists of two components:
- URBAN-SIM -- a large-scale robot learning platform for reinforcement and imitation learning.
- URBAN-BENCH -- a holistic benchmark covering foundational tasks, such as urban locomotion and navigation, and comprehensive tasks, such as urban traversal.

What is Micromobility?
Micromobility is a promising mode of urban transport for short-distance travel.
The figure below shows alternatives to car ownership categorized by trip length; we focus on short-distance trips served by micromobility devices.

What is a Micromobility Device?
Micromobility devices encompass two categories of machines:
- Mobile robots: wheeled, quadruped, wheeled-legged, humanoid robots, etc.
- Assistive mobility devices: electric wheelchairs, mobility scooters, intelligent scooters, autonomous bicycles, etc.

Why URBAN-SIM?
Previous simulation platforms can be categorized into two classes:
- Robot learning platforms (e.g., IsaacGym and IsaacLab): ✅ High Performance ❌ Rich Scene Context
- Autonomous driving platforms (e.g., CARLA and MetaDrive): ❌ High Performance ✅ Rich Scene Context
URBAN-SIM achieves both characteristics, which are critical for robot learning in urban spaces:
- ✅ High Performance: up to 2,600 fps on a single GPU -> supports highly efficient RL training.
- ✅ Rich Scene Context: infinite urban scene generation -> supports scene-aware tasks such as visual locomotion, navigation, and Vision-Language-Action (VLA) model training, as well as rich interaction among robots, humans, and scenes.
In a nutshell, URBAN-SIM can make robot learning scalable: robots can be trained on an infinite number of diverse scenes with any number of GPUs.
What is URBAN-SIM?
URBAN-SIM is a high-performance robot learning platform for autonomous micromobility. It automatically constructs unlimited diverse, realistic, and interactive urban scenes for large-scale robot learning, while delivering up to 2,600 fps of simulation throughput with large-scale parallelization on a single NVIDIA L40S GPU.
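To make this workflow concrete, here is a minimal sketch of what parallelized RL rollout collection on such a platform could look like. The urbansim module, the UrbanEnv class, and every argument name below are hypothetical placeholders for illustration, not the released URBAN-SIM API.

# Hypothetical sketch of batched RL rollout collection in an URBAN-SIM-style
# simulator. `urbansim`, `UrbanEnv`, and all argument names are assumed.
import torch
from urbansim import UrbanEnv  # hypothetical import

# Many environments share one simulator instance on a single GPU; each
# environment is assigned its own procedurally generated urban scene.
env = UrbanEnv(
    num_envs=256,          # parallel environments on one GPU
    robot="unitree_go2",   # e.g., the quadruped robot
    scene="procedural",    # draw a fresh generated scene per environment
    device="cuda:0",
)

obs = env.reset()
for _ in range(1000):
    # A learned policy would produce actions here; random actions keep
    # the sketch self-contained.
    actions = torch.rand((env.num_envs, env.action_dim), device="cuda:0")
    obs, rewards, dones, infos = env.step(actions)  # one batched physics step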
Robot Training on Different Scales of Generated Scenes
Why URBAN-BENCH?
Existing AI solutions often focus on isolated robot skills — like obstacle avoidance, goal reaching, or parkour. However, real-world urban tasks demand versatile and integrated capabilities for end-to-end operation.
Take, for example, a robot dog navigating from a coffee shop to a campus building. It must climb up and down curbs and stairs, avoid collisions with pedestrians and sidewalk clutter, and traverse narrow or crowded spaces — sometimes requiring human teleoperation in risky scenarios. All these skills must work together to ensure one successful journey.
To support research in this emerging domain of autonomous micromobility, and to standardize evaluation practices, we introduce URBAN-BENCH. It spans both foundational tasks (urban locomotion and navigation) and the comprehensive challenge of kilometer-scale urban traversal, all set in diverse and realistic city environments.
What is URBAN-BENCH?
URBAN-BENCH is a suite of essential tasks and benchmarks for training and evaluating different robot capabilities. We propose three tasks across eight scenarios on four robots: a wheeled robot (COCO Robotics' delivery robot), a quadruped robot (Unitree Go2), a wheeled-legged robot (Unitree B2-W), and a humanoid robot (Unitree G1).
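To illustrate how this task-robot matrix can be enumerated, the sketch below pairs the four robots with the three task families described above. The dictionary layout and identifier strings are our own hypothetical naming for illustration, not URBAN-BENCH's actual configuration format.

# Hypothetical enumeration of the URBAN-BENCH robot/task matrix;
# identifiers and layout below are illustrative only.
ROBOTS = {
    "coco_delivery": "wheeled robot (COCO Robotics delivery robot)",
    "unitree_go2": "quadruped robot",
    "unitree_b2w": "wheeled-legged robot",
    "unitree_g1": "humanoid robot",
}

TASKS = {
    "urban_locomotion": "foundational: climb curbs, stairs, rough terrain",
    "urban_navigation": "foundational: reach goals while avoiding pedestrians and clutter",
    "urban_traversal": "comprehensive: kilometer-scale end-to-end journeys",
}

# A full benchmark sweep evaluates every robot on every task.
for robot, robot_desc in ROBOTS.items():
    for task, task_desc in TASKS.items():
        print(f"evaluating {robot} ({robot_desc}) on {task}: {task_desc}")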

How Scalable is URBAN-SIM?
Experiments demonstrate that URBAN-SIM has strong scalability.
- Left: as the number of parallel environments increases from 1 to 256, throughput scales from 100 to 2,620 fps, while GPU memory usage grows only slightly: 256 environments occupy just 11.2 GB of the available 46 GB.
- Right: as the number of training scenes increases from 1 to 1,024, the success rate improves markedly from 5.1% to 83.2%.
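The throughput curve on the left can be reproduced with a timing loop like the sketch below. make_env is a hypothetical factory standing in for whichever batched environment constructor the platform exposes; it is an assumption, not a real API.

# Sketch of measuring simulation throughput (fps) versus environment count.
# `make_env(num_envs)` is a hypothetical factory returning a batched env with
# .reset(), .step(actions), and .action_dim attributes.
import time
import torch

def measure_fps(make_env, num_envs: int, steps: int = 200) -> float:
    env = make_env(num_envs)
    env.reset()
    start = time.perf_counter()
    for _ in range(steps):
        actions = torch.zeros((num_envs, env.action_dim))
        env.step(actions)
    elapsed = time.perf_counter() - start
    # Each batched step advances all environments at once, so the total
    # number of simulated frames is num_envs * steps.
    return num_envs * steps / elapsed

# Sweep environment counts as in the scalability experiment, e.g.:
# for n in (1, 4, 16, 64, 256):
#     print(n, measure_fps(make_env, n))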

Acknowledgement
The project was supported by NSF grants CNS-2235012 and IIS-2339769, and ONR grant N000142512166. We are grateful for the excellent assets used in this project, including 3D objects from Objaverse-XL, 3D humans from SynBody, and robots from IsaacLab.
Citation
@inproceedings{wu2025urbansim,
  title={Towards Autonomous Micromobility through Scalable Urban Simulation},
  author={Wu, Wayne and He, Honglin and Zhang, Chaoyuan and He, Jack and Zhao, Seth Z. and Gong, Ran and Li, Quanyi and Zhou, Bolei},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
  year={2025}
}