MetaUrban: An Embodied AI Simulation Platform for Urban Micromobility

Wayne Wu, Honglin He, Jack He, Yiran Wang, Chenda Duan, Zhizheng Liu, Quanyi Li, Bolei Zhou
University of California, Los Angeles
| Code | Paper

TL;DR

    MetaUrban is a compositional simulation platform for AI-driven urban micromobility research. It will be open-sourced to enable more research opportunities for the community and to foster generalizable and safe embodied AI and micromobility in cities.

Introducing MetaUrban

Download Video from Google Drive | Baidu Netdisk

Abstract

Public urban spaces like streetscapes and plazas serve residents and accommodate social life in all its vibrant variations. Recent advances in Robotics and Embodied AI make public urban spaces no longer exclusive to humans. Food delivery bots and electric wheelchairs have started sharing sidewalks with pedestrians, while robot dogs and humanoids have recently emerged on the streets. Micromobility enabled by AI for short-distance travel in public urban spaces is a crucial component of the future transportation system. Ensuring the generalizability and safety of the AI models maneuvering these mobile machines is essential. In this work, we present MetaUrban, a compositional simulation platform for AI-driven urban micromobility research. MetaUrban can construct an infinite number of interactive urban scenes from compositional elements, covering a vast array of ground plans, object placements, pedestrians, vulnerable road users, and other mobile agents' appearances and dynamics. We design point navigation and social navigation tasks as the pilot study using MetaUrban for urban micromobility research and establish various baselines of Reinforcement Learning and Imitation Learning. We conduct extensive evaluations across mobile machines, demonstrating that heterogeneous mechanical structures significantly influence the learning and execution of AI policies. We perform a thorough ablation study, showing that the compositional nature of the simulated environments can substantially improve the generalizability and safety of the trained mobile agents. MetaUrban will be made publicly available to provide research opportunities and foster safe and trustworthy embodied AI and micromobility in cities. The code and dataset are released.

Procedural Generation Pipeline

MetaUrban can automatically generate complex urban scenes thanks to its compositional nature. MetaUrban uses a structured description script to create urban scenes. Based on the provided information about street blocks, sidewalks, objects, agents, and more, it starts with the street block map, then plans the ground layout by dividing it into different functional zones, then places static objects, and finally populates dynamic agents. In the figure, the first column shows the structured description script. From the second to the fourth column, the top rows show the 2D road maps, and the bottom rows show the bird's-eye views of the 3D scenes in the simulator.
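To make this staged pipeline concrete, the sketch below mimics how a structured description script could drive scene construction. It is a minimal illustration only: all function names and configuration keys are hypothetical placeholders, not MetaUrban's actual API; they simply mirror the order of the four stages (block map, ground layout, static objects, dynamic agents).

# Minimal, self-contained sketch of the staged generation pipeline.
# All names below are illustrative assumptions, not MetaUrban's real interface.

def generate_street_block(block_cfg):
    """Stage 1: produce a 2D road map for the requested street block."""
    return {"road_map": block_cfg["type"], "lanes": block_cfg["lanes"]}

def plan_ground_layout(block_map, sidewalk_cfg):
    """Stage 2: divide the ground plan into functional zones (sidewalk, buffer, ...)."""
    return {"block_map": block_map,
            "zones": ["sidewalk", "buffer", "building_frontage"],
            "sidewalk_width_m": sidewalk_cfg["width_m"]}

def place_static_objects(layout, object_cfg):
    """Stage 3: scatter static objects onto the planned zones."""
    layout["objects"] = [{"category": c} for c in object_cfg["categories"]]
    return layout

def populate_dynamic_agents(layout, agent_cfg):
    """Stage 4: add pedestrians, vulnerable road users, and other mobile agents."""
    layout["agents"] = dict(agent_cfg)
    return layout

# Structured description script (illustrative values).
scene_description = {
    "street_block": {"type": "intersection", "lanes": 2},
    "sidewalk": {"width_m": 5.0},
    "objects": {"categories": ["bench", "tree", "hydrant", "trash_can"]},
    "agents": {"pedestrians": 12, "wheelchair_users": 2, "delivery_bots": 1},
}

scene = populate_dynamic_agents(
    place_static_objects(
        plan_ground_layout(
            generate_street_block(scene_description["street_block"]),
            scene_description["sidewalk"],
        ),
        scene_description["objects"],
    ),
    scene_description["agents"],
)
print(scene["zones"], len(scene["objects"]), scene["agents"])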

Urban Scene Gallery

Parade of Dynamic Agents

Sensors

Benchmarks

We design two common tasks in urban scenes as the pilot study: Point Navigation (PointNav) and Social Navigation (SocialNav). In PointNav, the agent's goal is to navigate to the target coordinates in static environments without access to a pre-built environment map. In SocialNav, the agent is required to reach a point goal in dynamic environments that contain moving environmental agents. The agent shall avoid collisions with or proximity to environmental agents beyond a threshold (distance < 0.2 meters) to avoid penalization. The agent is evaluated using the Success Rate (SR) and Success weighted by Path Length (SPL) metrics, which measure the success and efficiency of the path taken by the agent. For SocialNav, in addition to Success Rate (SR), the Social Navigation Score (SNS) is also used to evaluate the social compliance of the agent. For both tasks, we further report the Cumulative Cost (CC) to evaluate the safety properties of the agent; it records the frequency of collisions with obstacles or environmental agents. We evaluate 7 typical baseline models to build comprehensive benchmarks on MetaUrban, across Reinforcement Learning (PPO), Safe Reinforcement Learning (PPO-Lag and PPO-ET), Offline Reinforcement Learning (IQL and TD3+BC), and Imitation Learning (BC and GAIL).
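For concreteness, below is a minimal sketch of how SR, SPL, and CC could be computed from per-episode logs. SR and SPL follow their standard definitions; the Episode record fields and the interpretation of CC as average crash count per episode are assumptions made for illustration, not MetaUrban's actual evaluation code.

# Illustrative metric computation from episode logs.
# The Episode fields and the CC convention are assumptions, not MetaUrban's logging format.

from dataclasses import dataclass
from typing import List

@dataclass
class Episode:
    success: bool                 # did the agent reach the point goal?
    path_length: float            # length of the path actually taken (meters)
    shortest_path_length: float   # geodesic start-to-goal distance (meters)
    num_crashes: int              # collisions with obstacles or environmental agents

def success_rate(episodes: List[Episode]) -> float:
    return sum(e.success for e in episodes) / len(episodes)

def spl(episodes: List[Episode]) -> float:
    """Success weighted by Path Length: mean of S_i * l_i / max(p_i, l_i)."""
    total = 0.0
    for e in episodes:
        if e.success:
            total += e.shortest_path_length / max(e.path_length, e.shortest_path_length)
    return total / len(episodes)

def cumulative_cost(episodes: List[Episode]) -> float:
    """Average number of crashes per episode (lower is safer)."""
    return sum(e.num_crashes for e in episodes) / len(episodes)

if __name__ == "__main__":
    logs = [
        Episode(success=True,  path_length=23.5, shortest_path_length=20.0, num_crashes=0),
        Episode(success=False, path_length=40.2, shortest_path_length=18.7, num_crashes=2),
    ]
    print(f"SR={success_rate(logs):.2f}  SPL={spl(logs):.2f}  CC={cumulative_cost(logs):.2f}")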

Results

We show the results of a PPO policy trained in MetaUrban environments on the social navigation task. We demonstrate success cases in which the agent avoids collisions with objects and other agents. However, there are still many interesting failure cases, which indicate the complexity of MetaUrban environments and the significant room for improvement of embodied agents in urban spaces.

Success Cases



Failure Cases

Terrains and Materials

Traffic Rules

Interface for Demonstration Data Collection

Interface for Human-in-the-loop Learning

Impacts

Embodied AI. MetaUrban contributes to advancing areas such as robot navigation, social robotics, and interactive systems. It could facilitate the development of robust AI systems capable of understanding and navigating complex urban environments.
Economy. MetaUrban could be used in businesses and services operating in urban environments, such as last-mile food delivery, assistive wheelchairs, and trash-cleaning robots. It could also drive innovation in urban planning and infrastructure development by providing simulation tools and insights into how spaces are utilized, thereby enhancing the economic and societal efficiency of public urban spaces like sidewalks and parks.
Society. By enabling the safe integration of robots and AI systems in public spaces, MetaUrban could support the development of assistive technologies that can aid in accessibility and public services. Using AI in public spaces might foster new forms of social interaction and community services, making urban spaces more livable and joyful.

Acknowledgement

The project is supported by the NSF Grants CCRI-2235012, RI-2339769, and POSE-2346267, and an Intel Rising Star Faculty Award.

Reference

@article{wu2024metaurban,
  title={MetaUrban: A Simulation Platform for Embodied AI in Urban Spaces},
  author={Wu, Wayne and He, Honglin and Wang, Yiran and Duan, Chenda and He, Jack and Liu, Zhizheng and Li, Quanyi and Zhou, Bolei},
  journal={arXiv preprint arXiv:2407.08725},
  year={2024}
}