Urban Scene Diffusion through Semantic Occupancy Map
Junge Zhang 1,5, Qihang Zhang 2, Li Zhang 3, Ramana Rao Kompella 4, Gaowen Liu 4, Jiachen Li 1, Bolei Zhou 5
1 University of California, Riverside; 2 The Chinese University of Hong Kong; 3 Fudan University; 4 Cisco; 5 University of California, Los Angeles
Overview
This work presents UrbanDiffusion, a 3D diffusion model that generates large-scale urban scenes from Bird’s-Eye View (BEV) maps. The model represents both the geometry and the semantics of urban structures and objects, rather than visual appearance alone. It learns the distribution of scene-level structures in a latent space, which enables the generation of diverse urban scenes at arbitrary scale. Trained on a real-world driving dataset, the model can generate scenes from held-out BEV maps as well as from maps synthesized by a driving simulator. We also show that the generated scenes can be used for scene image synthesis with a pretrained image generator.
Method
To train the diffusion model in a fast and memory-efficient way, we follow the latent diffusion approach and learn the distribution of the 3D data in a latent space. We first embed the 3D semantic data into a lower-dimensional space and then run the diffusion process in this latent feature space, using classifier-free guidance for conditioning. Given a BEV layout, the trained model generates diverse and realistic samples that contain both scene geometry and semantic information through the sampling process.
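The guidance step can be illustrated with a minimal sketch. It uses the standard classifier-free guidance rule, which combines an unconditional and a BEV-conditioned noise prediction; it is not the authors' implementation. The names `toy_denoiser` and `guided_noise`, the latent shape, and the guidance scale are all hypothetical, and the toy denoiser is only a stand-in for a trained network.

```python
import numpy as np

def toy_denoiser(z, bev=None):
    """Hypothetical stand-in for a trained noise-prediction network.

    z:   latent feature volume, shape (C, X, Y, Z)
    bev: BEV layout, shape (X, Y), or None for the unconditional branch
    """
    eps = 0.1 * z
    if bev is not None:
        # Broadcast the 2D layout across the channel and height axes.
        eps = eps + bev[None, :, :, None]
    return eps

def guided_noise(z, bev, guidance_scale):
    """Classifier-free guidance: start from the unconditional prediction
    and move toward the BEV-conditioned one by `guidance_scale`."""
    eps_uncond = toy_denoiser(z, None)
    eps_cond = toy_denoiser(z, bev)
    return eps_uncond + guidance_scale * (eps_cond - eps_uncond)

# Assumed toy sizes: 4 latent channels on an 8 x 8 x 2 grid.
rng = np.random.default_rng(0)
z = rng.standard_normal((4, 8, 8, 2))
bev = rng.integers(0, 2, size=(8, 8)).astype(np.float64)

eps = guided_noise(z, bev, guidance_scale=3.0)
print(eps.shape)
```

With a guidance scale of 1 the rule reduces to the conditional prediction, and with a scale of 0 to the unconditional one; larger values push samples to follow the BEV layout more closely, typically at some cost in diversity.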
Scene Generation
Conditioning on a single-frame BEV map
We demonstrate generated samples conditioned on BEV maps from different datasets, including the nuScenes Validation set, the Waymo Motion Dataset, the nuPlan Dataset, and the Metadrive Procedural Generation Map.
nuScenes Validation set
nuPlan Dataset
Metadrive Procedural Generation Map
Large-scale Scene Generation
Scene Synthesis
Synthesis on scenes from different datasets
We perform scene synthesis on scenes sampled from diverse BEV maps.
Reference
@article{urbandiff,
title={Urban Scene Diffusion through Semantic Occupancy Map},
author={Junge Zhang and Qihang Zhang and Li Zhang and Ramana Rao Kompella and Gaowen Liu and Jiachen Li and Bolei Zhou},
journal={arXiv preprint arXiv:2403.11697},
year={2024}
}
Acknowledgement
This work was supported by the Cisco Faculty Award.