NeurIPS 2024
UrbanDataLayer: A Unified Data Pipeline for Urban Science
Yiheng Wang, Tianyu Wang, YuYing Zhang, Hongji Zhang, Haoyu Zheng, Guanjie Zheng, Linghe Kong
Abstract
The rapid progression of urbanization has generated a diverse array of urban data, facilitating significant advancements in urban science and urban computing. Current studies often work on separate problems case by case using diverse data, e.g., air quality prediction and built-up area classification. This fragmented approach hinders the urban research field from advancing at the pace observed in Computer Vision and Natural Language Processing, for two primary reasons. On the one hand, the diverse data processing steps lead to a lack of large-scale benchmarks and therefore slow iterative methodology improvement on any single problem. On the other hand, the disparity in multi-modal data formats hinders the combination of related modalities that could stimulate more research findings. To address these challenges, we propose UrbanDataLayer (UDL), a suite of standardized data structures and pipelines for city data engineering, providing a unified data format for researchers. This allows researchers to easily build large-scale benchmarks and combine multi-modal data, thus expediting the development of multi-modal urban foundation models. To verify the effectiveness of our work, we present four distinct urban problem tasks utilizing the proposed data layer. UrbanDataLayer aims to enhance standardization and operational efficiency within the urban science research community. The examples and source code are available at https://github.com/SJTU-CILAB/udl.