WWW2025

Perceiving Urban Inequality from Imagery Using Visual Language Models with Chain-of-Thought Reasoning

Yunke Zhang, Ruolong Ma, Xin Zhang, Yong Li

3 citations

Abstract

Rapid urbanization has distributed its benefits unequally among residents, creating significant inequality and prompting discussion around Sustainable Development Goals 10 and 11. Accurate measurement of inequality within urban areas is essential for effective mitigation strategies. Traditional methods rely on survey-based census data, which are time-consuming to collect and slow to be released, while some studies use coarse proxies such as nighttime lights. However, these methods are limited in resolution and fail to capture fine-grained disparities within communities. We therefore leverage accessible urban imagery, which offers detailed visual features. Two key challenges must be addressed: 1) accurately perceiving micro-level inequalities within neighborhoods, and 2) ensuring that this perception is interpretable for policy guidance. To meet these challenges, we propose UI-CoT, a framework that applies visual language models to urban imagery for inequality perception, enhanced by Chain-of-Thought prompting to improve reasoning capabilities. We fine-tune a visual language model to predict three essential neighborhood inequality indicators: the income Gini coefficient, dominant race, and racial income ratio. Extensive experiments show that our model can effectively perceive micro-level inequalities, with the incorporation of Chain-of-Thought reasoning further improving the model's performance by 17.2%. This research offers valuable insights into addressing inequalities within urban environments and demonstrates the potential of web resources in empowering sustainable urban development. The code and data are available at https://github.com/tsinghua-fib-lab/UI-CoT.
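For readers unfamiliar with the first of the three indicators, the income Gini coefficient can be illustrated with a minimal sketch. The abstract does not state how the coefficient is derived from census data, so the function below is an assumption for illustration only: it uses the standard sorted-rank formula on a list of individual incomes, and the name `gini_coefficient` is hypothetical rather than taken from the UI-CoT repository.

```python
from typing import Sequence


def gini_coefficient(incomes: Sequence[float]) -> float:
    """Gini coefficient of non-negative incomes (0 = perfect equality).

    Illustrative only: uses the textbook sorted-rank formula
    G = 2 * sum(i * x_i) / (n * sum(x)) - (n + 1) / n,
    with x sorted ascending and ranks i starting at 1.
    This is an assumption, not the paper's stated procedure.
    """
    values = sorted(incomes)
    n = len(values)
    total = sum(values)
    # An empty or all-zero sample has no inequality to measure.
    if n == 0 or total == 0:
        return 0.0
    rank_weighted = sum(rank * x for rank, x in enumerate(values, start=1))
    return 2 * rank_weighted / (n * total) - (n + 1) / n


# Hypothetical neighborhoods: one with equal incomes, one highly concentrated.
equal = gini_coefficient([40_000, 40_000, 40_000, 40_000])
concentrated = gini_coefficient([0, 0, 0, 100_000])
print(f"equal: {equal:.2f}, concentrated: {concentrated:.2f}")
```

A value near 0 indicates that incomes within the neighborhood are evenly distributed, while values approaching 1 indicate that income is concentrated among few residents.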