Abstract:Why do some streets attract more social activities than others? Is it due to street design, or do land use patterns in neighborhoods create opportunities for businesses where people gather? These questions have intrigued urban sociologists, designers, and planners for decades. Yet, most research in this area has remained limited in scale, lacking a comprehensive perspective on the various factors influencing social interactions in urban settings. Exploring these issues requires fine-level data on the frequency and variety of social interactions on urban street. Recent advances in computer vision and the emergence of the open-vocabulary detection models offer a unique opportunity to address this long-standing issue on a scale that was previously impossible using traditional observational methods. In this paper, we propose a new benchmark dataset for Evaluating Localization of Social Activities (ELSA) in urban street images. ELSA draws on theoretical frameworks in urban sociology and design. While majority of action recognition datasets are collected in controlled settings, we use in-the-wild street-level imagery, where the size of social groups and the types of activities can vary significantly. ELSA includes 937 manually annotated images with more than 4,300 multi-labeled bounding boxes for individual and group activities, categorized into three primary groups: Condition, State, and Action. Each category contains various sub-categories, e.g., alone or group under Condition category, standing or walking, which fall under the State category, and talking or dining with regards to the Action category. ELSA is publicly available for the research community.
Abstract:There is a lack of data on the location, condition, and accessibility of sidewalks across the world, which not only impacts where and how people travel but also fundamentally limits interactive mapping tools and urban analytics. In this paper, we describe initial work in semi-automatically building a sidewalk network topology from satellite imagery using hierarchical multi-scale attention models, inferring surface materials from street-level images using active learning-based semantic segmentation, and assessing sidewalk condition and accessibility features using Crowd+AI. We close with a call to create a database of labeled satellite and streetscape scenes for sidewalks and sidewalk accessibility issues along with standardized benchmarks.