Abstract:As unconventional sources of geo-information, massive imagery and text messages from open platforms and social media form a temporally quasi-seamless, spatially multi-perspective stream, but with unknown and diverse quality. Due to its complementarity to remote sensing data, geo-information from these sources offers promising perspectives, but harvesting is not trivial due to its data characteristics. In this article, we address key aspects in the field, including data availability, analysis-ready data preparation and data management, geo-information extraction from social media text messages and images, and the fusion of social media and remote sensing data. We then showcase some exemplary geographic applications. In addition, we present the first extensive discussion of ethical considerations of social media data in the context of geo-information harvesting and geographic applications. With this effort, we wish to stimulate curiosity and lay the groundwork for researchers who intend to explore social media data for geo-applications. We encourage the community to join forces by sharing their code and data.
Abstract:Obtaining a dynamic population distribution is key to many decision-making processes such as urban planning, disaster management and most importantly helping the government to better allocate socio-technical supply. For the aspiration of these objectives, good population data is essential. The traditional method of collecting population data through the census is expensive and tedious. In recent years, machine learning methods have been developed to estimate the population distribution. Most of the methods use data sets that are either developed on a small scale or not publicly available yet. Thus, the development and evaluation of the new methods become challenging. We fill this gap by providing a comprehensive data set for population estimation in 98 European cities. The data set comprises digital elevation model, local climate zone, land use classifications, nighttime lights in combination with multi-spectral Sentinel-2 imagery, and data from the Open Street Map initiative. We anticipate that it would be a valuable addition to the research community for the development of sophisticated machine learning-based approaches in the field of population estimation.
Abstract:Urban land use on a building instance level is crucial geo-information for many applications, yet difficult to obtain. An intuitive approach to close this gap is predicting building functions from ground level imagery. Social media image platforms contain billions of images, with a large variety of motifs including but not limited to street perspectives. To cope with this issue this study proposes a filtering pipeline to yield high quality, ground level imagery from large social media image datasets. The pipeline ensures that all resulting images have full and valid geotags with a compass direction to relate image content and spatial objects from maps. We analyze our method on a culturally diverse social media dataset from Flickr with more than 28 million images from 42 cities around the world. The obtained dataset is then evaluated in a context of 3-classes building function classification task. The three building classes that are considered in this study are: commercial, residential, and other. Fine-tuned state-of-the-art architectures yield F1-scores of up to 0.51 on the filtered images. Our analysis shows that the performance is highly limited by the quality of the labels obtained from OpenStreetMap, as the metrics increase by 0.2 if only human validated labels are considered. Therefore, we consider these labels to be weak and publish the resulting images from our pipeline together with the buildings they are showing as a weakly labeled dataset.