Urbanisation is a great challenge for modern societies, promising better access to economic opportunities while widening socioeconomic inequalities. Accurately tracking how this process unfolds has been challenging for traditional data collection methods, while remote sensing information offers an alternative to gather a more complete view on these societal changes. By feeding a neural network with satellite images one may recover the socioeconomic information associated to that area, however these models lack to explain how visual features contained in a sample, trigger a given prediction. Here we close this gap by predicting socioeconomic status across France from aerial images and interpreting class activation mappings in terms of urban topology. We show that the model disregards the spatial correlations existing between urban class and socioeconomic status to derive its predictions. These results pave the way to build interpretable models, which may help to better track and understand urbanisation and its consequences.