The goal of the IARAI competition traffic4cast was to predict the city-wide traffic status within a 15-minute time window, based on information from the previous hour. The traffic status was given as multi-channel images (one pixel roughly corresponds to 100x100 meters), where one channel indicated the traffic volume, another one the average speed of vehicles, and a third one their rough heading. As part of our work on the competition, we evaluated many different network architectures, analyzed the statistical properties of the given data in detail, and thought about how to transform the problem to be able to take additional spatio-temporal context-information into account, such as the street network, the positions of traffic lights, or the weather. This document summarizes our efforts that led to our best submission, and gives some insights about which other approaches we evaluated, and why they did not work as well as imagined.