Deep learning applied to weather forecasting has started gaining popularity because of the progress achieved by data-driven models. The present paper compares four different deep learning architectures to perform weather prediction on daily data gathered from 18 cities across Europe and spanned over a period of 15 years. The four proposed models investigate the different type of input representations (i.e. tensorial unistream vs. multi-stream matrices) as well as the combination of convolutional neural networks and LSTM (i.e. cascaded vs. ConvLSTM). In particular, we show that a model that uses a multi-stream input representation and that processes each lag individually combined with a cascaded convolution and LSTM is capable of better forecasting than the other compared models. In addition, we show that visualization techniques such as occlusion analysis and score maximization can give an additional insight on the most important features and cities for predicting a particular target feature and city.