Multi-target tracking (MTT) is a traditional signal processing task, where the goal is to estimate the states of an unknown number of moving targets from noisy sensor measurements. In this paper, we revisit MTT from a deep learning perspective and propose convolutional neural network (CNN) architectures to tackle it. We represent the target states and sensor measurements as images. Thereby we recast the problem as a image-to-image prediction task for which we train a fully convolutional model. This architecture is motivated by a novel theoretical bound on the transferability error of CNN. The proposed CNN architecture outperforms a GM-PHD filter on the MTT task with 10 targets. The CNN performance transfers without re-training to a larger MTT task with 250 targets with only a $13\%$ increase in average OSPA.