Data-driven approaches to modelling contact-rich tasks address many of the difficulties that analytical models face. In real-world scenarios, hardware capabilities constrain the available measurements and, consequently, every step of the problem formulation. In this work, we propose a formulation that encapsulates knowledge from a baseline controller for the contact-rich task of food cutting. Based on this formulation, we employ deep networks to model the dynamics within a model predictive controller. We design a training process based on curriculum training with learning rate decay for multi-step predictions, which are essential for receding horizon control. Experimental results demonstrate that, even with a simple architecture, our model achieves consistently good predictive performance on known and unknown object classes and captures the long-term dynamics well.