As data traffic continues to grow explosively, delivering packets from data servers to end users places increasing stress on both the fronthaul and the backhaul of mobile networks. To mitigate this problem, caching popular content closer to end users has emerged as an effective method for reducing network congestion and improving user experience. To find the optimal locations for content caching, many conventional approaches formulate mixed integer linear programming (MILP) models. However, such methods may fail to support online decision making because of the inherent curse of dimensionality. In this paper, a novel framework for proactive caching is proposed. This framework merges model-based optimization with data-driven techniques by transforming an optimization problem into a grayscale image. To enable parallel training and keep the design simple, the proposed MILP model is first decomposed into a number of sub-problems, and convolutional neural networks (CNNs) are then trained to predict the content caching locations of these sub-problems. Because this decomposition neglects the interactions among sub-problems, the CNNs' outputs risk being infeasible for the original problem. Therefore, two algorithms are provided: the first uses the CNNs' predictions as an extra constraint to reduce the number of decision variables; the second uses the CNNs' outputs to accelerate local search. Numerical results show that the proposed scheme reduces computation time by 71.6% at the cost of only 0.8% additional performance loss compared to the MILP solution, enabling high-quality decision making in real time.
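
The idea of using CNN predictions as an extra constraint to shrink the set of decision variables can be illustrated with a minimal sketch. It is not the paper's algorithm: the thresholds `low` and `high`, the function name `fix_confident_variables`, and the toy probabilities are all assumptions introduced here for illustration. Variables with a confident prediction are fixed to 0 or 1, and only the remaining ones would be passed to a MILP solver.

```python
def fix_confident_variables(probs, low=0.1, high=0.9):
    """Split binary caching variables into fixed and free sets.

    probs maps a (content, node) pair to a predicted probability that the
    content is cached at that node. The thresholds are hypothetical.
    """
    fixed, free = {}, []
    for key, p in probs.items():
        if p >= high:
            fixed[key] = 1      # confident "cache here"
        elif p <= low:
            fixed[key] = 0      # confident "do not cache here"
        else:
            free.append(key)    # left as a decision variable for the solver
    return fixed, free


# Toy predictions for two contents and two candidate cache nodes (made up).
probs = {
    ("c1", "n1"): 0.97,
    ("c1", "n2"): 0.03,
    ("c2", "n1"): 0.55,
    ("c2", "n2"): 0.92,
}
fixed, free = fix_confident_variables(probs)
print(f"fixed {len(fixed)} of {len(probs)} variables; free: {free}")
```

In this toy case three of the four binary variables are fixed, so a solver would only branch on the remaining one. A feasibility check on the fixed assignment would still be needed, since the predictions come from independently solved sub-problems.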