Large-scale ride-sharing systems combine real-time dispatching and routing optimization over a rolling time horizon with a model predictive control(MPC) component that relocates idle vehicles to anticipate the demand. The MPC optimization operates over a longer time horizon to compensate for the inherent myopic nature of the real-time dispatching. These longer time horizons are beneficial for the quality of the decisions but increase computational complexity. To address this computational challenge, this paper proposes a hybrid approach that combines machine learning and optimization. The machine-learning component learns the optimal solution to the MPC optimization on the aggregated level to overcome the sparsity and high-dimensionality of the MPC solutions. The optimization component transforms the machine-learning predictions back to the original granularity via a tractable transportation model. As a consequence, the original NP-hard MPC problem is reduced to a polynomial time prediction and optimization. Experimental results show that the hybrid approach achieves 27% further reduction in rider waiting time than the MPC optimization, thanks to its ability to model a longer time horizon within the computational limits.