Reduced-order models, also known as proxy models or surrogate models, are approximate models that are less computationally expensive than fully descriptive models. With the integration of machine learning, these models have attracted increasing research interest in recent years. However, many existing reduced-order modeling methods, such as embed to control (E2C) and embed to control and observe (E2CO), fall short in long-term prediction because prediction errors accumulate over time. This issue arises partly from the one-step prediction framework inherent in the E2C and E2CO architectures. This paper introduces a deep learning-based surrogate model, referred to as the multi-step embed-to-control model, for constructing proxy models with improved long-term prediction performance. Unlike E2C and E2CO, the proposed network considers multiple forward transitions in the latent space at a time using the Koopman operator, allowing the model to incorporate a sequence of state snapshots during the training phase. Additionally, the loss function has been redesigned to accommodate these multiple transitions and to respect the underlying physical principles. To validate the proposed method, the framework was applied to a two-phase (oil and water) reservoir model under a waterflooding scheme. Comparative analysis demonstrates that the proposed model significantly outperforms the conventional E2C model in long-term simulation scenarios. Notably, there was a substantial reduction in temporal errors in the prediction of saturation profiles and a moderate improvement in pressure forecasting accuracy.
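The idea of taking several latent transitions at once can be illustrated with a minimal NumPy sketch. It assumes a linear, Koopman-style latent update of the form z_{t+1} = A z_t + B u_t and a plain mean-squared error averaged over all K steps; the names `rollout_latent` and `multi_step_loss`, the matrices `A` and `B`, and all dimensions are hypothetical and chosen only for illustration. The encoder, decoder, and physics-based loss terms of the actual network are not reproduced here.

```python
import numpy as np

def rollout_latent(z0, controls, A, B):
    """Advance a latent state through len(controls) linear transitions.

    Assumed (illustrative) update: z_{t+1} = A @ z_t + B @ u_t.
    Returns an array of shape (K, latent_dim) holding z_1 ... z_K.
    """
    z = z0
    trajectory = []
    for u in controls:
        z = A @ z + B @ u
        trajectory.append(z)
    return np.stack(trajectory)

def multi_step_loss(z_pred, z_true):
    """Mean squared error averaged over every predicted step and component."""
    return float(np.mean((z_pred - z_true) ** 2))

# Hypothetical sizes and randomly generated operators, for illustration only.
rng = np.random.default_rng(0)
latent_dim, control_dim, K = 4, 2, 5
A = 0.9 * np.eye(latent_dim)                     # stand-in latent transition matrix
B = rng.normal(size=(latent_dim, control_dim))   # stand-in control matrix
z0 = rng.normal(size=latent_dim)                 # stand-in encoded initial state
controls = rng.normal(size=(K, control_dim))     # stand-in well-control sequence

z_pred = rollout_latent(z0, controls, A, B)
```

In a one-step framework only the first transition would be penalized; here each of the K predicted latent states contributes to the loss, so errors that compound along the rollout are visible during training.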