Spatiotemporal prediction plays an important role in solving natural problems and processing video frames, especially in weather forecasting and human action recognition. Recent advances attempt to incorporate prior physical knowledge into the deep learning framework to estimate the unknown governing partial differential equations (PDEs), which have shown promising results in spatiotemporal prediction tasks. However, previous approaches only restrict neural network architectures or loss functions to acquire physical or PDE features, which decreases the representative capacity of a neural network. Meanwhile, the updating process of the physical state cannot be effectively estimated. To solve the above mentioned problems, this paper proposes a physical-guided neural network, which utilizes the frequency-enhanced Fourier module and moment loss to strengthen the model's ability to estimate the spatiotemporal dynamics. Furthermore, we propose an adaptive second-order Runge-Kutta method with physical constraints to model the physical states more precisely. We evaluate our model on both spatiotemporal and video prediction tasks. The experimental results show that our model outperforms state-of-the-art methods and performs best in several datasets, with a much smaller parameter count.