Precise and timely traffic flow prediction plays a critical role in developing intelligent transportation systems and has attracted considerable attention in recent decades. Despite the significant progress that deep learning has brought to this area, challenges remain. Traffic flows often change dramatically within a short period, which prevents current methods from accurately capturing the future trend and makes them prone to over-fitting, leading to unsatisfactory accuracy. To this end, this paper proposes a Long Short-Term Memory (LSTM) based method that forecasts short-term traffic flow precisely and avoids local optimum problems during training. Specifically, instead of using the non-stationary raw traffic data directly, we first decompose it into sub-components, each of which is less noisy than the original input. Afterward, Sample Entropy (SE) is employed to merge similar components and reduce the computation cost. The merged features are fed into the LSTM, and we then introduce a spatiotemporal module that considers the neighboring relationships in the recombined signals to avoid strong autocorrelation. During training, we use the Grey Wolf Optimizer (GWO) to optimize the parameters of the LSTM, which overcomes the over-fitting issue. We conduct experiments on a public UK highway traffic flow dataset, and the results show that the proposed method performs favorably against other state-of-the-art methods, with better adaptation to extreme outliers, delay effects, and trend changes.
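To illustrate the component-merging step, the following is a minimal sketch of Sample Entropy and one possible way to merge components by it. It is not the paper's implementation: the function names `sample_entropy` and `merge_by_entropy`, the defaults `m=2` and `r_factor=0.2`, and the rule of summing components whose entropies differ by less than `tol` are all assumptions made for illustration. The sketch uses the standard definition, SE = -ln(A/B), where B and A count template pairs of length m and m+1 that lie within a tolerance r under the Chebyshev distance.

```python
import numpy as np

def sample_entropy(x, m=2, r_factor=0.2):
    """Sample Entropy of a 1-D series (standard definition, O(n^2) memory)."""
    x = np.asarray(x, dtype=float)
    n = x.size
    r = r_factor * x.std()  # tolerance as a fraction of the standard deviation

    def count_matches(length):
        # Use n - m templates for both lengths so the counts are comparable.
        templates = np.array([x[i:i + length] for i in range(n - m)])
        # Chebyshev distance between every pair of templates.
        dist = np.max(np.abs(templates[:, None, :] - templates[None, :, :]), axis=2)
        # Count each unordered pair once and exclude self-matches.
        upper = np.triu_indices(len(templates), k=1)
        return np.sum(dist[upper] <= r)

    b = count_matches(m)
    a = count_matches(m + 1)
    if a == 0 or b == 0:
        return float("inf")  # undefined when no matches are found
    return float(-np.log(a / b))

def merge_by_entropy(components, tol=0.1, m=2, r_factor=0.2):
    """Hypothetical merging rule: sum components with neighboring entropy values."""
    entropies = [sample_entropy(c, m, r_factor) for c in components]
    order = np.argsort(entropies)
    groups = [[order[0]]]
    for idx in order[1:]:
        # Join the current group if the entropy gap to its last member is small.
        if entropies[idx] - entropies[groups[-1][-1]] < tol:
            groups[-1].append(idx)
        else:
            groups.append([idx])
    return [np.sum([components[i] for i in g], axis=0) for g in groups]

# Usage: a regular signal should score lower than white noise.
rng = np.random.default_rng(0)
t = np.linspace(0, 8 * np.pi, 300)
print(sample_entropy(np.sin(t)), sample_entropy(rng.standard_normal(300)))
```

A lower value indicates a more regular component, so components with similar values can plausibly be modeled together, which reduces the number of series the LSTM has to process.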