Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Zhifeng Ma

MS-LSTM: Exploring Spatiotemporal Multiscale Representations in Video Prediction Domain

Apr 22, 2023

Zhifeng Ma, Hao Zhang, Jie Liu

Abstract:The drastic variation of motion in spatial and temporal dimensions makes the video prediction task extremely challenging. Existing RNN models obtain higher performance by deepening or widening the model. They obtain the multi-scale features of the video only by stacking layers, which is inefficient and brings unbearable training costs (such as memory, FLOPs, and training time). Different from them, this paper proposes a spatiotemporal multi-scale model called MS-LSTM wholly from a multi-scale perspective. On the basis of stacked layers, MS-LSTM incorporates two additional efficient multi-scale designs to fully capture spatiotemporal context information. Concretely, we employ LSTMs with mirrored pyramid structures to construct spatial multi-scale representations and LSTMs with different convolution kernels to construct temporal multi-scale representations. Detailed comparison experiments with eight baseline models on four video datasets show that MS-LSTM has better performance but lower training costs.

* arXiv admin note: substantial text overlap with arXiv:2206.03010

Via

Access Paper or Ask Questions

MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

Jun 07, 2022

Zhifeng Ma, Hao Zhang, Jie Liu

Figure 1 for MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

Figure 2 for MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

Figure 3 for MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

Figure 4 for MS-RNN: A Flexible Multi-Scale Framework for Spatiotemporal Predictive Learning

Abstract:Spatiotemporal predictive learning is to predict future frames changes through historical prior knowledge. Previous work improves prediction performance by making the network wider and deeper, but this also brings huge memory overhead, which seriously hinders the development and application of the technology. Scale is another dimension to improve model performance in common computer vision task, which can decrease the computing requirements and better sense of context. Such an important improvement point has not been considered and explored by recent RNN models. In this paper, learning from the benefit of multi-scale, we propose a general framework named Multi-Scale RNN (MS-RNN) to boost recent RNN models. We verify the MS-RNN framework by exhaustive experiments on 4 different datasets (Moving MNIST, KTH, TaxiBJ, and HKO-7) and multiple popular RNN models (ConvLSTM, TrajGRU, PredRNN, PredRNN++, MIM, and MotionRNN). The results show the efficiency that the RNN models incorporating our framework have much lower memory cost but better performance than before. Our code is released at \url{https://github.com/mazhf/MS-RNN}.

Via

Access Paper or Ask Questions