Title: Hierarchical Separable Video Transformer for Snapshot Compressive Imaging

Jul 16, 2024

Ping Wang, Yulun Zhang, Lishun Wang, Xin Yuan


Abstract: Transformers have achieved state-of-the-art performance in solving the inverse problem of video Snapshot Compressive Imaging (SCI), whose ill-posedness is rooted in the mixed degradation of spatial masking and temporal aliasing. However, previous Transformers lack insight into this degradation and thus have limited performance and efficiency. In this work, we tailor an efficient reconstruction architecture that avoids temporal aggregation in early layers and uses the Hierarchical Separable Video Transformer (HiSViT) as its building block. HiSViT is built from multiple groups of Cross-Scale Separable Multi-head Self-Attention (CSS-MSA) and Gated Self-Modulated Feed-Forward Network (GSM-FFN) with dense connections; each group operates on a separate portion of the channels at a different scale, enabling multi-scale interactions and long-range modeling. By separating spatial operations from temporal ones, CSS-MSA introduces an inductive bias toward paying more attention within frames than between frames, while also saving computational overhead. GSM-FFN is designed to enhance locality via a gating mechanism and factorized spatial-temporal convolutions. Extensive experiments demonstrate that our method outperforms previous methods by more than 0.5 dB with comparable or lower complexity and comparable or fewer parameters. The source code and pretrained models are released at https://github.com/pwangcs/HiSViT.
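To make two ideas from the abstract concrete, here is a minimal NumPy sketch under stated assumptions. `sci_forward` models the standard video SCI measurement (noise omitted): each of T frames is multiplied by a per-pixel mask and the results are summed into a single snapshot, which is the spatial masking and temporal aliasing mentioned above. `separable_attention` and `score_entries` are hypothetical helpers illustrating generic factorized space-time attention: tokens first attend within their own frame, then each spatial location attends across frames. This is not the paper's CSS-MSA, which additionally uses cross-scale, channel-split groups and learned multi-head projections; the sketch uses a single head with identity projections, and the released repository remains the reference for the actual design.

```python
import numpy as np


def sci_forward(x, masks):
    """Video SCI measurement (noise omitted): y = sum_t masks[t] * x[t].

    x, masks: arrays of shape (T, H, W). The element-wise product is the
    spatial masking; summing over t collapses T frames into a single
    2-D snapshot, which is the temporal aliasing.
    """
    return (masks * x).sum(axis=0)


def softmax(z, axis=-1):
    # Numerically stable softmax along `axis`.
    z = z - z.max(axis=axis, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)


def attend(q, k, v):
    # Single-head scaled dot-product attention over the token axis (-2),
    # batched over all leading axes.
    scores = q @ np.swapaxes(k, -1, -2) / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def separable_attention(f):
    """Hypothetical factorized space-time attention on f of shape (T, N, C).

    Identity projections (q = k = v) keep the sketch short; a real layer
    would use learned multi-head projections.
    """
    # Spatial step: T independent problems, N tokens each (within a frame).
    spatial = attend(f, f, f)                    # (T, N, C)
    # Temporal step: N independent problems, T tokens each (across frames).
    g = np.swapaxes(spatial, 0, 1)               # (N, T, C)
    temporal = attend(g, g, g)                   # (N, T, C)
    return np.swapaxes(temporal, 0, 1)           # (T, N, C)


def score_entries(T, N):
    # Attention-score entries per head: joint space-time vs separable.
    joint = (T * N) ** 2
    separable = T * N * N + N * T * T
    return joint, separable


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    T, H, W, C = 8, 16, 16, 16
    x = rng.random((T, H, W))
    masks = (rng.random((T, H, W)) > 0.5).astype(float)  # random binary masks
    y = sci_forward(x, masks)
    print(y.shape)  # (16, 16)

    f = rng.standard_normal((T, H * W, C))
    print(separable_attention(f).shape)  # (8, 256, 16)
    print(score_entries(T, H * W))  # (4194304, 540672)
```

With 8 frames of 16×16 tokens, the separable scheme computes roughly 7.8× fewer attention scores than joint space-time attention over all 2048 tokens, which illustrates why separating spatial from temporal operations can save computation.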

* Accepted by ECCV 2024
