Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yuliang Yang

WISE-TTT:Worldwide Information Segmentation Enhancement

Apr 01, 2025

Fenglei Hao, Yuliang Yang, Ruiyuan Su, Zhengran Zhao, Yukun Qiao, Mengyu Zhu

Abstract:Video multi-target segmentation remains a major challenge in long sequences, mainly due to the inherent limitations of existing architectures in capturing global temporal dependencies. We introduce WISE-TTT, a synergistic architecture integrating Test-Time Training (TTT) mechanisms with the Transformer architecture through co-design. The TTT layer systematically compresses historical temporal data to generate hidden states containing worldwide information(Lossless memory to maintain long contextual integrity), while achieving multi-stage contextual aggregation through splicing. Crucially, our framework provides the first empirical validation that implementing worldwide information across multiple network layers is essential for optimal dependency utilization.Ablation studies show TTT modules at high-level features boost global modeling. This translates to 3.1% accuracy improvement(J&F metric) on Davis2017 long-term benchmarks -- the first proof of hierarchical context superiority in video segmentation. We provide the first systematic evidence that worldwide information critically impacts segmentation performance.

Via

Access Paper or Ask Questions

MC-MLP:Multiple Coordinate Frames in all-MLP Architecture for Vision

Apr 08, 2023

Zhimin Zhu, Jianguo Zhao, Tong Mu, Yuliang Yang, Mengyu Zhu

Figure 1 for MC-MLP:Multiple Coordinate Frames in all-MLP Architecture for Vision

Figure 2 for MC-MLP:Multiple Coordinate Frames in all-MLP Architecture for Vision

Figure 3 for MC-MLP:Multiple Coordinate Frames in all-MLP Architecture for Vision

Figure 4 for MC-MLP:Multiple Coordinate Frames in all-MLP Architecture for Vision

Abstract:In deep learning, Multi-Layer Perceptrons (MLPs) have once again garnered attention from researchers. This paper introduces MC-MLP, a general MLP-like backbone for computer vision that is composed of a series of fully-connected (FC) layers. In MC-MLP, we propose that the same semantic information has varying levels of difficulty in learning, depending on the coordinate frame of features. To address this, we perform an orthogonal transform on the feature information, equivalent to changing the coordinate frame of features. Through this design, MC-MLP is equipped with multi-coordinate frame receptive fields and the ability to learn information across different coordinate frames. Experiments demonstrate that MC-MLP outperforms most MLPs in image classification tasks, achieving better performance at the same parameter level. The code will be available at: https://github.com/ZZM11/MC-MLP.

Via

Access Paper or Ask Questions