X. Feng

CSTrack: Enhancing RGB-X Tracking via Compact Spatiotemporal Features

May 26, 2025

X. Feng, D. Zhang, S. Hu, X. Li, M. Wu, J. Zhang, X. Chen, K. Huang

Abstract: Effectively modeling and utilizing spatiotemporal features from RGB and other modalities (e.g., depth, thermal, and event data, denoted as X) is the core of RGB-X tracker design. Existing methods often employ two parallel branches to separately process the RGB and X input streams, requiring the model to simultaneously handle two dispersed feature spaces, which complicates both the model structure and computation process. More critically, intra-modality spatial modeling within each dispersed space incurs substantial computational overhead, limiting resources for inter-modality spatial modeling and temporal modeling. To address this, we propose a novel tracker, CSTrack, which focuses on modeling Compact Spatiotemporal features to achieve simple yet effective tracking. Specifically, we first introduce an innovative Spatial Compact Module that integrates the RGB-X dual input streams into a compact spatial feature, enabling thorough intra- and inter-modality spatial modeling. Additionally, we design an efficient Temporal Compact Module that compactly represents temporal features by constructing the refined target distribution heatmap. Extensive experiments validate the effectiveness of our compact spatiotemporal modeling method, with CSTrack achieving new SOTA results on mainstream RGB-X benchmarks. The code and models will be released at: https://github.com/XiaokunFeng/CSTrack.

* Accepted by ICML 2025!


Enhancing Vision-Language Tracking by Effectively Converting Textual Cues into Visual Cues

Dec 27, 2024

X. Feng, D. Zhang, S. Hu, X. Li, M. Wu, J. Zhang, X. Chen, K. Huang

Abstract: Vision-Language Tracking (VLT) aims to localize a target in video sequences using a visual template and language description. While textual cues enhance tracking potential, current datasets typically contain much more image data than text, limiting the ability of VLT methods to align the two modalities effectively. To address this imbalance, we propose a novel plug-and-play method named CTVLT that leverages the strong text-image alignment capabilities of foundation grounding models. CTVLT converts textual cues into interpretable visual heatmaps, which are easier for trackers to process. Specifically, we design a textual cue mapping module that transforms textual cues into target distribution heatmaps, visually representing the location described by the text. Additionally, the heatmap guidance module fuses these heatmaps with the search image to guide tracking more effectively. Extensive experiments on mainstream benchmarks demonstrate the effectiveness of our approach, achieving state-of-the-art performance and validating the utility of our method for enhanced VLT.

* Accepted by ICASSP '25! Code: https://github.com/XiaokunFeng/CTVLT


Detecting intracranial aneurysm rupture from 3D surfaces using a novel GraphNet approach

Oct 17, 2019

Z. Ma, L. Song, X. Feng, G. Yang, W. Zhu, J. Liu, Y. Zhang, X. Yang, Y. Yin

Abstract: An intracranial aneurysm (IA) is a blood spot in the human brain that becomes life-threatening if it ruptures and causes cerebral hemorrhage. Detecting from medical images whether an IA has ruptured is challenging. In this paper, we propose a novel graph-based neural network named GraphNet to detect IA rupture from 3D surface data. GraphNet is based on the graph convolutional network (GCN) and is designed for graph-level classification and node-level segmentation. The network uses GCN blocks to extract local surface features and pools them into global features. Data from 1,250 patients, including 385 ruptured and 865 unruptured IAs, were collected from the clinic for the experiments. Performance is reported on data from 234 randomly selected test patients. GraphNet achieved an accuracy of 0.82 and an area under the receiver operating characteristic (ROC) curve (AUC) of 0.82 in the classification task, significantly outperforming the baseline approach that does not use graph-based networks. The segmentation output of the model achieved a mean graph-node-based Dice similarity coefficient (DSC) of 0.88.

* Submitted to ISBI 2020
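
For readers unfamiliar with the segmentation metric quoted in the abstract above, the following is a minimal sketch of a node-based Dice coefficient, averaged over graphs. It is not taken from the paper: the function names `dice_coefficient` and `mean_dice`, the binary 0/1 node labels, and the convention of returning 1.0 when both label sets are empty are all assumptions made for illustration.

```python
from typing import Iterable, Sequence, Tuple


def dice_coefficient(pred: Sequence[int], target: Sequence[int]) -> float:
    """Dice coefficient 2|A n B| / (|A| + |B|) over binary per-node labels.

    `pred` and `target` hold one 0/1 label per graph node (hypothetical layout).
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must label the same number of nodes")
    # Nodes marked positive in both the prediction and the ground truth.
    intersection = sum(1 for p, t in zip(pred, target) if p == 1 and t == 1)
    # Total positives in the prediction plus total positives in the ground truth.
    total = sum(1 for p in pred if p == 1) + sum(1 for t in target if t == 1)
    if total == 0:
        # Assumption: two empty segmentations count as perfect agreement.
        return 1.0
    return 2.0 * intersection / total


def mean_dice(pairs: Iterable[Tuple[Sequence[int], Sequence[int]]]) -> float:
    """Average the per-graph Dice coefficient over (pred, target) pairs."""
    scores = [dice_coefficient(pred, target) for pred, target in pairs]
    if not scores:
        raise ValueError("at least one graph is required")
    return sum(scores) / len(scores)
```

For example, a four-node graph with prediction `[1, 1, 0, 0]` and ground truth `[1, 0, 0, 0]` shares one positive node out of three positives in total, so its Dice coefficient is 2/3. Averaging per graph rather than pooling all nodes gives every surface equal weight regardless of its node count; whether the paper averages this way is not stated in the abstract.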
