Guolei Sun

Breaking Modality Heterogeneity in Low-Bit Quantization for Large Vision-Language Models

May 19, 2026

Video Understanding: From Geometry and Semantics to Unified Models

Mar 18, 2026

EgoSound: Benchmarking Sound Understanding in Egocentric Videos

Feb 15, 2026

DINO-Mix: Distilling Foundational Knowledge with Cross-Domain CutMix for Semi-supervised Class-imbalanced Medical Image Segmentation

Feb 08, 2026

Revisiting Adaptive Rounding with Vectorized Reparameterization for LLM Quantization

Feb 02, 2026

HiM2SAM: Enhancing SAM2 with Hierarchical Motion Estimation and Memory Optimization towards Long-term Tracking

Jul 10, 2025

A Comprehensive Survey on Video Scene Parsing: Advances, Challenges, and Prospects

Jun 16, 2025

Exploiting Temporal State Space Sharing for Video Semantic Segmentation

Mar 26, 2025

CamSAM2: Segment Anything Accurately in Camouflaged Videos

Mar 26, 2025

Generalized Few-shot 3D Point Cloud Segmentation with Vision-Language Model

Mar 20, 2025