Picture for Yali Wang

Yali Wang

ShenZhen Key Lab of Computer Vision and Pattern Recognition, SIAT-SenseTime Joint Lab, Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences

V-Stylist: Video Stylization via Collaboration and Reflection of MLLM Agents

Add code
Mar 15, 2025
Viaarxiv icon

LVAgent: Long Video Understanding by Multi-Round Dynamical Collaboration of MLLM Agents

Add code
Mar 13, 2025
Viaarxiv icon

TimeStep Master: Asymmetrical Mixture of Timestep LoRA Experts for Versatile and Efficient Diffusion Models in Vision

Add code
Mar 10, 2025
Viaarxiv icon

Get In Video: Add Anything You Want to the Video

Add code
Mar 08, 2025
Viaarxiv icon

An Egocentric Vision-Language Model based Portable Real-time Smart Assistant

Add code
Mar 06, 2025
Viaarxiv icon

WeGen: A Unified Model for Interactive Multimodal Generation as We Chat

Add code
Mar 03, 2025
Viaarxiv icon

Modeling Fine-Grained Hand-Object Dynamics for Egocentric Video Representation Learning

Add code
Mar 02, 2025
Viaarxiv icon

InternVideo2.5: Empowering Video MLLMs with Long and Rich Context Modeling

Add code
Jan 21, 2025
Viaarxiv icon

H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving

Add code
Jan 08, 2025
Figure 1 for H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Figure 2 for H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Figure 3 for H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Figure 4 for H-MBA: Hierarchical MamBa Adaptation for Multi-Modal Video Understanding in Autonomous Driving
Viaarxiv icon

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Add code
Dec 31, 2024
Viaarxiv icon