Picture for Kunchang Li

Kunchang Li

VideoChat-Flash: Hierarchical Compression for Long-Context Video Modeling

Add code
Dec 31, 2024
Viaarxiv icon

Task Preference Optimization: Improving Multimodal Large Language Models with Vision Task Alignment

Add code
Dec 26, 2024
Viaarxiv icon

Causal Diffusion Transformers for Generative Modeling

Add code
Dec 17, 2024
Viaarxiv icon

Bootstrapping Language-Guided Navigation Learning with Self-Refining Data Flywheel

Add code
Dec 11, 2024
Viaarxiv icon

TimeSuite: Improving MLLMs for Long Video Understanding via Grounded Tuning

Add code
Oct 25, 2024
Viaarxiv icon

TransAgent: Transfer Vision-Language Foundation Models with Heterogeneous Agent Collaboration

Add code
Oct 16, 2024
Viaarxiv icon

MUSES: 3D-Controllable Image Generation via Multi-Modal Agent Collaboration

Add code
Aug 21, 2024
Viaarxiv icon

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model

Add code
Jul 09, 2024
Viaarxiv icon

InternVideo2: Scaling Video Foundation Models for Multimodal Video Understanding

Add code
Mar 22, 2024
Viaarxiv icon

Video Mamba Suite: State Space Model as a Versatile Alternative for Video Understanding

Add code
Mar 14, 2024
Viaarxiv icon