Yukang Chen

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Oct 10, 2025

LongLive: Real-time Interactive Long Video Generation

Sep 26, 2025

3D Aware Region Prompted Vision Language Model

Sep 16, 2025

Scaling RL to Long Videos

Jul 10, 2025

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

May 19, 2025

TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance

Apr 23, 2025

WorldModelBench: Judging Video Generation Models As World Models

Feb 28, 2025

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Dec 12, 2024

VisionZip: Longer is Better but Not Necessary in Vision Language Models

Dec 05, 2024

NVILA: Efficient Frontier Visual Language Models

Dec 05, 2024