Picture for Yukang Chen

Yukang Chen

Harli: SLO-Aware Co-location of LLM Inference and PEFT-based Finetuning on Model-as-a-Service Platforms

Add code
Nov 19, 2025
Viaarxiv icon

StreamingVLM: Real-Time Understanding for Infinite Video Streams

Add code
Oct 10, 2025
Viaarxiv icon

LongLive: Real-time Interactive Long Video Generation

Add code
Sep 26, 2025
Viaarxiv icon

3D Aware Region Prompted Vision Language Model

Add code
Sep 16, 2025
Figure 1 for 3D Aware Region Prompted Vision Language Model
Figure 2 for 3D Aware Region Prompted Vision Language Model
Figure 3 for 3D Aware Region Prompted Vision Language Model
Figure 4 for 3D Aware Region Prompted Vision Language Model
Viaarxiv icon

Scaling RL to Long Videos

Add code
Jul 10, 2025
Viaarxiv icon

MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO

Add code
May 19, 2025
Figure 1 for MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Figure 2 for MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Figure 3 for MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Figure 4 for MindOmni: Unleashing Reasoning Generation in Vision Language Models with RGPO
Viaarxiv icon

TraveLLaMA: Facilitating Multi-modal Large Language Models to Understand Urban Scenes and Provide Travel Assistance

Add code
Apr 23, 2025
Viaarxiv icon

WorldModelBench: Judging Video Generation Models As World Models

Add code
Feb 28, 2025
Figure 1 for WorldModelBench: Judging Video Generation Models As World Models
Figure 2 for WorldModelBench: Judging Video Generation Models As World Models
Figure 3 for WorldModelBench: Judging Video Generation Models As World Models
Figure 4 for WorldModelBench: Judging Video Generation Models As World Models
Viaarxiv icon

Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition

Add code
Dec 12, 2024
Figure 1 for Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Figure 2 for Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Figure 3 for Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Figure 4 for Lyra: An Efficient and Speech-Centric Framework for Omni-Cognition
Viaarxiv icon

NVILA: Efficient Frontier Visual Language Models

Add code
Dec 05, 2024
Figure 1 for NVILA: Efficient Frontier Visual Language Models
Figure 2 for NVILA: Efficient Frontier Visual Language Models
Figure 3 for NVILA: Efficient Frontier Visual Language Models
Figure 4 for NVILA: Efficient Frontier Visual Language Models
Viaarxiv icon