
Kecheng Zheng

Advancing Open-source World Models

Jan 28, 2026

A Pragmatic VLA Foundation Model

Jan 26, 2026

Learning Consistent Taxonomic Classification through Hierarchical Reasoning

Jan 21, 2026

The Great March 100: 100 Detail-oriented Tasks for Evaluating Embodied AI Agents

Jan 16, 2026

Vision-Centric Activation and Coordination for Multimodal Large Language Models

Oct 16, 2025

VideoMAR: Autoregressive Video Generation with Continuous Tokens

Jun 18, 2025

Aligned Better, Listen Better for Audio-Visual Large Language Models

Apr 02, 2025

Benchmarking Large Vision-Language Models via Directed Scene Graph for Comprehensive Image Captioning

Dec 12, 2024

Learning Visual Generative Priors without Text

Dec 10, 2024

MotionStone: Decoupled Motion Intensity Modulation with Diffusion Transformer for Image-to-Video Generation

Dec 08, 2024