Zefan Cai

BabyVision: Visual Reasoning Beyond Language

Jan 10, 2026

MMGR: Multi-Modal Generative Reasoning

Dec 17, 2025

Scaling Up Audio-Synchronized Visual Animation: An Efficient Training Paradigm

Aug 05, 2025

A Survey on Latent Reasoning

Jul 08, 2025

R-KV: Redundancy-aware KV Cache Compression for Training-Free Reasoning Models Acceleration

May 30, 2025

VisualToolAgent (VisTA): A Reinforcement Learning Framework for Visual Tool Selection

May 26, 2025

AdaptiveStep: Automatically Dividing Reasoning Step through Model Confidence

Feb 19, 2025

HeadInfer: Memory-Efficient LLM Inference by Head-wise Offloading

Feb 18, 2025

Next Token Prediction Towards Multimodal Intelligence: A Comprehensive Survey

Dec 30, 2024

No Preference Left Behind: Group Distributional Preference Optimization

Dec 28, 2024