Hongxu Yin

Celine

3D Aware Region Prompted Vision Language Model

Sep 16, 2025

Test-Time Scaling Strategies for Generative Retrieval in Multimodal Conversational Recommendations

Aug 25, 2025

EgoVLA: Learning Vision-Language-Action Models from Egocentric Human Videos

Jul 16, 2025

Scaling RL to Long Videos

Jul 10, 2025

CLIMB: CLustering-based Iterative Data Mixture Bootstrapping for Language Model Pre-training

Apr 17, 2025

Scaling Vision Pre-Training to 4K Resolution

Mar 25, 2025

Token-Efficient Long Video Understanding for Multimodal LLMs

Mar 06, 2025

WorldModelBench: Judging Video Generation Models As World Models

Feb 28, 2025

Advancing Weight and Channel Sparsification with Enhanced Saliency

Feb 05, 2025

NaVILA: Legged Robot Vision-Language-Action Model for Navigation

Dec 05, 2024