Zhenpeng Huang

LongVPO: From Anchored Cues to Self-Reasoning for Long-Form Video Preference Optimization

Feb 02, 2026
Via arXiv

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Dec 18, 2025
Via arXiv

Online Video Understanding: A Comprehensive Benchmark and Memory-Augmented Method

Dec 31, 2024
Via arXiv

p-MoD: Building Mixture-of-Depths MLLMs via Progressive Ratio Decay

Dec 05, 2024
Via arXiv

VideoEval: Comprehensive Benchmark Suite for Low-Cost Evaluation of Video Foundation Model

Jul 09, 2024
Via arXiv

Data-efficient Event Camera Pre-training via Disentangled Masked Modeling

Mar 01, 2024
Via arXiv