Wenhui Tan

Video-OPD: Efficient Post-Training of Multimodal Large Language Models for Temporal Video Grounding via On-Policy Distillation

Feb 03, 2026
Via arXiv

Restoring Exploration after Post-Training: Latent Exploration Decoding for Large Reasoning Models

Feb 02, 2026
Via arXiv

Think-Clip-Sample: Slow-Fast Frame Selection for Video Understanding

Jan 16, 2026
Via arXiv

Xiaomi MiMo-VL-Miloco Technical Report

Dec 22, 2025
Via arXiv

JointAVBench: A Benchmark for Joint Audio-Visual Reasoning Evaluation

Dec 14, 2025
Via arXiv

REVISOR: Beyond Textual Reflection, Towards Multimodal Introspective Reasoning in Long-Form Video Understanding

Nov 17, 2025
Via arXiv

Think Silently, Think Fast: Dynamic Latent Compression of LLM Reasoning Chains

May 22, 2025
Via arXiv

Multi-task Manipulation Policy Modeling with Visuomotor Latent Diffusion

Mar 12, 2024
Via arXiv

Pave the Way to Grasp Anything: Transferring Foundation Models for Universal Pick-Place Robots

Jun 25, 2023
Via arXiv

AlphaBlock: Embodied Finetuning for Vision-Language Reasoning in Robot Manipulation

May 30, 2023
Via arXiv