Hanqing Wang

InternVLA-A1: Unifying Understanding, Generation and Action for Robotic Manipulation
Jan 05, 2026

VL-LN Bench: Towards Long-horizon Goal-oriented Navigation with Active Dialogs
Dec 31, 2025

VLNVerse: A Benchmark for Vision-Language Navigation with Versatile, Embodied, Realistic Simulation and Evaluation
Dec 22, 2025

A4-Agent: An Agentic Framework for Zero-Shot Affordance Reasoning
Dec 16, 2025

Affordance-R1: Reinforcement Learning for Generalizable Affordance Reasoning in Multimodal Large Language Model
Aug 08, 2025

SDEval: Safety Dynamic Evaluation for Multimodal Large Language Models
Aug 08, 2025

InstructVLA: Vision-Language-Action Instruction Tuning from Understanding to Manipulation
Jul 23, 2025

Rethinking the Embodied Gap in Vision-and-Language Navigation: A Holistic Study of Physical and Visual Disparities
Jul 17, 2025

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation
Jun 24, 2025

GENMANIP: LLM-driven Simulation for Generalizable Instruction-Following Manipulation
Jun 12, 2025