Zhenhailong Wang

ERA: Transforming VLMs into Embodied Agents via Embodied Prior Learning and Online Reinforcement Learning

Oct 14, 2025
Via arXiv

Multimodal Policy Internalization for Conversational Agents

Oct 10, 2025
Via arXiv

Perception-Aware Policy Optimization for Multimodal Reasoning

Jul 08, 2025
Via arXiv

DyMU: Dynamic Merging and Virtual Unmerging for Efficient VLMs

Apr 23, 2025
Via arXiv

MultiAgentBench: Evaluating the Collaboration and Competition of LLM agents

Mar 03, 2025
Via arXiv

Synthia: Novel Concept Design with Affordance Composition

Feb 25, 2025
Via arXiv

Mobile-Agent-E: Self-Evolving Mobile Assistant for Complex Tasks

Jan 20, 2025
Via arXiv

Infogent: An Agent-Based Framework for Web Information Aggregation

Oct 24, 2024
Via arXiv

Text-Based Reasoning About Vector Graphics

Apr 10, 2024
Via arXiv

Do LVLMs Understand Charts? Analyzing and Correcting Factual Errors in Chart Captioning

Dec 15, 2023
Via arXiv