Gongwei Chen

HATS: Hardness-Aware Trajectory Synthesis for GUI Agents

Mar 12, 2026

Global Prior Meets Local Consistency: Dual-Memory Augmented Vision-Language-Action Model for Efficient Robotic Manipulation

Feb 22, 2026

PersonalAlign: Hierarchical Implicit Intent Alignment for Personalized GUI Agent with Long-Term User-Centric Records

Jan 14, 2026

Mirage-1: Augmenting and Updating GUI Agent with Hierarchical Multimodal Skills

Jun 12, 2025

Optimus-3: Towards Generalist Multimodal Minecraft Agents with Scalable Task Experts

Jun 12, 2025

D2AF: A Dual-Driven Annotation and Filtering Framework for Visual Grounding

May 30, 2025

GUI-explorer: Autonomous Exploration and Mining of Transition-aware Knowledge for GUI Agent

May 22, 2025

Curriculum Coarse-to-Fine Selection for High-IPC Dataset Distillation

Mar 24, 2025

Optimus-2: Multimodal Minecraft Agent with Goal-Observation-Action Conditioned Policy

Feb 27, 2025

FALCON: Resolving Visual Redundancy and Fragmentation in High-resolution Multimodal Large Language Models via Visual Registers

Jan 27, 2025