Xin Eric Wang

Self-Resource Allocation in Multi-Agent LLM Systems

Apr 02, 2025
Via arXiv

Agent S2: A Compositional Generalist-Specialist Framework for Computer Use Agents

Apr 01, 2025
Via arXiv

Multimodal Inconsistency Reasoning (MMIR): A New Benchmark for Multimodal Reasoning Models

Feb 22, 2025
Via arXiv

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Feb 18, 2025
Via arXiv

GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration

Jan 27, 2025
Via arXiv

Mojito: Motion Trajectory and Intensity Control for Video Generation

Dec 12, 2024
Via arXiv

Agent S: An Open Agentic Framework that Uses Computers Like a Human

Oct 10, 2024
Via arXiv

Multimodal Situational Safety

Oct 08, 2024
Via arXiv

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Oct 03, 2024
Via arXiv

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Jul 17, 2024
Via arXiv