Xin Eric Wang

The Hidden Risks of Large Reasoning Models: A Safety Assessment of R1

Feb 18, 2025

GUI-Bee: Align GUI Action Grounding to Novel Environments via Autonomous Exploration

Jan 27, 2025

Mojito: Motion Trajectory and Intensity Control for Video Generation

Dec 12, 2024

Agent S: An Open Agentic Framework that Uses Computers Like a Human

Oct 10, 2024

Multimodal Situational Safety

Oct 08, 2024

EditRoom: LLM-parameterized Graph Diffusion for Composable 3D Room Layout Editing

Oct 03, 2024

NavGPT-2: Unleashing Navigational Reasoning Capability for Large Vision-Language Models

Jul 17, 2024

Read Anywhere Pointed: Layout-aware GUI Screen Reading with Tree-of-Lens Grounding

Jun 27, 2024

VIA: A Spatiotemporal Video Adaptation Framework for Global and Local Video Editing

Jun 18, 2024

Toffee: Efficient Million-Scale Dataset Construction for Subject-Driven Text-to-Image Generation

Jun 13, 2024