Yu Su

Explorer: Scaling Exploration-driven Web Trajectory Synthesis for Multimodal Web Agents

Feb 19, 2025
Via arXiv

Sparse Autoencoders for Scientifically Rigorous Interpretation of Vision Models

Feb 10, 2025
Via arXiv

Finer-CAM: Spotting the Difference Reveals Finer Details for Visual Explanation

Jan 20, 2025
Via arXiv

Prompt-CAM: A Simpler Interpretable Transformer for Fine-Grained Analysis

Jan 16, 2025
Via arXiv

Static Segmentation by Tracking: A Frustratingly Label-Efficient Approach to Fine-Grained Segmentation

Jan 12, 2025
Via arXiv

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

Nov 25, 2024
Via arXiv

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Nov 10, 2024
Via arXiv

Integrated Location Sensing and Communication for Ultra-Massive MIMO With Hybrid-Field Beam-Squint Effect

Nov 08, 2024
Via arXiv

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Oct 07, 2024
Via arXiv

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

Oct 07, 2024
Via arXiv