Picture for Yu Su

Yu Su

RoboSpatial: Teaching Spatial Understanding to 2D and 3D Vision-Language Models for Robotics

Add code
Nov 25, 2024
Viaarxiv icon

Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents

Add code
Nov 10, 2024
Figure 1 for Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Figure 2 for Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Figure 3 for Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Figure 4 for Is Your LLM Secretly a World Model of the Internet? Model-Based Planning for Web Agents
Viaarxiv icon

Integrated Location Sensing and Communication for Ultra-Massive MIMO With Hybrid-Field Beam-Squint Effect

Add code
Nov 08, 2024
Figure 1 for Integrated Location Sensing and Communication for Ultra-Massive MIMO With Hybrid-Field Beam-Squint Effect
Figure 2 for Integrated Location Sensing and Communication for Ultra-Massive MIMO With Hybrid-Field Beam-Squint Effect
Figure 3 for Integrated Location Sensing and Communication for Ultra-Massive MIMO With Hybrid-Field Beam-Squint Effect
Figure 4 for Integrated Location Sensing and Communication for Ultra-Massive MIMO With Hybrid-Field Beam-Squint Effect
Viaarxiv icon

ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery

Add code
Oct 07, 2024
Figure 1 for ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
Figure 2 for ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
Figure 3 for ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
Figure 4 for ScienceAgentBench: Toward Rigorous Assessment of Language Agents for Data-Driven Scientific Discovery
Viaarxiv icon

Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents

Add code
Oct 07, 2024
Figure 1 for Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Figure 2 for Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Figure 3 for Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Figure 4 for Navigating the Digital World as Humans Do: Universal Visual Grounding for GUI Agents
Viaarxiv icon

Attention in Large Language Models Yields Efficient Zero-Shot Re-Rankers

Add code
Oct 03, 2024
Viaarxiv icon

Fine-Tuning is Fine, if Calibrated

Add code
Sep 24, 2024
Viaarxiv icon

MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark

Add code
Sep 04, 2024
Figure 1 for MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Figure 2 for MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Figure 3 for MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Figure 4 for MMMU-Pro: A More Robust Multi-discipline Multimodal Understanding Benchmark
Viaarxiv icon

VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images

Add code
Aug 28, 2024
Figure 1 for VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
Figure 2 for VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
Figure 3 for VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
Figure 4 for VLM4Bio: A Benchmark Dataset to Evaluate Pretrained Vision-Language Models for Trait Discovery from Biological Images
Viaarxiv icon

RePair: Automated Program Repair with Process-based Feedback

Add code
Aug 21, 2024
Figure 1 for RePair: Automated Program Repair with Process-based Feedback
Figure 2 for RePair: Automated Program Repair with Process-based Feedback
Figure 3 for RePair: Automated Program Repair with Process-based Feedback
Figure 4 for RePair: Automated Program Repair with Process-based Feedback
Viaarxiv icon