Manling Li

Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas

Mar 04, 2025
Via arXiv

The Law of Knowledge Overshadowing: Towards Understanding, Predicting, and Preventing LLM Hallucination

Feb 22, 2025
Via arXiv

EmbodiedBench: Comprehensive Benchmarking Multi-modal Large Language Models for Vision-Driven Embodied Agents

Feb 13, 2025
Via arXiv

SyncMind: Measuring Agent Out-of-Sync Recovery in Collaborative Software Engineering

Feb 10, 2025
Via arXiv

LayoutVLM: Differentiable Optimization of 3D Layout via Vision-Language Models

Dec 03, 2024
Via arXiv

IKEA Manuals at Work: 4D Grounding of Assembly Instructions on Internet Videos

Nov 18, 2024
Via arXiv

HourVideo: 1-Hour Video-Language Understanding

Nov 07, 2024
Via arXiv

MentalArena: Self-play Training of Language Models for Diagnosis and Treatment of Mental Health Disorders

Oct 09, 2024
Via arXiv

Embodied Agent Interface: Benchmarking LLMs for Embodied Decision Making

Oct 09, 2024
Via arXiv

Knowledge Overshadowing Causes Amalgamated Hallucination in Large Language Models

Jul 10, 2024
Via arXiv