Dahua Lin

Eric

ScaleCap: Inference-Time Scalable Image Captioning via Dual-Modality Debiasing

Jun 24, 2025
Via arXiv

CronusVLA: Transferring Latent Motion Across Time for Multi-Frame Prediction in Manipulation

Jun 24, 2025
Via arXiv

InterActHuman: Multi-Concept Human Animation with Layout-Aligned Audio Conditions

Jun 11, 2025
Via arXiv

GTR-CoT: Graph Traversal as Visual Chain of Thought for Molecular Structure Recognition

Jun 09, 2025
Via arXiv

Video World Models with Long-term Spatial Memory

Jun 05, 2025
Via arXiv

AnySplat: Feed-forward 3D Gaussian Splatting from Unconstrained Views

May 29, 2025
Via arXiv

MMSI-Bench: A Benchmark for Multi-Image Spatial Intelligence

May 29, 2025
Via arXiv

Multi-SpatialMLLM: Multi-Frame Spatial Understanding with Multi-Modal Large Language Models

May 22, 2025
Via arXiv

Evaluating Large Language Model with Knowledge Oriented Language Specific Simple Question Answering

May 22, 2025
Via arXiv

Visual Agentic Reinforcement Fine-Tuning

May 20, 2025
Via arXiv