Susan Liang

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Oct 06, 2025

High-Quality Sound Separation Across Diverse Categories via Visually-Guided Generative Modeling

Sep 26, 2025

ZeroSep: Separate Anything in Audio with Zero Training

May 29, 2025

BinauralFlow: A Causal and Streamable Approach for High-Quality Binaural Speech Synthesis with Flow Matching Models

May 28, 2025

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

May 26, 2025

The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability

Apr 15, 2025

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Apr 09, 2025

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Apr 04, 2025

FreSca: Unveiling the Scaling Space in Diffusion Models

Apr 02, 2025

VERIFY: A Benchmark of Visual Explanation and Reasoning for Investigating Multimodal Reasoning Fidelity

Mar 14, 2025