Picture for Yunlong Tang

Yunlong Tang

Video-LMM Post-Training: A Deep Dive into Video Reasoning with Large Multimodal Models

Add code
Oct 06, 2025
Viaarxiv icon

Can Sound Replace Vision in LLaVA With Token Substitution?

Add code
Jun 12, 2025
Viaarxiv icon

ZeroSep: Separate Anything in Audio with Zero Training

Add code
May 29, 2025
Viaarxiv icon

MMPerspective: Do MLLMs Understand Perspective? A Comprehensive Benchmark for Perspective Perception, Reasoning, and Robustness

Add code
May 26, 2025
Viaarxiv icon

MMIG-Bench: Towards Comprehensive and Explainable Evaluation of Multi-Modal Image Generation Models

Add code
May 26, 2025
Viaarxiv icon

The Sword of Damocles in ViTs: Computational Redundancy Amplifies Adversarial Transferability

Add code
Apr 15, 2025
Viaarxiv icon

The Tenth NTIRE 2025 Efficient Super-Resolution Challenge Report

Add code
Apr 14, 2025
Viaarxiv icon

Caption Anything in Video: Fine-grained Object-centric Captioning via Spatiotemporal Multimodal Prompting

Add code
Apr 09, 2025
Viaarxiv icon

Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)

Add code
Apr 04, 2025
Figure 1 for Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)
Figure 2 for Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)
Figure 3 for Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)
Figure 4 for Why Reasoning Matters? A Survey of Advancements in Multimodal Reasoning (v1)
Viaarxiv icon

FreSca: Unveiling the Scaling Space in Diffusion Models

Add code
Apr 02, 2025
Viaarxiv icon