Dong Yu

PhyAVBench: A Challenging Audio Physics-Sensitivity Benchmark for Physically Grounded Text-to-Audio-Video Generation

Dec 30, 2025, via arXiv

Stable and Efficient Single-Rollout RL for Multimodal Reasoning

Dec 20, 2025, via arXiv

RePlan: Reasoning-guided Region Planning for Complex Instruction-based Image Editing

Dec 18, 2025, via arXiv

N3D-VLM: Native 3D Grounding Enables Accurate Spatial Reasoning in Vision-Language Models

Dec 18, 2025, via arXiv

Can LLMs Guide Their Own Exploration? Gradient-Guided Reinforcement Learning for LLM Reasoning

Dec 17, 2025, via arXiv

MotionEdit: Benchmarking and Learning Motion-Centric Image Editing

Dec 14, 2025, via arXiv

Auden-Voice: General-Purpose Voice Encoder for Speech and Language Understanding

Nov 19, 2025, via arXiv

TTA: Transcribe, Translate and Alignment for Cross-lingual Speech Representation

Nov 18, 2025, via arXiv

DeepCompress: A Dual Reward Strategy for Dynamically Exploring and Compressing Reasoning Chains

Oct 31, 2025, via arXiv

Understanding and Enhancing Mamba-Transformer Hybrids for Memory Recall and Language Modeling

Oct 30, 2025, via arXiv