Wei Li

Tsinghua University, Beijing, China

video-SALMONN-R$^3$: Learning to ReWatch, ReAsk, and ReAnswer for Efficient Video Understanding

Jun 23, 2026, via arXiv

Holo-World: Unified Camera, Object and Weather Control for Video World Model

Jun 18, 2026, via arXiv

Beyond Artifacts: Towards Generalizable Synthetic Song Detection via Music-Intrinsic Features

Jun 15, 2026, via arXiv

Encode Errors: Representational Retrieval of In-Context Demonstrations for Multilingual Grammatical Error Correction

Jun 13, 2026, via arXiv

Cross-Modal Masked Compositional Concept Modeling for Enhancing Visio-Linguistic Compositionality

Jun 11, 2026, via arXiv

ISAP-3D: Identity-Slot Aligned Part-Aware 3D Generation

Jun 10, 2026, via arXiv

LoMC: Localized Multidirectional Correction for Refusal Suppression in Routed Foundation Models

Jun 10, 2026, via arXiv

LAFP: Preserving Latent Action Structure in Latent Policy Learning via Flow Matching

Jun 09, 2026, via arXiv

Prisma-World: Camera-Controllable Multi-Agent Video World Model

Jun 08, 2026, via arXiv

Physics-Driven Semantic Scattering Structure Understanding of Aircraft Target in SAR Images

Jun 05, 2026, via arXiv