Liang Wang

Institute of Automation, CAS

BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks

Feb 03, 2026
Via arXiv

How Well Do Models Follow Visual Instructions? VIBE: A Systematic Benchmark for Visual Instruction-Driven Image Editing

Feb 02, 2026
Via arXiv

CURP: Codebook-based Continuous User Representation for Personalized Generation with LLMs

Jan 31, 2026
Via arXiv

ShotFinder: Imagination-Driven Open-Domain Video Shot Retrieval via Web Search

Jan 30, 2026
Via arXiv

NAG: A Unified Native Architecture for Encoder-free Text-Graph Modeling in Language Models

Jan 30, 2026
Via arXiv

ToolWeaver: Weaving Collaborative Semantics for Scalable Tool Use in Large Language Models

Jan 29, 2026
Via arXiv

DiaDem: Advancing Dialogue Descriptions in Audiovisual Video Captioning for Multimodal Large Language Models

Jan 27, 2026
Via arXiv

VIBEVOICE-ASR Technical Report

Jan 26, 2026
Via arXiv

Evaluating and Achieving Controllable Code Completion in Code LLM

Jan 22, 2026
Via arXiv

PUMA: Perception-driven Unified Foothold Prior for Mobility Augmented Quadruped Parkour

Jan 22, 2026
Via arXiv