Junke Wang

ThinkingVLA: Interleaved Vision and Language Reasoning for Robotic Manipulation

Jun 16, 2026
Via arXiv

RepWAM: World Action Modeling with Representation Visual-Action Tokenizers

Jun 11, 2026
Via arXiv

ARM: An AutoRegressive Large Multimodal Model with Unified Discrete Representations

Jun 09, 2026
Via arXiv

IDEAL: In-DEpth ALignment Makes A Discrete Representation AutoEncoder

Jun 09, 2026
Via arXiv

OmniGen-AR: AutoRegressive Any-to-Image Generation

Jun 08, 2026
Via arXiv

DisCo: World Models with Discrete Camera Motion Control

Jun 06, 2026
Via arXiv

FluxMem: Adaptive Hierarchical Memory for Streaming Video Understanding

Mar 02, 2026
Via arXiv

VideoLoom: A Video Large Language Model for Joint Spatial-Temporal Understanding

Jan 12, 2026
Via arXiv

TempoMaster: Efficient Long Video Generation via Next-Frame-Rate Prediction

Nov 16, 2025
Via arXiv

Rethinking Discrete Tokens: Treating Them as Conditions for Continuous Autoregressive Image Synthesis

Jul 02, 2025
Via arXiv