Kai Wang

Refer to the report for detailed contributions

BridgeV2W: Bridging Video Generation Models to Embodied World Models via Embodiment Masks

Feb 03, 2026

Detecting AI-Generated Content in Academic Peer Reviews

Jan 30, 2026

MeanCache: From Instantaneous to Average Velocity for Accelerating Flow Matching Inference

Jan 27, 2026

Emotion-LLaMAv2 and MMEVerse: A New Framework and Benchmark for Multimodal Emotion Understanding

Jan 23, 2026

The Llama 4 Herd: Architecture, Training, Evaluation, and Deployment Notes

Jan 15, 2026

Surgical Scene Segmentation using a Spike-Driven Video Transformer with Real-Time Potential

Dec 24, 2025

LumiCtrl: Learning Illuminant Prompts for Lighting Control in Personalized Text-to-Image Models

Dec 19, 2025

StageVAR: Stage-Aware Acceleration for Visual Autoregressive Models

Dec 18, 2025

HyperVL: An Efficient and Dynamic Multimodal Large Language Model for Edge Devices

Dec 16, 2025

Distill Video Datasets into Images

Dec 16, 2025