Kun Zhou

Slot-ID: Identity-Preserving Video Generation from Reference Videos via Slot-Based Temporal Identity Encoding

Jan 04, 2026
Via arXiv

VIPER: Process-aware Evaluation for Generative Video Reasoning

Dec 31, 2025
Via arXiv

DeliveryBench: Can Agents Earn Profit in Real World?

Dec 22, 2025
Via arXiv

3DTeethSAM: Taming SAM2 for 3D Teeth Segmentation

Dec 12, 2025
Via arXiv

TR-Gaussians: High-fidelity Real-time Rendering of Planar Transmission and Reflection with 3D Gaussian Splatting

Nov 17, 2025
Via arXiv

PAN: A World Model for General, Interactable, and Long-Horizon World Simulation

Nov 15, 2025
Via arXiv

Vision-G1: Towards General Vision Language Reasoning with Multi-Domain Data Curation

Aug 18, 2025
Via arXiv

Can Large Pretrained Depth Estimation Models Help With Image Dehazing?

Aug 01, 2025
Via arXiv

Motion-example-controlled Co-speech Gesture Generation Leveraging Large Language Models

Jul 27, 2025
Via arXiv

Revisiting Reinforcement Learning for LLM Reasoning from A Cross-Domain Perspective

Jun 17, 2025
Via arXiv