Yan Zhou

Department of Radiology, Renji Hospital, School of Medicine, Shanghai Jiao Tong University, Shanghai, China

BMDS-Net: A Bayesian Multi-Modal Deep Supervision Network for Robust Brain Tumor Segmentation

Jan 24, 2026

Kling-Omni Technical Report

Dec 18, 2025

KlingAvatar 2.0 Technical Report

Dec 15, 2025

UnityVideo: Unified Multi-Modal Multi-Task Learning for Enhancing World-Aware Video Generation

Dec 08, 2025

Few to Big: Prototype Expansion Network via Diffusion Learner for Point Cloud Few-shot Semantic Segmentation

Sep 16, 2025

MIDAS: Multimodal Interactive Digital-humAn Synthesis via Real-time Autoregressive Video Generation

Aug 28, 2025

A Physics-Driven Neural Network with Parameter Embedding for Generating Quantitative MR Maps from Weighted Images

Aug 11, 2025

DiffCap: Diffusion-based Real-time Human Motion Capture using Sparse IMUs and a Monocular Camera

Aug 08, 2025

Stream-Omni: Simultaneous Multimodal Interactions with Large Language-Vision-Speech Model

Jun 16, 2025

AgentAlign: Navigating Safety Alignment in the Shift from Informative to Agentic Large Language Models

May 29, 2025