Xinyu Xiao

School of Mathematical Science, Peking University

UniVid: Pyramid Diffusion Model for High Quality Video Generation

Mar 14, 2026
Via arXiv

PA-Net: Precipitation-Adaptive Mixture-of-Experts for Long-Tail Rainfall Nowcasting

Mar 14, 2026
Via arXiv

MeTok: An Efficient Meteorological Tokenization with Hyper-Aligned Group Learning for Precipitation Nowcasting

Mar 14, 2026
Via arXiv

Fast-Slow Efficient Training for Multimodal Large Language Models via Visual Token Pruning

Feb 03, 2026
Via arXiv

PruneRAG: Confidence-Guided Query Decomposition Trees for Efficient Retrieval-Augmented Generation

Jan 16, 2026
Via arXiv

RayFusion: Ray Fusion Enhanced Collaborative Visual Perception

Oct 09, 2025
Via arXiv

Ming-Omni: A Unified Multimodal Model for Perception and Generation

Jun 11, 2025
Via arXiv

Ming-Lite-Uni: Advancements in Unified Architecture for Natural Multimodal Interaction

May 05, 2025
Via arXiv

Merge then Realign: Simple and Effective Modality-Incremental Continual Learning for Multimodal LLMs

Mar 08, 2025
Via arXiv

IFTR: An Instance-Level Fusion Transformer for Visual Collaborative Perception

Jul 13, 2024
Via arXiv