Yuxin Guo

CVD-STORM: Cross-View Video Diffusion with Spatial-Temporal Reconstruction Model for Autonomous Driving

Oct 09, 2025
Via arXiv

AudioStory: Generating Long-Form Narrative Audio with Large Language Models

Aug 27, 2025
Via arXiv

ReasonPlan: Unified Scene Prediction and Decision Reasoning for Closed-loop Autonomous Driving

May 26, 2025
Via arXiv

Parallel Layer Normalization for Universal Approximation

May 19, 2025
Via arXiv

Aligned Better, Listen Better for Audio-Visual Large Language Models

Apr 02, 2025
Via arXiv

GenHancer: Imperfect Generative Models are Secretly Strong Vision-Centric Enhancers

Mar 25, 2025
Via arXiv

Monocular Depth Estimation and Segmentation for Transparent Object with Iterative Semantic and Geometric Fusion

Feb 20, 2025
Via arXiv

MaskGWM: A Generalizable Driving World Model with Video Mask Reconstruction

Feb 17, 2025
Via arXiv

UniMLVG: Unified Framework for Multi-view Long Video Generation with Comprehensive Control Capabilities for Autonomous Driving

Dec 06, 2024
Via arXiv

HoloDrive: Holistic 2D-3D Multi-Modal Street Scene Generation for Autonomous Driving

Dec 03, 2024
Via arXiv