Xingyu Zhang

GoalFlow: Goal-Driven Flow Matching for Multimodal Trajectories Generation in End-to-End Autonomous Driving

Mar 07, 2025
Via arXiv

Don't Shake the Wheel: Momentum-Aware Planning in End-to-End Autonomous Driving

Mar 05, 2025
Via arXiv

AVE Speech Dataset: A Comprehensive Benchmark for Multi-Modal Speech Recognition Integrating Audio, Visual, and Electromyographic Signals

Jan 28, 2025
Via arXiv

LipGen: Viseme-Guided Lip Video Generation for Enhancing Visual Speech Recognition

Jan 08, 2025
Via arXiv

Optimized Coordination Strategy for Multi-Aerospace Systems in Pick-and-Place Tasks By Deep Neural Network

Dec 13, 2024
Via arXiv

EchoMimicV2: Towards Striking, Simplified, and Semi-Body Human Animation

Nov 15, 2024
Via arXiv

Senna: Bridging Large Vision-Language Models and End-to-End Autonomous Driving

Oct 29, 2024
Via arXiv

HE-Drive: Human-Like End-to-End Driving with Vision Language Models

Oct 07, 2024
Via arXiv

OccRWKV: Rethinking Efficient 3D Semantic Occupancy Prediction with Linear Complexity

Sep 30, 2024
Via arXiv

DRAL: Deep Reinforcement Adaptive Learning for Multi-UAVs Navigation in Unknown Indoor Environment

Sep 05, 2024
Via arXiv