Picture for Yusheng Dai

Yusheng Dai

Latent Swap Joint Diffusion for Long-Form Audio Generation

Add code
Feb 07, 2025
Figure 1 for Latent Swap Joint Diffusion for Long-Form Audio Generation
Figure 2 for Latent Swap Joint Diffusion for Long-Form Audio Generation
Figure 3 for Latent Swap Joint Diffusion for Long-Form Audio Generation
Figure 4 for Latent Swap Joint Diffusion for Long-Form Audio Generation
Viaarxiv icon

A Study of Dropout-Induced Modality Bias on Robustness to Missing Video Frames for Audio-Visual Speech Recognition

Add code
Mar 07, 2024
Viaarxiv icon

The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction

Add code
Sep 15, 2023
Figure 1 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Figure 2 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Figure 3 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Figure 4 for The Multimodal Information Based Speech Processing (MISP) 2023 Challenge: Audio-Visual Target Speaker Extraction
Viaarxiv icon

Improving Audio-Visual Speech Recognition by Lip-Subword Correlation Based Visual Pre-training and Cross-Modal Fusion Encoder

Add code
Aug 14, 2023
Viaarxiv icon