Jun Du

Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention

Oct 19, 2024

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Oct 17, 2024

See then Tell: Enhancing Key Information Extraction with Vision Grounding

Sep 29, 2024

Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings

Sep 25, 2024

DocMamba: Efficient Document Pre-training with State Space Model

Sep 18, 2024

Findings of the 2024 Mandarin Stuttering Event Detection and Automatic Speech Recognition Challenge

Sep 09, 2024

The USTC-NERCSLIP Systems for the CHiME-8 NOTSOFAR-1 Challenge

Sep 03, 2024

Topological GCN for Improving Detection of Hip Landmarks from B-Mode Ultrasound Images

Aug 24, 2024

NAMER: Non-Autoregressive Modeling for Handwritten Mathematical Expression Recognition

Jul 16, 2024

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

Jun 21, 2024