Jun Du

RFL: Simplifying Chemical Structure Recognition with Ring-Free Language

Dec 10, 2024

Joint Optimization of Communication Enhancement and Location Privacy Protection in RIS-Assisted Underwater Communication System

Nov 30, 2024

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion

Nov 23, 2024

MVANet: Multi-Stage Video Attention Network for Sound Event Localization and Detection with Source Distance Estimation

Nov 21, 2024

DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions

Nov 11, 2024

Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention

Oct 19, 2024

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Oct 17, 2024

See then Tell: Enhancing Key Information Extraction with Vision Grounding

Sep 29, 2024

Incorporating Spatial Cues in Modular Speaker Diarization for Multi-channel Multi-party Meetings

Sep 25, 2024

DocMamba: Efficient Document Pre-training with State Space Model

Sep 18, 2024