Picture for Jun Du

Jun Du

Skeleton and Font Generation Network for Zero-shot Chinese Character Generation

Add code
Jan 14, 2025
Viaarxiv icon

Learned Data Compression: Challenges and Opportunities for the Future

Add code
Dec 14, 2024
Viaarxiv icon

RFL: Simplifying Chemical Structure Recognition with Ring-Free Language

Add code
Dec 10, 2024
Figure 1 for RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
Figure 2 for RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
Figure 3 for RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
Figure 4 for RFL: Simplifying Chemical Structure Recognition with Ring-Free Language
Viaarxiv icon

Joint Optimization of Communication Enhancement and Location Privacy Protection in RIS-Assisted Underwater Communication System

Add code
Nov 30, 2024
Viaarxiv icon

EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion

Add code
Nov 23, 2024
Figure 1 for EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Figure 2 for EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Figure 3 for EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Figure 4 for EmotiveTalk: Expressive Talking Head Generation through Audio Information Decoupling and Emotional Video Diffusion
Viaarxiv icon

MVANet: Multi-Stage Video Attention Network for Sound Event Localization and Detection with Source Distance Estimation

Add code
Nov 21, 2024
Viaarxiv icon

DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions

Add code
Nov 11, 2024
Figure 1 for DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions
Figure 2 for DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions
Figure 3 for DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions
Figure 4 for DCF-DS: Deep Cascade Fusion of Diarization and Separation for Speech Recognition under Realistic Single-Channel Conditions
Viaarxiv icon

Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention

Add code
Oct 19, 2024
Figure 1 for Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
Figure 2 for Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
Figure 3 for Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
Figure 4 for Enhancing Multimodal Sentiment Analysis for Missing Modality through Self-Distillation and Unified Modality Cross-Attention
Viaarxiv icon

DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation

Add code
Oct 17, 2024
Figure 1 for DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Figure 2 for DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Figure 3 for DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Figure 4 for DAWN: Dynamic Frame Avatar with Non-autoregressive Diffusion Framework for Talking Head Video Generation
Viaarxiv icon

See then Tell: Enhancing Key Information Extraction with Vision Grounding

Add code
Sep 29, 2024
Figure 1 for See then Tell: Enhancing Key Information Extraction with Vision Grounding
Figure 2 for See then Tell: Enhancing Key Information Extraction with Vision Grounding
Figure 3 for See then Tell: Enhancing Key Information Extraction with Vision Grounding
Figure 4 for See then Tell: Enhancing Key Information Extraction with Vision Grounding
Viaarxiv icon