
Junwen Xiong

DiffSal: Joint Audio and Video Learning for Diffusion Saliency Prediction

Mar 02, 2024

UniST: Towards Unifying Saliency Transformer for Video Saliency Prediction and Detection

Sep 15, 2023

FTFDNet: Learning to Detect Talking Face Video Manipulation with Tri-Modality Interaction

Jul 08, 2023

CASP-Net: Rethinking Video Saliency Prediction from an Audio-Visual Consistency Perceptual Perspective

Mar 11, 2023

Audio-visual speech separation based on joint feature representation with cross-modal attention

Mar 05, 2022

Look&Listen: Multi-Modal Correlation Learning for Active Speaker Detection and Speech Enhancement

Mar 04, 2022