Yan-Bo Lin

VMAS: Video-to-Music Generation via Semantic Alignment in Web Music Videos

Sep 11, 2024
Via arXiv

Siamese Vision Transformers are Scalable Audio-visual Learners

Mar 28, 2024
Via arXiv

DAM: Dynamic Adapter Merging for Continual Video QA Learning

Mar 13, 2024
Via arXiv

Vision Transformers are Parameter-Efficient Audio-Visual Learners

Dec 15, 2022
Via arXiv

ECLIPSE: Efficient Long-range Video Retrieval using Sight and Sound

Apr 06, 2022
Via arXiv

Exploiting Audio-Visual Consistency with Partial Supervision for Spatial Audio Generation

May 03, 2021
Via arXiv

Unsupervised Sound Localization via Iterative Contrastive Learning

Apr 01, 2021
Via arXiv

Cross-Dataset Person Re-Identification via Unsupervised Pose Disentanglement and Adaptation

Sep 20, 2019
Via arXiv

Dual-modality seq2seq network for audio-visual event localization

Feb 20, 2019
Via arXiv