Shentong Mo

Continual Audio-Visual Sound Separation

Nov 05, 2024
Via arXiv

Aligning Audio-Visual Joint Representations with an Agentic Workflow

Oct 31, 2024
Via arXiv

Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning

Oct 25, 2024
Via arXiv

Rethinking Positive Pairs in Contrastive Learning

Oct 23, 2024
Via arXiv

Multi-scale Multi-instance Visual Sound Localization and Segmentation

Aug 31, 2024
Via arXiv

MultiMed: Massively Multimodal and Multitask Medical Understanding

Aug 22, 2024
Via arXiv

IoT-LM: Large Multisensory Language Models for the Internet of Things

Jul 13, 2024
Via arXiv

Semantic Grouping Network for Audio Source Separation

Jul 04, 2024
Via arXiv

Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs

Jun 07, 2024
Via arXiv

MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers

Jun 07, 2024
Via arXiv