Shentong Mo

Continual Audio-Visual Sound Separation

Nov 05, 2024
Via arXiv

Aligning Audio-Visual Joint Representations with an Agentic Workflow

Oct 31, 2024
Via arXiv

Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning

Oct 25, 2024
Via arXiv

Rethinking Positive Pairs in Contrastive Learning

Oct 23, 2024
Via arXiv

Multi-scale Multi-instance Visual Sound Localization and Segmentation

Aug 31, 2024
Via arXiv

MultiMed: Massively Multimodal and Multitask Medical Understanding

Aug 22, 2024
Via arXiv

IoT-LM: Large Multisensory Language Models for the Internet of Things

Jul 13, 2024
Via arXiv

Semantic Grouping Network for Audio Source Separation

Jul 04, 2024
Via arXiv

MA-AVT: Modality Alignment for Parameter-Efficient Audio-Visual Transformers

Jun 07, 2024
Via arXiv

Efficient 3D Shape Generation via Diffusion Mamba with Bidirectional SSMs

Jun 07, 2024
Via arXiv