Shentong Mo

The Dynamic Duo of Collaborative Masking and Target for Advanced Masked Autoencoder Learning

Dec 23, 2024

Modality-Inconsistent Continual Learning of Multimodal Large Language Models

Dec 17, 2024

Continual Audio-Visual Sound Separation

Nov 05, 2024

Aligning Audio-Visual Joint Representations with an Agentic Workflow

Oct 31, 2024

Connecting Joint-Embedding Predictive Architecture with Contrastive Self-supervised Learning

Oct 25, 2024

Rethinking Positive Pairs in Contrastive Learning

Oct 23, 2024

Multi-scale Multi-instance Visual Sound Localization and Segmentation

Aug 31, 2024

MultiMed: Massively Multimodal and Multitask Medical Understanding

Aug 22, 2024

IoT-LM: Large Multisensory Language Models for the Internet of Things

Jul 13, 2024

Semantic Grouping Network for Audio Source Separation

Jul 04, 2024