Shijian Deng

Modality-Inconsistent Continual Learning of Multimodal Large Language Models

Dec 17, 2024

Efficient Self-Improvement in Multimodal Large Language Models: A Model-Level Judge-Free Approach

Nov 26, 2024

Continual Audio-Visual Sound Separation

Nov 05, 2024

AV-DiT: Efficient Audio-Visual Diffusion Transformer for Joint Audio and Video Generation

Jun 11, 2024

Separating Invisible Sounds Toward Universal Audiovisual Scene-Aware Sound Separation

Oct 18, 2023

Unveiling Cross Modality Bias in Visual Question Answering: A Causal View with Possible Worlds VQA

May 31, 2023