Xilai Li

SpeechVerse: A Large-scale Generalizable Audio Language Model

May 14, 2024

MMA-UNet: A Multi-Modal Asymmetric UNet Architecture for Infrared and Visible Image Fusion

Apr 27, 2024

Decomposition-based and Interference Perception for Infrared and Visible Image Fusion in Complex Scenes

Feb 03, 2024

Physical Perception Network and an All-weather Multi-modality Benchmark for Adverse Weather Image Fusion

Feb 03, 2024

SAMF: Small-Area-Aware Multi-focus Image Fusion for Object Detection

Jan 31, 2024

Bridging the Gap between Multi-focus and Multi-modal: A Focused Integration Framework for Multi-modal Image Fusion

Nov 03, 2023

DCTX-Conformer: Dynamic context carry-over for low latency unified streaming and non-streaming Conformer

Jun 13, 2023

Masked Audio Text Encoders are Effective Multi-Modal Rescorers

May 24, 2023

Dynamic Chunk Convolution for Unified Streaming and Non-Streaming Conformer ASR

Apr 25, 2023

Attentive Normalization

Aug 04, 2019