
Sihan Chen

RITA: A Real-time Interactive Talking Avatars Framework

Jun 18, 2024

Fuse & Calibrate: A bi-directional Vision-Language Guided Framework for Referring Image Segmentation

May 18, 2024

NTIRE 2024 Challenge on Image Super-Resolution ($\times$4): Methods and Results

Apr 15, 2024

Calibration & Reconstruction: Deep Integrated Language for Referring Image Segmentation

Apr 12, 2024

VL-Mamba: Exploring State Space Models for Multimodal Learning

Mar 20, 2024

Semantic Entropy Can Simultaneously Benefit Transmission Efficiency and Channel Security of Wireless Semantic Communications

Feb 07, 2024

GLOBER: Coherent Non-autoregressive Video Generation via GLOBal Guided Video DecodER

Sep 23, 2023

EAVL: Explicitly Align Vision and Language for Referring Image Segmentation

Aug 22, 2023

COSA: Concatenated Sample Pretrained Vision-Language Foundation Model

Jun 15, 2023

VAST: A Vision-Audio-Subtitle-Text Omni-Modality Foundation Model and Dataset

May 29, 2023