Chang Sun

VALLR-Pin: Uncertainty-Factorized Visual Speech Recognition for Mandarin with Pinyin Guidance

Dec 29, 2025

VALLR-Pin: Dual-Decoding Visual Speech Recognition for Mandarin with Pinyin-Guided LLM Refinement

Dec 23, 2025

AIE4ML: An End-to-End Framework for Compiling Neural Networks for the Next Generation of AMD AI Engines

Dec 17, 2025

RINO: Renormalization Group Invariance with No Labels

Sep 10, 2025

Bilateral Collaboration with Large Vision-Language Models for Open Vocabulary Human-Object Interaction Detection

Jul 09, 2025

Adapting Lightweight Vision Language Models for Radiological Visual Question Answering

Jun 17, 2025

LPCM: Learning-based Predictive Coding for LiDAR Point Cloud Compression

May 26, 2025

Fast Jet Tagging with MLP-Mixers on FPGAs

Mar 05, 2025

DualStream Contextual Fusion Network: Efficient Target Speaker Extraction by Leveraging Mixture and Enrollment Interactions

Feb 12, 2025

X-CrossNet: A complex spectral mapping approach to target speaker extraction with cross attention speaker embedding fusion

Nov 21, 2024