Juan Liu

M2S-AVSR: Modality-aware Multi-view Self-supervised Representation for Robust Audio-Visual Speech Recognition

Jun 04, 2026
Via arXiv

DM-ASR: Diarization-aware Multi-speaker ASR with Large Language Models

Apr 24, 2026
Via arXiv

Multi-View Based Audio Visual Target Speaker Extraction

Mar 11, 2026
Via arXiv

Language-Invariant Multilingual Speaker Verification for the TidyVoice 2026 Challenge

Mar 09, 2026
Via arXiv

Robust LLM-based Audio-Visual Speech Recognition with Sparse Modality Alignment and Visual Unit-Guided Refinement

Mar 04, 2026
Via arXiv

Spatially-Augmented Sequence-to-Sequence Neural Diarization for Meetings

Oct 10, 2025
Via arXiv

ECHO: Frequency-aware Hierarchical Encoding for Variable-length Signal

Aug 20, 2025
Via arXiv

SE-VLN: A Self-Evolving Vision-Language Navigation Framework Based on Multimodal Large Language Models

Jul 17, 2025
Via arXiv

Self-Supervised and Generalizable Tokenization for CLIP-Based 3D Understanding

May 24, 2025
Via arXiv

Multi-Channel Sequence-to-Sequence Neural Diarization: Experimental Results for The MISP 2025 Challenge

May 22, 2025
Via arXiv