Jin Li

SoundAI Technology Co., Ltd

BUT Systems for WildSpoof Challenge: SASV in the Wild

Dec 14, 2025

BUT Systems for Environmental Sound Deepfake Detection in the ESDD 2026 Challenge

Dec 09, 2025

POTSA: A Cross-Lingual Speech Alignment Framework for Low Resource Speech-to-Text Translation

Nov 12, 2025

Adapting Speech Foundation Models with Large Language Models for Unified Speech Recognition

Oct 27, 2025

Beyond Retrieval-Ranking: A Multi-Agent Cognitive Decision Framework for E-Commerce Search

Oct 23, 2025

Bayesian Learning for Domain-Invariant Speaker Verification and Anti-Spoofing

Jun 09, 2025

A Synergistic Framework of Nonlinear Acoustic Computing and Reinforcement Learning for Real-World Human-Robot Interaction

May 04, 2025

MSSFC-Net: Enhancing Building Interpretation with Multi-Scale Spatial-Spectral Feature Collaboration

Apr 01, 2025

AdvAD: Exploring Non-Parametric Diffusion for Imperceptible Adversarial Attacks

Mar 12, 2025

Target Speaker Lipreading by Audio-Visual Self-Distillation Pretraining and Speaker Adaptation

Feb 09, 2025