Xin Fang

MVANet: Multi-Stage Video Attention Network for Sound Event Localization and Detection with Source Distance Estimation

Nov 21, 2024
Via arXiv

The USTC-NERCSLIP Systems for The ICMC-ASR Challenge

Jul 02, 2024
Via arXiv

Exploring Audio-Visual Information Fusion for Sound Event Localization and Detection In Low-Resource Realistic Scenarios

Jun 21, 2024
Via arXiv

Multitask frame-level learning for few-shot sound event detection

Mar 17, 2024
Via arXiv

SegRap2023: A Benchmark of Organs-at-Risk and Gross Tumor Volume Segmentation for Radiotherapy Planning of Nasopharyngeal Carcinoma

Dec 15, 2023
Via arXiv

AST-SED: An Effective Sound Event Detection Method Based on Audio Spectrogram Transformer

Mar 07, 2023
Via arXiv

Deep Virtual-to-Real Distillation for Pedestrian Crossing Prediction

Nov 02, 2022
Via arXiv

A Complementary Joint Training Approach Using Unpaired Speech and Text for Low-Resource Automatic Speech Recognition

Apr 05, 2022
Via arXiv

Learning Contextually Fused Audio-visual Representations for Audio-visual Speech Recognition

Feb 15, 2022
Via arXiv

A Noise-Robust Self-supervised Pre-training Model Based Speech Representation Learning for Automatic Speech Recognition

Jan 22, 2022
Via arXiv