Yinghao Liu

A Hybrid Discriminative and Generative System for Universal Speech Enhancement

Jan 27, 2026

Yinghao Liu, Chengwei Liu, Xiaotao Liang, Haoyin Yan, Shaofei Xue, Zheng Xue

Abstract: Universal speech enhancement aims to handle inputs with diverse speech distortions and recording conditions. In this work, we propose a hybrid architecture that combines the signal fidelity of discriminative modeling with the reconstruction capabilities of generative modeling. Our system uses the discriminative TF-GridNet model with the Sampling-Frequency-Independent strategy to handle variable sampling rates. In parallel, an autoregressive model combined with spectral mapping generates detail-rich speech while suppressing generative artifacts. Finally, a fusion network learns adaptive weights for the two outputs, optimized with signal-level losses and a comprehensive Speech Quality Assessment (SQA) loss. Our system was evaluated in the ICASSP 2026 URGENT Challenge (Track 1) and ranked third.

* Accepted by ICASSP 2026. This work was submitted to the ICASSP 2026 URGENT Challenge (Track 1).
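
The fusion step in the abstract can be illustrated with a minimal sketch. The abstract says only that a fusion network learns adaptive weights for the two outputs; the convex per-sample blend below, the sigmoid gating, and the names `fuse` and `logits` are assumptions made for illustration, not the authors' implementation.

```python
import math


def sigmoid(x: float) -> float:
    """Map a real-valued logit to a weight in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fuse(discriminative, generative, logits):
    """Blend two enhanced signals sample by sample.

    `logits` stands in for the output of a hypothetical fusion network
    (one value per sample). The convex combination is an assumption;
    the source only states that adaptive weights are learned.
    """
    if not (len(discriminative) == len(generative) == len(logits)):
        raise ValueError("all inputs must have the same length")
    fused = []
    for d, g, z in zip(discriminative, generative, logits):
        w = sigmoid(z)  # weight given to the discriminative branch
        fused.append(w * d + (1.0 - w) * g)
    return fused
```

With all logits at zero, both branches contribute equally; a large positive logit makes the output follow the discriminative branch, and a large negative one makes it follow the generative branch.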

UltraLight VM-UNet: Parallel Vision Mamba Significantly Reduces Parameters for Skin Lesion Segmentation

Apr 09, 2024

Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang

Abstract: Traditionally, most approaches improve segmentation performance by adding more complex modules. This is poorly suited to the medical field, especially mobile medical devices, where computationally heavy models cannot be deployed in real clinical environments because of limited computational resources. Recently, state-space models (SSMs), represented by Mamba, have become strong competitors to traditional CNNs and Transformers. In this paper, we explore in depth the key factors that influence parameter count in Mamba and, based on this analysis, propose the UltraLight Vision Mamba UNet (UltraLight VM-UNet). Specifically, we propose a method for processing features in parallel Vision Mamba, named the PVM Layer, which achieves excellent performance with the lowest computational load while keeping the overall number of processing channels constant. We conducted comparison and ablation experiments against several state-of-the-art lightweight models on three public skin lesion datasets and demonstrated that UltraLight VM-UNet remains equally competitive with only 0.049M parameters and 0.060 GFLOPs. This analysis also lays a theoretical foundation for Mamba to potentially become a mainstream module for lightweight models. The code is available at https://github.com/wurenkai/UltraLight-VM-UNet.
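
A short worked example can show why processing channels in parallel groups, while keeping the total channel count constant, reduces parameters. The sketch below counts weights for a single dense channel-mixing projection only; a real Mamba block contains further parameters, and the group count of 4 and the function names are assumptions chosen for illustration, not values taken from the abstract.

```python
def dense_projection_params(channels: int) -> int:
    """Weights in one channels-to-channels projection (bias ignored)."""
    return channels * channels


def parallel_projection_params(channels: int, groups: int) -> int:
    """Weights when the channels are split into equal groups, each with
    its own smaller projection. The total channel count is unchanged."""
    if channels % groups != 0:
        raise ValueError("channels must be divisible by groups")
    return groups * dense_projection_params(channels // groups)


# Hypothetical sizes: 64 channels, split into 4 parallel groups of 16.
full = dense_projection_params(64)
split = parallel_projection_params(64, 4)
print(full, split, full // split)
```

Under these assumptions the split version needs a factor of `groups` fewer weights, because the cost of each projection grows with the square of its channel count.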

H-vmunet: High-order Vision Mamba UNet for Medical Image Segmentation

Mar 20, 2024

Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang

Abstract: In medical image segmentation, variant models built on Convolutional Neural Networks (CNNs) and Vision Transformers (ViTs) as base modules have been widely developed and applied. However, CNNs are often limited in their ability to handle long-range information, while the low sensitivity of ViTs to local features and their quadratic computational complexity limit their development. Recently, the emergence of state-space models (SSMs), especially 2D-selective-scan (SS2D), has challenged the long-standing dominance of traditional CNNs and ViTs as the foundational modules of visual neural networks. In this paper, we extend the adaptability of SS2D by proposing a High-order Vision Mamba UNet (H-vmunet) for medical image segmentation. Within it, the proposed High-order 2D-selective-scan (H-SS2D) progressively reduces the redundant information introduced during SS2D operations through higher-order interactions. In addition, the proposed Local-SS2D module improves SS2D's ability to learn local features at each order of interaction. We conducted comparison and ablation experiments on three publicly available medical image datasets (ISIC2017, Spleen, and CVC-ClinicDB), and all results demonstrate the strong competitiveness of H-vmunet in medical image segmentation tasks. The code is available at https://github.com/wurenkai/H-vmunet.

Only Positive Cases: 5-fold High-order Attention Interaction Model for Skin Segmentation Derived Classification

Nov 27, 2023

Renkai Wu, Yinghao Liu, Pengchen Liang, Qing Chang

Abstract: Computer-aided diagnosis of skin diseases is an important tool. However, its interpretability is currently poor. Dermatologists and patients cannot intuitively understand the learning and prediction process of neural networks, which reduces the credibility of computer-aided diagnosis. In addition, traditional methods must be trained with negative samples in order to predict the presence or absence of a lesion, but medical data is often in short supply. In this paper, we propose a multiple high-order attention interaction model (MHA-UNet) for highly explainable skin lesion segmentation. MHA-UNet determines the presence or absence of a lesion through explainable reasoning, without training on negative samples. Specifically, we propose a high-order attention interaction mechanism that extends squeeze attention to higher orders of feature attention. In addition, a multiple high-order attention interaction (MHAblock) module is proposed that combines features of different orders. To classify the presence or absence of lesions, we conducted classification experiments on several publicly available datasets in the absence of negative samples, based on explainable reasoning about the interaction of the 5 attention orders of MHAblock. The highest positive detection rate obtained in the experiments was 81.0% and the highest negative detection rate was 83.5%. For segmentation, comparison experiments against 13 medical segmentation models and external validation experiments against 8 state-of-the-art models on three public datasets and our clinical dataset demonstrate the state-of-the-art performance of our model. The code is available at https://github.com/wurenkai/MHA-UNet.
