Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Ruqian Hao

How and Why: Taming Flow Matching for Unsupervised Anomaly Detection and Localization

Aug 07, 2025

Liangwei Li, Lin Liu, Juanxiu Liu, Jing Zhang, Ruqian Hao, Xiaohui Du

Figure 1 for How and Why: Taming Flow Matching for Unsupervised Anomaly Detection and Localization

Figure 2 for How and Why: Taming Flow Matching for Unsupervised Anomaly Detection and Localization

Figure 3 for How and Why: Taming Flow Matching for Unsupervised Anomaly Detection and Localization

Figure 4 for How and Why: Taming Flow Matching for Unsupervised Anomaly Detection and Localization

Abstract:We propose a new paradigm for unsupervised anomaly detection and localization using Flow Matching (FM), which fundamentally addresses the model expressivity limitations of conventional flow-based methods. To this end, we formalize the concept of time-reversed Flow Matching (rFM) as a vector field regression along a predefined probability path to transform unknown data distributions into standard Gaussian. We bring two core observations that reshape our understanding of FM. First, we rigorously prove that FM with linear interpolation probability paths is inherently non-invertible. Second, our analysis reveals that employing reversed Gaussian probability paths in high-dimensional spaces can lead to trivial vector fields. This issue arises due to the manifold-related constraints. Building on the second observation, we propose Worst Transport (WT) displacement interpolation to reconstruct a non-probabilistic evolution path. The proposed WT-Flow enhances dynamical control over sample trajectories, constructing ''degenerate potential wells'' for anomaly-free samples while allowing anomalous samples to escape. This novel unsupervised paradigm offers a theoretically grounded separation mechanism for anomalous samples. Notably, FM provides a computationally tractable framework that scales to complex data. We present the first successful application of FM for the unsupervised anomaly detection task, achieving state-of-the-art performance at a single scale on the MVTec dataset. The reproducible code for training will be released upon camera-ready submission.

Via

Access Paper or Ask Questions

Towards Universal Learning-based Model for Cardiac Image Reconstruction: Summary of the CMRxRecon2024 Challenge

Mar 05, 2025

Fanwen Wang, Zi Wang, Yan Li, Jun Lyu, Chen Qin, Shuo Wang, Kunyuan Guo, Mengting Sun, Mingkai Huang, Haoyu Zhang(+48 more)

Abstract:Cardiovascular magnetic resonance (CMR) offers diverse imaging contrasts for assessment of cardiac function and tissue characterization. However, acquiring each single CMR modality is often time-consuming, and comprehensive clinical protocols require multiple modalities with various sampling patterns, further extending the overall acquisition time and increasing susceptibility to motion artifacts. Existing deep learning-based reconstruction methods are often designed for specific acquisition parameters, which limits their ability to generalize across a variety of scan scenarios. As part of the CMRxRecon Series, the CMRxRecon2024 challenge provides diverse datasets encompassing multi-modality multi-view imaging with various sampling patterns, and a platform for the international community to develop and benchmark reconstruction solutions in two well-crafted tasks. Task 1 is a modality-universal setting, evaluating the out-of-distribution generalization of the reconstructed model, while Task 2 follows sampling-universal setting assessing the one-for-all adaptability of the universal model. Main contributions include providing the first and largest publicly available multi-modality, multi-view cardiac k-space dataset; developing a benchmarking platform that simulates clinical acceleration protocols, with a shared code library and tutorial for various k-t undersampling patterns and data processing; giving technical insights of enhanced data consistency based on physic-informed networks and adaptive prompt-learning embedding to be versatile to different clinical settings; additional finding on evaluation metrics to address the limitations of conventional ground-truth references in universal reconstruction tasks.

* 13pages, 13 figures

Via

Access Paper or Ask Questions

UNSCT-HRNet: Modeling Anatomical Uncertainty for Landmark Detection in Total Hip Arthroplasty

Nov 13, 2024

Jiaxin Wan, Lin Liu, Haoran Wang, Liangwei Li, Wei Li, Shuheng Kou, Runtian Li, Jiayi Tang, Juanxiu Liu, Jing Zhang(+2 more)

Figure 1 for UNSCT-HRNet: Modeling Anatomical Uncertainty for Landmark Detection in Total Hip Arthroplasty

Figure 2 for UNSCT-HRNet: Modeling Anatomical Uncertainty for Landmark Detection in Total Hip Arthroplasty

Figure 3 for UNSCT-HRNet: Modeling Anatomical Uncertainty for Landmark Detection in Total Hip Arthroplasty

Figure 4 for UNSCT-HRNet: Modeling Anatomical Uncertainty for Landmark Detection in Total Hip Arthroplasty

Abstract:Total hip arthroplasty (THA) relies on accurate landmark detection from radiographic images, but unstructured data caused by irregular patient postures or occluded anatomical markers pose significant challenges for existing methods. To address this, we propose UNSCT-HRNet (Unstructured CT - High-Resolution Net), a deep learning-based framework that integrates a Spatial Relationship Fusion (SRF) module and an Uncertainty Estimation (UE) module. The SRF module, utilizing coordinate convolution and polarized attention, enhances the model's ability to capture complex spatial relationships. Meanwhile, the UE module which based on entropy ensures predictions are anatomically relevant. For unstructured data, the proposed method can predict landmarks without relying on the fixed number of points, which shows higher accuracy and better robustness comparing with the existing methods. Our UNSCT-HRNet demonstrates over a 60% improvement across multiple metrics in unstructured data. The experimental results also reveal that our approach maintains good performance on the structured dataset. Overall, the proposed UNSCT-HRNet has the potential to be used as a new reliable, automated solution for THA surgical planning and postoperative monitoring.

Via

Access Paper or Ask Questions

Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation

Jul 11, 2024

Miao Yan, Ping Zhang, Haofei Zhang, Ruqian Hao, Juanxiu Liu, Xiaoyang Wang, Lin Liu

Figure 1 for Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation

Figure 2 for Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation

Figure 3 for Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation

Figure 4 for Enhancing Thermal Infrared Tracking with Natural Language Modeling and Coordinate Sequence Generation

Abstract:Thermal infrared tracking is an essential topic in computer vision tasks because of its advantage of all-weather imaging. However, most conventional methods utilize only hand-crafted features, while deep learning-based correlation filtering methods are limited by simple correlation operations. Transformer-based methods ignore temporal and coordinate information, which is critical for TIR tracking that lacks texture and color information. In this paper, to address these issues, we apply natural language modeling to TIR tracking and propose a novel model called NLMTrack, which enhances the utilization of coordinate and temporal information. NLMTrack applies an encoder that unifies feature extraction and feature fusion, which simplifies the TIR tracking pipeline. To address the challenge of low detail and low contrast in TIR images, on the one hand, we design a multi-level progressive fusion module that enhances the semantic representation and incorporates multi-scale features. On the other hand, the decoder combines the TIR features and the coordinate sequence features using a causal transformer to generate the target sequence step by step. Moreover, we explore an adaptive loss aimed at elevating tracking accuracy and a simple template update strategy to accommodate the target's appearance variations. Experiments show that NLMTrack achieves state-of-the-art performance on multiple benchmarks. The Code is publicly available at \url{https://github.com/ELOESZHANG/NLMTrack}.

Via

Access Paper or Ask Questions

UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

Jun 29, 2024

Guanghao Zhu, Lin Liu, Jing Zhang, Xiaohui Du, Ruqian Hao, Juanxiu Liu

Figure 1 for UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

Figure 2 for UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

Figure 3 for UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

Figure 4 for UADSN: Uncertainty-Aware Dual-Stream Network for Facial Nerve Segmentation

Abstract:Facial nerve segmentation is crucial for preoperative path planning in cochlear implantation surgery. Recently, researchers have proposed some segmentation methods, such as atlas-based and deep learning-based methods. However, since the facial nerve is a tubular organ with a diameter of only 1.0-1.5mm, it is challenging to locate and segment the facial nerve in CT scans. In this work, we propose an uncertainty-aware dualstream network (UADSN). UADSN consists of a 2D segmentation stream and a 3D segmentation stream. Predictions from two streams are used to identify uncertain regions, and a consistency loss is employed to supervise the segmentation of these regions. In addition, we introduce channel squeeze & spatial excitation modules into the skip connections of U-shaped networks to extract meaningful spatial information. In order to consider topologypreservation, a clDice loss is introduced into the supervised loss function. Experimental results on the facial nerve dataset demonstrate the effectiveness of UADSN and our submodules.

Via

Access Paper or Ask Questions

AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

Jun 28, 2024

Guanghao Zhu, Jing Zhang, Juanxiu Liu, Xiaohui Du, Ruqian Hao, Yong Liu, Lin Liu

Figure 1 for AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

Figure 2 for AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

Figure 3 for AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

Figure 4 for AstMatch: Adversarial Self-training Consistency Framework for Semi-Supervised Medical Image Segmentation

Abstract:Semi-supervised learning (SSL) has shown considerable potential in medical image segmentation, primarily leveraging consistency regularization and pseudo-labeling. However, many SSL approaches only pay attention to low-level consistency and overlook the significance of pseudo-label reliability. Therefore, in this work, we propose an adversarial self-training consistency framework (AstMatch). Firstly, we design an adversarial consistency regularization (ACR) approach to enhance knowledge transfer and strengthen prediction consistency under varying perturbation intensities. Second, we apply a feature matching loss for adversarial training to incorporate high-level consistency regularization. Additionally, we present the pyramid channel attention (PCA) and efficient channel and spatial attention (ECSA) modules to improve the discriminator's performance. Finally, we propose an adaptive self-training (AST) approach to ensure the pseudo-labels' quality. The proposed AstMatch has been extensively evaluated with cutting-edge SSL methods on three public-available datasets. The experimental results under different labeled ratios indicate that AstMatch outperforms other existing methods, achieving new state-of-the-art performance. Our code will be available at https://github.com/GuanghaoZhu663/AstMatch.

Via

Access Paper or Ask Questions

SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

Jun 25, 2024

Guanghao Zhu, Lin Liu, Yuhao Hu, Haixin Sun, Fang Liu, Xiaohui Du, Ruqian Hao, Juanxiu Liu, Yong Liu, Hao Deng(+1 more)

Figure 1 for SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

Figure 2 for SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

Figure 3 for SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

Figure 4 for SKD-TSTSAN: Three-Stream Temporal-Shift Attention Network Based on Self-Knowledge Distillation for Micro-Expression Recognition

Abstract:Micro-expressions (MEs) are subtle facial movements that occur spontaneously when people try to conceal the real emotions. Micro-expression recognition (MER) is crucial in many fields, including criminal analysis and psychotherapy. However, MER is challenging since MEs have low intensity and ME datasets are small in size. To this end, a three-stream temporal-shift attention network based on self-knowledge distillation (SKD-TSTSAN) is proposed in this paper. Firstly, to address the low intensity of ME muscle movements, we utilize learning-based motion magnification modules to enhance the intensity of ME muscle movements. Secondly, we employ efficient channel attention (ECA) modules in the local-spatial stream to make the network focus on facial regions that are highly relevant to MEs. In addition, temporal shift modules (TSMs) are used in the dynamic-temporal stream, which enables temporal modeling with no additional parameters by mixing ME motion information from two different temporal domains. Furthermore, we introduce self-knowledge distillation (SKD) into the MER task by introducing auxiliary classifiers and using the deepest section of the network for supervision, encouraging all blocks to fully explore the features of the training set. Finally, extensive experiments are conducted on four ME datasets: CASME II, SAMM, MMEW, and CAS(ME)3. The experimental results demonstrate that our SKD-TSTSAN outperforms other existing methods and achieves new state-of-the-art performance. Our code will be available at https://github.com/GuanghaoZhu663/SKD-TSTSAN.

Via

Access Paper or Ask Questions

A Transfer Learning Based Active Learning Framework for Brain Tumor Classification

Nov 16, 2020

Ruqian Hao, Khashayar Namdar, Lin Liu, Farzad Khalvati

Figure 1 for A Transfer Learning Based Active Learning Framework for Brain Tumor Classification

Figure 2 for A Transfer Learning Based Active Learning Framework for Brain Tumor Classification

Figure 3 for A Transfer Learning Based Active Learning Framework for Brain Tumor Classification

Figure 4 for A Transfer Learning Based Active Learning Framework for Brain Tumor Classification

Abstract:Brain tumor is one of the leading causes of cancer-related death globally among children and adults. Precise classification of brain tumor grade (low-grade and high-grade glioma) at early stage plays a key role in successful prognosis and treatment planning. With recent advances in deep learning, Artificial Intelligence-enabled brain tumor grading systems can assist radiologists in the interpretation of medical images within seconds. The performance of deep learning techniques is, however, highly depended on the size of the annotated dataset. It is extremely challenging to label a large quantity of medical images given the complexity and volume of medical data. In this work, we propose a novel transfer learning based active learning framework to reduce the annotation cost while maintaining stability and robustness of the model performance for brain tumor classification. We employed a 2D slice-based approach to train and finetune our model on the Magnetic Resonance Imaging (MRI) training dataset of 203 patients and a validation dataset of 66 patients which was used as the baseline. With our proposed method, the model achieved Area Under Receiver Operating Characteristic (ROC) Curve (AUC) of 82.89% on a separate test dataset of 66 patients, which was 2.92% higher than the baseline AUC while saving at least 40% of labeling cost. In order to further examine the robustness of our method, we created a balanced dataset, which underwent the same procedure. The model achieved AUC of 82% compared with AUC of 78.48% for the baseline, which reassures the robustness and stability of our proposed transfer learning augmented with active learning framework while significantly reducing the size of training data.

Via

Access Paper or Ask Questions

A Comprehensive Study of Data Augmentation Strategies for Prostate Cancer Detection in Diffusion-weighted MRI using Convolutional Neural Networks

Jun 01, 2020

Ruqian Hao, Khashayar Namdar, Lin Liu, Masoom A. Haider, Farzad Khalvati

Figure 1 for A Comprehensive Study of Data Augmentation Strategies for Prostate Cancer Detection in Diffusion-weighted MRI using Convolutional Neural Networks

Figure 2 for A Comprehensive Study of Data Augmentation Strategies for Prostate Cancer Detection in Diffusion-weighted MRI using Convolutional Neural Networks

Figure 3 for A Comprehensive Study of Data Augmentation Strategies for Prostate Cancer Detection in Diffusion-weighted MRI using Convolutional Neural Networks

Figure 4 for A Comprehensive Study of Data Augmentation Strategies for Prostate Cancer Detection in Diffusion-weighted MRI using Convolutional Neural Networks

Abstract:Data augmentation refers to a group of techniques whose goal is to battle limited amount of available data to improve model generalization and push sample distribution toward the true distribution. While different augmentation strategies and their combinations have been investigated for various computer vision tasks in the context of deep learning, a specific work in the domain of medical imaging is rare and to the best of our knowledge, there has been no dedicated work on exploring the effects of various augmentation methods on the performance of deep learning models in prostate cancer detection. In this work, we have statically applied five most frequently used augmentation techniques (random rotation, horizontal flip, vertical flip, random crop, and translation) to prostate Diffusion-weighted Magnetic Resonance Imaging training dataset of 217 patients separately and evaluated the effect of each method on the accuracy of prostate cancer detection. The augmentation algorithms were applied independently to each data channel and a shallow as well as a deep Convolutional Neural Network (CNN) were trained on the five augmented sets separately. We used Area Under Receiver Operating Characteristic (ROC) curve (AUC) to evaluate the performance of the trained CNNs on a separate test set of 95 patients, using a validation set of 102 patients for finetuning. The shallow network outperformed the deep network with the best 2D slice-based AUC of 0.85 obtained by the rotation method.

Via

Access Paper or Ask Questions