Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

An Zhao

Diffusion Distillation With Direct Preference Optimization For Efficient 3D LiDAR Scene Completion

Apr 16, 2025

An Zhao, Shengyuan Zhang, Ling Yang, Zejian Li, Jiale Wu, Haoran Xu, AnYang Wei, Perry Pengyun GU, Lingyun Sun

Abstract:The application of diffusion models in 3D LiDAR scene completion is limited due to diffusion's slow sampling speed. Score distillation accelerates diffusion sampling but with performance degradation, while post-training with direct policy optimization (DPO) boosts performance using preference data. This paper proposes Distillation-DPO, a novel diffusion distillation framework for LiDAR scene completion with preference aligment. First, the student model generates paired completion scenes with different initial noises. Second, using LiDAR scene evaluation metrics as preference, we construct winning and losing sample pairs. Such construction is reasonable, since most LiDAR scene metrics are informative but non-differentiable to be optimized directly. Third, Distillation-DPO optimizes the student model by exploiting the difference in score functions between the teacher and student models on the paired completion scenes. Such procedure is repeated until convergence. Extensive experiments demonstrate that, compared to state-of-the-art LiDAR scene completion diffusion models, Distillation-DPO achieves higher-quality scene completion while accelerating the completion speed by more than 5-fold. Our method is the first to explore adopting preference learning in distillation to the best of our knowledge and provide insights into preference-aligned distillation. Our code is public available on https://github.com/happyw1nd/DistillationDPO.

* Our code is public available on https://github.com/happyw1nd/DistillationDPO

Via

Access Paper or Ask Questions

4D VQ-GAN: Synthesising Medical Scans at Any Time Point for Personalised Disease Progression Modelling of Idiopathic Pulmonary Fibrosis

Feb 08, 2025

An Zhao, Moucheng Xu, Ahmed H. Shahin, Wim Wuyts, Mark G. Jones, Joseph Jacob, Daniel C. Alexander

Abstract:Understanding the progression trajectories of diseases is crucial for early diagnosis and effective treatment planning. This is especially vital for life-threatening conditions such as Idiopathic Pulmonary Fibrosis (IPF), a chronic, progressive lung disease with a prognosis comparable to many cancers. Computed tomography (CT) imaging has been established as a reliable diagnostic tool for IPF. Accurately predicting future CT scans of early-stage IPF patients can aid in developing better treatment strategies, thereby improving survival outcomes. In this paper, we propose 4D Vector Quantised Generative Adversarial Networks (4D-VQ-GAN), a model capable of generating realistic CT volumes of IPF patients at any time point. The model is trained using a two-stage approach. In the first stage, a 3D-VQ-GAN is trained to reconstruct CT volumes. In the second stage, a Neural Ordinary Differential Equation (ODE) based temporal model is trained to capture the temporal dynamics of the quantised embeddings generated by the encoder in the first stage. We evaluate different configurations of our model for generating longitudinal CT scans and compare the results against ground truth data, both quantitatively and qualitatively. For validation, we conduct survival analysis using imaging biomarkers derived from generated CT scans and achieve a C-index comparable to that of biomarkers derived from the real CT scans. The survival analysis results demonstrate the potential clinical utility inherent to generated longitudinal CT scans, showing that they can reliably predict survival outcomes.

* 4D image synthesis, VQ-GAN, neural ODEs, spatial temporal disease progression modelling, CT, IPF

Via

Access Paper or Ask Questions

Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion

Dec 04, 2024

Shengyuan Zhang, An Zhao, Ling Yang, Zejian Li, Chenye Meng, Haoran Xu, Tianrun Chen, AnYang Wei, Perry Pengyun GU, Lingyun Sun

Figure 1 for Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion

Figure 2 for Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion

Figure 3 for Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion

Figure 4 for Distilling Diffusion Models to Efficient 3D LiDAR Scene Completion

Abstract:Diffusion models have been applied to 3D LiDAR scene completion due to their strong training stability and high completion quality. However, the slow sampling speed limits the practical application of diffusion-based scene completion models since autonomous vehicles require an efficient perception of surrounding environments. This paper proposes a novel distillation method tailored for 3D LiDAR scene completion models, dubbed $\textbf{ScoreLiDAR}$, which achieves efficient yet high-quality scene completion. ScoreLiDAR enables the distilled model to sample in significantly fewer steps after distillation. To improve completion quality, we also introduce a novel $\textbf{Structural Loss}$, which encourages the distilled model to capture the geometric structure of the 3D LiDAR scene. The loss contains a scene-wise term constraining the holistic structure and a point-wise term constraining the key landmark points and their relative configuration. Extensive experiments demonstrate that ScoreLiDAR significantly accelerates the completion time from 30.55 to 5.37 seconds per frame ($>$5$\times$) on SemanticKITTI and achieves superior performance compared to state-of-the-art 3D LiDAR scene completion models. Our code is publicly available at https://github.com/happyw1nd/ScoreLiDAR.

* https://github.com/happyw1nd/ScoreLiDAR

Via

Access Paper or Ask Questions

Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

Aug 28, 2024

Shengyuan Zhang, Ling Yang, Zejian Li, An Zhao, Chenye Meng, Changyuan Yang, Guang Yang, Zhiyuan Yang, Lingyun Sun

Figure 1 for Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

Figure 2 for Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

Figure 3 for Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

Figure 4 for Distribution Backtracking Builds A Faster Convergence Trajectory for One-step Diffusion Distillation

Abstract:Accelerating the sampling speed of diffusion models remains a significant challenge. Recent score distillation methods distill a heavy teacher model into an one-step student generator, which is optimized by calculating the difference between the two score functions on the samples generated by the student model. However, there is a score mismatch issue in the early stage of the distillation process, because existing methods mainly focus on using the endpoint of pre-trained diffusion models as teacher models, overlooking the importance of the convergence trajectory between the student generator and the teacher model. To address this issue, we extend the score distillation process by introducing the entire convergence trajectory of teacher models and propose Distribution Backtracking Distillation (DisBack) for distilling student generators. DisBask is composed of two stages: Degradation Recording and Distribution Backtracking. Degradation Recording is designed to obtain the convergence trajectory of teacher models, which records the degradation path from the trained teacher model to the untrained initial student generator. The degradation path implicitly represents the intermediate distributions of teacher models. Then Distribution Backtracking trains a student generator to backtrack the intermediate distributions for approximating the convergence trajectory of teacher models. Extensive experiments show that DisBack achieves faster and better convergence than the existing distillation method and accomplishes comparable generation performance. Notably, DisBack is easy to implement and can be generalized to existing distillation methods to boost performance. Our code is publicly available on https://github.com/SYZhang0805/DisBack.

Via

Access Paper or Ask Questions

CenTime: Event-Conditional Modelling of Censoring in Survival Analysis

Sep 15, 2023

Ahmed H. Shahin, An Zhao, Alexander C. Whitehead, Daniel C. Alexander, Joseph Jacob, David Barber

Abstract:Survival analysis is a valuable tool for estimating the time until specific events, such as death or cancer recurrence, based on baseline observations. This is particularly useful in healthcare to prognostically predict clinically important events based on patient data. However, existing approaches often have limitations; some focus only on ranking patients by survivability, neglecting to estimate the actual event time, while others treat the problem as a classification task, ignoring the inherent time-ordered structure of the events. Furthermore, the effective utilization of censored samples - training data points where the exact event time is unknown - is essential for improving the predictive accuracy of the model. In this paper, we introduce CenTime, a novel approach to survival analysis that directly estimates the time to event. Our method features an innovative event-conditional censoring mechanism that performs robustly even when uncensored data is scarce. We demonstrate that our approach forms a consistent estimator for the event model parameters, even in the absence of uncensored data. Furthermore, CenTime is easily integrated with deep learning models with no restrictions on batch size or the number of uncensored samples. We compare our approach with standard survival analysis methods, including the Cox proportional-hazard model and DeepHit. Our results indicate that CenTime offers state-of-the-art performance in predicting time-to-death while maintaining comparable ranking performance. Our implementation is publicly available at https://github.com/ahmedhshahin/CenTime.

Via

Access Paper or Ask Questions

A data-centric deep learning approach to airway segmentation

Jul 29, 2023

Wing Keung Cheung, Ashkan Pakzad, Nesrin Mogulkoc, Sarah Needleman, Bojidar Rangelov, Eyjolfur Gudmundsson, An Zhao, Mariam Abbas, Davina McLaverty, Dimitrios Asimakopoulos(+7 more)

Abstract:The morphology and distribution of airway tree abnormalities enables diagnosis and disease characterisation across a variety of chronic respiratory conditions. In this regard, airway segmentation plays a critical role in the production of the outline of the entire airway tree to enable estimation of disease extent and severity. In this study, we propose a data-centric deep learning technique to segment the airway tree. The proposed technique utilises interpolation and image split to improve data usefulness and quality. Then, an ensemble learning strategy is implemented to aggregate the segmented airway trees at different scales. In terms of segmentation performance (dice similarity coefficient), our method outperforms the baseline model by 2.5% on average when a combined loss is used. Further, our proposed technique has a low GPU usage and high flexibility enabling it to be deployed on any 2D deep learning model.

Via

Access Paper or Ask Questions

A hybrid CNN-RNN approach for survival analysis in a Lung Cancer Screening study

Mar 19, 2023

Yaozhi Lu, Shahab Aslani, An Zhao, Ahmed Shahin, David Barber, Mark Emberton, Daniel C. Alexander, Joseph Jacob

Abstract:In this study, we present a hybrid CNN-RNN approach to investigate long-term survival of subjects in a lung cancer screening study. Subjects who died of cardiovascular and respiratory causes were identified whereby the CNN model was used to capture imaging features in the CT scans and the RNN model was used to investigate time series and thus global information. The models were trained on subjects who underwent cardiovascular and respiratory deaths and a control cohort matched to participant age, gender, and smoking history. The combined model can achieve an AUC of 0.76 which outperforms humans at cardiovascular mortality prediction. The corresponding F1 and Matthews Correlation Coefficient are 0.63 and 0.42 respectively. The generalisability of the model is further validated on an 'external' cohort. The same models were applied to survival analysis with the Cox Proportional Hazard model. It was demonstrated that incorporating the follow-up history can lead to improvement in survival prediction. The Cox neural network can achieve an IPCW C-index of 0.75 on the internal dataset and 0.69 on an external dataset. Delineating imaging features associated with long-term survival can help focus preventative interventions appropriately, particularly for under-recognised pathologies thereby potentially reducing patient morbidity.

Via

Access Paper or Ask Questions

VAFO-Loss: VAscular Feature Optimised Loss Function for Retinal Artery/Vein Segmentation

Mar 12, 2022

Yukun Zhou, Moucheng Xu, Yipeng Hu, Stefano B. Blumberg, An Zhao, Siegfried K. Wagner, Pearse A. Keane, Daniel C. Alexander

Figure 1 for VAFO-Loss: VAscular Feature Optimised Loss Function for Retinal Artery/Vein Segmentation

Figure 2 for VAFO-Loss: VAscular Feature Optimised Loss Function for Retinal Artery/Vein Segmentation

Figure 3 for VAFO-Loss: VAscular Feature Optimised Loss Function for Retinal Artery/Vein Segmentation

Figure 4 for VAFO-Loss: VAscular Feature Optimised Loss Function for Retinal Artery/Vein Segmentation

Abstract:Estimating clinically-relevant vascular features following vessel segmentation is a standard pipeline for retinal vessel analysis, which provides potential ocular biomarkers for both ophthalmic disease and systemic disease. In this work, we integrate these clinical features into a novel vascular feature optimised loss function (VAFO-Loss), in order to regularise networks to produce segmentation maps, with which more accurate vascular features can be derived. Two common vascular features, vessel density and fractal dimension, are identified to be sensitive to intra-segment misclassification, which is a well-recognised problem in multi-class artery/vein segmentation particularly hindering the estimation of these vascular features. Thus we encode these two features into VAFO-Loss. We first show that incorporating our end-to-end VAFO-Loss in standard segmentation networks indeed improves vascular feature estimation, yielding quantitative improvement in stroke incidence prediction, a clinical downstream task. We also report a technically interesting finding that the trained segmentation network, albeit biased by the feature optimised loss VAFO-Loss, shows statistically significant improvement in segmentation metrics, compared to those trained with other state-of-the-art segmentation losses.

* 13 pages, 6 figures, 3 tables

Via

Access Paper or Ask Questions

Optimizing Short-Time Fourier Transform Parameters via Gradient Descent

Oct 28, 2020

An Zhao, Krishna Subramani, Paris Smaragdis

Figure 1 for Optimizing Short-Time Fourier Transform Parameters via Gradient Descent

Figure 2 for Optimizing Short-Time Fourier Transform Parameters via Gradient Descent

Figure 3 for Optimizing Short-Time Fourier Transform Parameters via Gradient Descent

Figure 4 for Optimizing Short-Time Fourier Transform Parameters via Gradient Descent

Abstract:The Short-Time Fourier Transform (STFT) has been a staple of signal processing, often being the first step for many audio tasks. A very familiar process when using the STFT is the search for the best STFT parameters, as they often have significant side effects if chosen poorly. These parameters are often defined in terms of an integer number of samples, which makes their optimization non-trivial. In this paper we show an approach that allows us to obtain a gradient for STFT parameters with respect to arbitrary cost functions, and thus enable the ability to employ gradient descent optimization of quantities like the STFT window length, or the STFT hop size. We do so for parameter values that stay constant throughout an input, but also for cases where these parameters have to dynamically change over time to accommodate varying signal characteristics.

* Submitted for ICASSP 2021

Via

Access Paper or Ask Questions

Domain-Adaptive Few-Shot Learning

Mar 19, 2020

An Zhao, Mingyu Ding, Zhiwu Lu, Tao Xiang, Yulei Niu, Jiechao Guan, Ji-Rong Wen, Ping Luo

Figure 1 for Domain-Adaptive Few-Shot Learning

Figure 2 for Domain-Adaptive Few-Shot Learning

Figure 3 for Domain-Adaptive Few-Shot Learning

Figure 4 for Domain-Adaptive Few-Shot Learning

Abstract:Existing few-shot learning (FSL) methods make the implicit assumption that the few target class samples are from the same domain as the source class samples. However, in practice this assumption is often invalid -- the target classes could come from a different domain. This poses an additional challenge of domain adaptation (DA) with few training samples. In this paper, the problem of domain-adaptive few-shot learning (DA-FSL) is tackled, which requires solving FSL and DA in a unified framework. To this end, we propose a novel domain-adversarial prototypical network (DAPN) model. It is designed to address a specific challenge in DA-FSL: the DA objective means that the source and target data distributions need to be aligned, typically through a shared domain-adaptive feature embedding space; but the FSL objective dictates that the target domain per class distribution must be different from that of any source domain class, meaning aligning the distributions across domains may harm the FSL performance. How to achieve global domain distribution alignment whilst maintaining source/target per-class discriminativeness thus becomes the key. Our solution is to explicitly enhance the source/target per-class separation before domain-adaptive feature embedding learning in the DAPN, in order to alleviate the negative effect of domain alignment on FSL. Extensive experiments show that our DAPN outperforms the state-of-the-art FSL and DA models, as well as their na\"ive combinations. The code is available at https://github.com/dingmyu/DAPN.

Via

Access Paper or Ask Questions