
Zijun Gao

Trustworthy Evaluation of Generative AI Models

Jan 31, 2025

Zijun Gao, Yan Sun

Abstract: Generative AI (GenAI) models have recently achieved remarkable empirical performance in various applications; however, their evaluations still lack uncertainty quantification. In this paper, we propose a method to compare two generative models based on an unbiased estimator of their relative performance gap. Statistically, our estimator achieves the parametric convergence rate and asymptotic normality, which enables valid inference. Computationally, our method is efficient and can be accelerated by parallel computing and by pre-storing intermediate results. On simulated datasets with known ground truth, we show our approach effectively controls type I error and achieves power comparable with commonly used metrics. Furthermore, we demonstrate the performance of our method in evaluating diffusion models on real image datasets with statistical confidence.

* 5 figures, 1 table, 15 pages
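
The abstract above says that asymptotic normality of the gap estimator enables valid inference. As a rough illustration of what that buys, the sketch below builds a Wald-type confidence interval for a performance gap. It is not the paper's estimator: it assumes, purely for illustration, that each model yields one numeric score per evaluation sample and that the gap is the mean paired difference. The function name `gap_confidence_interval` and the sample scores are hypothetical.

```python
import math
from statistics import NormalDist, mean, stdev


def gap_confidence_interval(scores_a, scores_b, level=0.95):
    """Wald interval for the mean paired difference between two score lists.

    Assumption (not from the paper): the performance gap is estimated by the
    average of per-sample differences, which is approximately normal for
    large samples by the central limit theorem.
    """
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    n = len(diffs)
    estimate = mean(diffs)
    std_error = stdev(diffs) / math.sqrt(n)
    # Two-sided normal quantile, e.g. about 1.96 for a 95% interval.
    z = NormalDist().inv_cdf(0.5 + level / 2)
    return estimate, (estimate - z * std_error, estimate + z * std_error)


# Hypothetical per-sample scores for two models on the same four samples.
scores_model_a = [0.80, 0.90, 0.70, 0.85]
scores_model_b = [0.60, 0.75, 0.65, 0.70]
estimate, (lower, upper) = gap_confidence_interval(scores_model_a, scores_model_b)
```

If the interval excludes zero, the two models differ at the chosen level under these assumptions; four samples are far too few for the normal approximation in practice and are used here only to keep the example short.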


Counterfactual explainability of black-box prediction models

Nov 03, 2024

Zijun Gao, Qingyuan Zhao

Abstract: It is crucial to be able to explain black-box prediction models to use them effectively and safely in practice. Most existing tools for model explanations are associational rather than causal, and we use two paradoxical examples to show that such explanations are generally inadequate. Motivated by the concept of genetic heritability in twin studies, we propose a new notion called counterfactual explainability for black-box prediction models. Counterfactual explainability has three key advantages: (1) it leverages counterfactual outcomes and extends methods for global sensitivity analysis (such as functional analysis of variance and Sobol's indices) to a causal setting; (2) it is defined not only for the totality of a set of input factors but also for their interactions (indeed, it is a probability measure on a whole "explanation algebra"); (3) it also applies to dependent input factors whose causal relationship can be modeled by a directed acyclic graph, thus incorporating causal mechanisms into the explanation.

* 19 pages, 3 figures


Enhancing Convolutional Neural Networks with Higher-Order Numerical Difference Methods

Sep 08, 2024

Qi Wang, Zijun Gao, Mingxiu Sui, Taiyuan Mei, Xiaohan Cheng, Iris Li

Abstract: With the rise of deep learning technology in practical applications, Convolutional Neural Networks (CNNs) have been able to assist humans in solving many real-world problems. To enhance the performance of CNNs, numerous network architectures have been explored. Some of these architectures are designed based on the accumulated experience of researchers over time, while others are designed through neural architecture search methods. The improvements these methods bring to CNNs are significant, but most of them are limited in practice by model size and environmental constraints, making it difficult to fully realize the improved performance. In recent years, research has found that many CNN structures can be explained by the discretization of ordinary differential equations. This implies that we can design theoretically supported deep network structures using higher-order numerical difference methods. It should be noted that most previous CNN model structures are based on low-order numerical methods. Therefore, considering that the accuracy of linear multi-step numerical difference methods is higher than that of the forward Euler method, this paper proposes a stacking scheme based on the linear multi-step method. This scheme enhances the performance of ResNet without increasing the model size, and we compare it with the Runge-Kutta scheme. The experimental results show that the performance of the stacking scheme proposed in this paper is superior to existing stacking schemes (ResNet and HO-ResNet), and that it can be extended to other types of neural networks.
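
The accuracy claim in the abstract above can be checked on a toy problem. A residual update of the form x_{n+1} = x_n + F(x_n) has the same shape as one forward Euler step, while a linear multi-step method also reuses the previous evaluation of F. The sketch below compares forward Euler with the two-step Adams-Bashforth method on dy/dt = -y. The choice of Adams-Bashforth and of this test equation is an assumption made for illustration; the abstract does not say which multi-step scheme the paper stacks.

```python
import math


def forward_euler(f, y0, h, steps):
    """Integrate dy/dt = f(y) with first-order forward Euler steps."""
    y = y0
    for _ in range(steps):
        y = y + h * f(y)
    return y


def adams_bashforth2(f, y0, h, steps):
    """Integrate dy/dt = f(y) with the two-step Adams-Bashforth method.

    The first step is bootstrapped with forward Euler because the method
    needs two previous states.
    """
    y_prev = y0
    y = y0 + h * f(y0)
    for _ in range(steps - 1):
        y_next = y + h * (1.5 * f(y) - 0.5 * f(y_prev))
        y_prev, y = y, y_next
    return y


def decay(y):
    """Right-hand side of dy/dt = -y, whose exact solution is exp(-t)."""
    return -y


h, steps = 0.1, 10  # integrate from t = 0 to t = 1
exact = math.exp(-1.0)
euler_error = abs(forward_euler(decay, 1.0, h, steps) - exact)
ab2_error = abs(adams_bashforth2(decay, 1.0, h, steps) - exact)
```

Both methods call `f` once per step once the previous value is cached, which mirrors the abstract's point that the higher-order scheme need not increase model size.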


Comprehensive Review and Empirical Evaluation of Causal Discovery Algorithms for Numerical Data

Jul 17, 2024

Wenjin Niu, Zijun Gao, Liyan Song, Lingbo Li

Abstract: Causal analysis has become an essential component in understanding the underlying causes of phenomena across various fields. Despite its significance, the existing literature on causal discovery algorithms is fragmented, with inconsistent methodologies and a lack of comprehensive evaluations. This study addresses these gaps by conducting an exhaustive review and empirical evaluation of causal discovery methods for numerical data, aiming to provide a clearer and more structured understanding of the field. Our research began with a comprehensive literature review spanning over a decade, revealing that existing surveys fall short in covering the vast array of causal discovery advancements. We meticulously analyzed over 200 scholarly articles to identify 24 distinct algorithms. This extensive analysis led to the development of a novel taxonomy tailored to the complexities of causal discovery, categorizing methods into six main types. Addressing the lack of comprehensive evaluations, our study conducts an extensive empirical assessment of more than 20 causal discovery algorithms on synthetic and real-world datasets. We categorize synthetic datasets based on size, linearity, and noise distribution, employ 5 evaluation metrics, and summarize the top-3 algorithm recommendations for different data scenarios. The recommendations have been validated on 2 real-world datasets. Our results highlight the significant impact of dataset characteristics on algorithm performance. Moreover, a metadata extraction strategy was developed to assist users in algorithm selection on unknown datasets. The accuracy of estimating metadata is higher than 80%. Based on these insights, we offer professional and practical recommendations to help users choose the most suitable causal discovery methods for their specific dataset needs.


An Uncertainty-guided Tiered Self-training Framework for Active Source-free Domain Adaptation in Prostate Segmentation

Jul 03, 2024

Zihao Luo, Xiangde Luo, Zijun Gao, Guotai Wang

Abstract: Deep learning models have exhibited remarkable efficacy in accurately delineating the prostate for diagnosis and treatment of prostate diseases, but challenges persist in achieving robust generalization across different medical centers. Source-free Domain Adaptation (SFDA) is a promising technique to adapt deep segmentation models to address privacy and security concerns while reducing domain shifts between source and target domains. However, recent literature indicates that the performance of SFDA remains far from satisfactory due to unpredictable domain gaps. Annotating a few target domain samples is acceptable, as it can lead to significant performance improvement with a low annotation cost. Nevertheless, due to extremely limited annotation budgets, careful consideration is needed in selecting samples for annotation. Inspired by this, our goal is to develop Active Source-free Domain Adaptation (ASFDA) for medical image segmentation. Specifically, we propose a novel Uncertainty-guided Tiered Self-training (UGTST) framework, which combines efficient active sample selection (entropy-based primary local peak filtering to aggregate global uncertainty, plus a diversity-aware redundancy filter) with a tiered self-learning strategy to achieve stable domain adaptation. Experimental results on cross-center prostate MRI segmentation datasets show that, with only 5% annotation, our method improves the average Dice score by 9.78% and 7.58% in two target domains compared with state-of-the-art methods, on par with fully supervised learning. Code is available at: https://github.com/HiLab-git/UGTST

* 11 pages, 3 figures, 2 tables, accepted to MICCAI 2024
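
The results in the abstract above are reported as Dice scores. For readers unfamiliar with the metric, here is a minimal sketch of the standard Dice coefficient on flattened binary masks. It is the textbook definition, not code from the linked repository; the function name `dice_score` and the example masks are hypothetical.

```python
def dice_score(pred, target):
    """Dice coefficient 2|A ∩ B| / (|A| + |B|) for two binary masks.

    Both masks are flat sequences of 0/1 values of equal length.
    Two empty masks are treated as a perfect match.
    """
    intersection = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    if total == 0:
        return 1.0
    return 2.0 * intersection / total


# Hypothetical 4-pixel masks: the prediction marks one extra pixel.
predicted_mask = [1, 1, 0, 0]
reference_mask = [1, 0, 0, 0]
score = dice_score(predicted_mask, reference_mask)
```

The score ranges from 0 (no overlap) to 1 (identical masks), so a gain of several percentage points is a sizeable change in overlap.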


Images Speak Louder than Words: Understanding and Mitigating Bias in Vision-Language Model from a Causal Mediation Perspective

Jul 03, 2024

Zhaotian Weng, Zijun Gao, Jerone Andrews, Jieyu Zhao

Abstract: Vision-language models (VLMs) pre-trained on extensive datasets can inadvertently learn biases by correlating gender information with specific objects or scenarios. Current methods, which focus on modifying inputs and monitoring changes in the model's output probability scores, often struggle to comprehensively understand bias from the perspective of model components. We propose a framework that incorporates causal mediation analysis to measure and map the pathways of bias generation and propagation within VLMs. This approach allows us to identify the direct effects of interventions on model bias and the indirect effects of interventions on bias mediated through different model components. Our results show that image features are the primary contributors to bias, with significantly higher impacts than text features, specifically accounting for 32.57% and 12.63% of the bias in the MSCOCO and PASCAL-SENTENCE datasets, respectively. Notably, the image encoder's contribution surpasses that of the text encoder and the deep fusion encoder. Further experimentation confirms that contributions from both language and vision modalities are aligned and non-conflicting. Consequently, focusing on blurring gender representations within the image encoder, which contributes most to the model bias, reduces bias efficiently by 22.03% and 9.04% in the MSCOCO and PASCAL-SENTENCE datasets, respectively, with minimal performance loss or increased computational demands.


Deep Learning Powered Estimate of The Extrinsic Parameters on Unmanned Surface Vehicles

Jun 07, 2024

Yi Shen, Hao Liu, Chang Zhou, Wentao Wang, Zijun Gao, Qi Wang

Abstract: Unmanned Surface Vehicles (USVs) are pivotal in marine exploration, but their sensors' accuracy is compromised by the dynamic marine environment. Traditional calibration methods fall short in these conditions. This paper introduces a deep learning architecture that predicts changes in the USV's dynamic metacenter and refines sensors' extrinsic parameters in real time using a Time-Sequence General Regression Neural Network (GRNN) with Euler angles as input. Simulation data from Unity3D ensures robust training and testing. Experimental results show that the Time-Sequence GRNN achieves the lowest mean squared error (MSE) loss, outperforming traditional neural networks. This method significantly enhances sensor calibration for USVs, promising improved data accuracy in challenging maritime conditions. Future work will refine the network and validate results with real-world data.

* Accepted by The 9th Asia-Pacific Conference on Intelligent Robot Systems (ACIRS 2024)


An Enhanced Encoder-Decoder Network Architecture for Reducing Information Loss in Image Semantic Segmentation

May 26, 2024

Zijun Gao, Qi Wang, Taiyuan Mei, Xiaohan Cheng, Yun Zi, Haowei Yang

Abstract: The traditional SegNet architecture commonly encounters significant information loss during the sampling process, which detrimentally affects its accuracy in image semantic segmentation tasks. To counter this challenge, we introduce an innovative encoder-decoder network structure enhanced with residual connections. Our approach employs a multi-residual connection strategy designed to preserve the intricate details across various image scales more effectively, thus minimizing the information loss inherent to down-sampling procedures. Additionally, to enhance the convergence rate of network training and mitigate sample imbalance issues, we have devised a modified cross-entropy loss function incorporating a balancing factor. This modification optimizes the distribution between positive and negative samples, thus improving the efficiency of model training. Experimental evaluations of our model demonstrate a substantial reduction in information loss and improved accuracy in semantic segmentation. Notably, our proposed network architecture demonstrates a substantial improvement in the finely annotated mean Intersection over Union (mIoU) on the dataset compared to the conventional SegNet. The proposed network structure not only reduces operational costs by decreasing manual inspection needs but also scales up the deployment of AI-driven image analysis across different sectors.
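
The abstract above mentions a cross-entropy loss modified with a balancing factor but does not give its form. One common way to write such a loss, shown here only as an assumed illustration, weights the positive-class term by `beta` and the negative-class term by `1 - beta`. The function name `balanced_bce` and the sample values are hypothetical.

```python
import math


def balanced_bce(probs, labels, beta):
    """Binary cross-entropy with a class-balancing factor beta in (0, 1).

    Assumed form (not taken from the paper):
        loss = -[beta * y * log(p) + (1 - beta) * (1 - y) * log(1 - p)]
    averaged over samples. beta > 0.5 penalizes errors on positives more.
    """
    eps = 1e-12  # clamp probabilities to avoid log(0)
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)
        total += -(beta * y * math.log(p)
                   + (1.0 - beta) * (1 - y) * math.log(1.0 - p))
    return total / len(probs)


# Hypothetical predictions: the positive sample is badly misclassified.
probs = [0.1, 0.1]
labels = [1, 0]
loss_even = balanced_bce(probs, labels, beta=0.5)
loss_pos_heavy = balanced_bce(probs, labels, beta=0.9)
```

With `beta=0.5` the loss is exactly half the ordinary binary cross-entropy; raising `beta` makes the misclassified positive dominate, which is the usual motivation when positives are rare.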


TD3 Based Collision Free Motion Planning for Robot Navigation

May 24, 2024

Hao Liu, Yi Shen, Chang Zhou, Yuelin Zou, Zijun Gao, Qi Wang

Abstract: This paper addresses the challenge of collision-free motion planning in automated navigation within complex environments. Utilizing advancements in Deep Reinforcement Learning (DRL) and sensor technologies like LiDAR, we propose the TD3-DWA algorithm, an innovative fusion of the traditional Dynamic Window Approach (DWA) with the Twin Delayed Deep Deterministic Policy Gradient (TD3). This hybrid algorithm enhances the efficiency of robotic path planning by optimizing the sampling interval parameters of DWA to effectively navigate around both static and dynamic obstacles. The performance of the TD3-DWA algorithm is validated through various simulation experiments, demonstrating its potential to significantly improve the reliability and safety of autonomous navigation systems.


Efficiency optimization of large-scale language models based on deep learning in natural language processing tasks

May 20, 2024

Taiyuan Mei, Yun Zi, Xiaohan Cheng, Zijun Gao, Qi Wang, Haowei Yang

Abstract: The internal structure and operation mechanism of large-scale language models are analyzed theoretically, especially how Transformer and its derivative architectures can restrict computing efficiency while capturing long-term dependencies. Further, we examine the efficiency bottleneck of the training phase, and evaluate in detail the contribution of adaptive optimization algorithms (such as AdamW), massively parallel computing techniques, and mixed precision training strategies to accelerating convergence and reducing memory footprint. By analyzing the mathematical principles and implementation details of these algorithms, we reveal how they effectively improve training efficiency in practice. In terms of model deployment and inference optimization, this paper systematically reviews the latest advances in model compression techniques, focusing on strategies such as quantization, pruning, and knowledge distillation. By comparing the theoretical frameworks of these techniques and their effects in different application scenarios, we demonstrate their ability to significantly reduce model size and inference latency while maintaining model prediction accuracy. In addition, this paper critically examines the limitations of current efficiency optimization methods, such as the increased risk of overfitting, the control of performance loss after compression, and the problem of algorithm generality, and proposes some prospects for future research. In conclusion, this study provides a comprehensive theoretical framework for understanding the efficiency optimization of large-scale language models.
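
Of the compression strategies the abstract above lists, weight quantization is the simplest to illustrate. The sketch below applies symmetric 8-bit quantization to a short list of weights and then restores approximate values. It is a generic textbook scheme, not a method described in the paper; the function names and the sample weights are hypothetical.

```python
def quantize_int8(weights):
    """Map float weights to integers in [-127, 127] with one shared scale."""
    max_abs = max(abs(w) for w in weights)
    # A zero tensor would give scale 0, so fall back to 1.0 in that case.
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    quantized = [max(-127, min(127, round(w / scale))) for w in weights]
    return quantized, scale


def dequantize(quantized, scale):
    """Recover approximate float weights from the stored integers."""
    return [q * scale for q in quantized]


# Hypothetical weights from one layer.
weights = [0.5, -1.27, 0.031, 0.0]
quantized, scale = quantize_int8(weights)
restored = dequantize(quantized, scale)
max_error = max(abs(w - r) for w, r in zip(weights, restored))
```

Each weight now needs one byte instead of four (for 32-bit floats), at the cost of a rounding error of at most half the scale per weight.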
