Hans Hao-Hsun Hsu

Do LLMs Really Forget? Evaluating Unlearning with Knowledge Correlation and Confidence Awareness

Jun 06, 2025

Rongzhe Wei, Peizhi Niu, Hans Hao-Hsun Hsu, Ruihan Wu, Haoteng Yin, Mohsen Ghassemi, Yifan Li, Vamsi K. Potluru, Eli Chien, Kamalika Chaudhuri (+2 more)

Abstract: Machine unlearning techniques aim to mitigate unintended memorization in large language models (LLMs). However, existing approaches predominantly focus on the explicit removal of isolated facts, often overlooking latent inferential dependencies and the non-deterministic nature of knowledge within LLMs. Consequently, facts presumed forgotten may persist implicitly through correlated information. To address these challenges, we propose a knowledge unlearning evaluation framework that more accurately captures the implicit structure of real-world knowledge by representing relevant factual contexts as knowledge graphs with associated confidence scores. We further develop an inference-based evaluation protocol leveraging powerful LLMs as judges; these judges reason over the extracted knowledge subgraph to determine unlearning success. Our LLM judges utilize carefully designed prompts and are calibrated against human evaluations to ensure their trustworthiness and stability. Extensive experiments on our newly constructed benchmark demonstrate that our framework provides a more realistic and rigorous assessment of unlearning performance. Moreover, our findings reveal that current evaluation strategies tend to overestimate unlearning effectiveness. Our code is publicly available at https://github.com/Graph-COM/Knowledge_Unlearning.git.
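
The idea that a "forgotten" fact can persist through correlated facts can be illustrated with a toy confidence-weighted graph. The sketch below is not the paper's protocol, which uses LLM judges reasoning over a knowledge subgraph. It is a simplified stand-in that assumes path confidence is the product of edge confidences, and the entities, scores, and the function name `best_inference_confidence` are all hypothetical.

```python
from collections import defaultdict


def best_inference_confidence(edges, source, target, max_hops=3):
    """Highest product of edge confidences over any simple path from
    source to target with at most max_hops edges (0.0 if no path)."""
    graph = defaultdict(list)
    for head, tail, conf in edges:
        graph[head].append((tail, conf))

    best = 0.0
    # Each stack entry: (current node, confidence so far, hops used, visited nodes).
    stack = [(source, 1.0, 0, {source})]
    while stack:
        node, conf, hops, seen = stack.pop()
        if node == target and hops > 0:
            best = max(best, conf)
            continue
        if hops == max_hops:
            continue
        for nxt, edge_conf in graph[node]:
            if nxt not in seen:
                stack.append((nxt, conf * edge_conf, hops + 1, seen | {nxt}))
    return best


# Hypothetical toy facts: (head, tail, confidence that the model still holds the fact).
edges = [
    ("Alice", "Acme Corp", 0.05),  # direct fact, mostly unlearned
    ("Alice", "Bob", 0.9),         # correlated fact, still retained
    ("Bob", "Acme Corp", 0.8),     # correlated fact, still retained
]
score = best_inference_confidence(edges, "Alice", "Acme Corp")
print(round(score, 2))  # 0.72
```

In this toy case the direct fact has confidence 0.05, yet the two-hop path still supports it with confidence 0.72, which is the kind of gap an evaluation restricted to isolated facts would miss.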


Structural Alignment Improves Graph Test-Time Adaptation

Feb 25, 2025

Hans Hao-Hsun Hsu, Shikun Liu, Han Zhao, Pan Li

Abstract: Graph-based learning has achieved remarkable success in domains ranging from recommendation to fraud detection and particle physics by effectively capturing underlying interaction patterns. However, it often struggles to generalize when distribution shifts occur, particularly those involving changes in network connectivity or interaction patterns. Existing approaches designed to mitigate such shifts typically require retraining with full access to source data, rendering them infeasible under strict computational or privacy constraints. To address this limitation, we propose a test-time structural alignment (TSA) algorithm for Graph Test-Time Adaptation (GTTA), a novel method that aligns graph structures during inference without revisiting the source domain. Built upon a theoretically grounded treatment of graph data distribution shifts, TSA integrates three key strategies: an uncertainty-aware neighborhood weighting that accommodates structure shifts, an adaptive balancing of self-node and neighborhood-aggregated representations driven by node representations' signal-to-noise ratio, and a decision boundary refinement that corrects remaining label and feature shifts. Extensive experiments on synthetic and real-world datasets demonstrate that TSA can consistently outperform both non-graph TTA methods and state-of-the-art GTTA baselines.


A Graph Is More Than Its Nodes: Towards Structured Uncertainty-Aware Learning on Graphs

Oct 27, 2022

Hans Hao-Hsun Hsu, Yuesong Shen, Daniel Cremers

Abstract: Current graph neural networks (GNNs) that tackle node classification on graphs tend to only focus on nodewise scores and are solely evaluated by nodewise metrics. This limits uncertainty estimation on graphs since nodewise marginals do not fully characterize the joint distribution given the graph structure. In this work, we propose novel edgewise metrics, namely the edgewise expected calibration error (ECE) and the agree/disagree ECEs, which provide criteria for uncertainty estimation on graphs beyond the nodewise setting. Our experiments demonstrate that the proposed edgewise metrics can complement the nodewise results and yield additional insights. Moreover, we show that GNN models which consider the structured prediction problem on graphs tend to have better uncertainty estimations, which illustrates the benefit of going beyond the nodewise setting.

* Presented at NeurIPS 2022 New Frontiers in Graph Learning Workshop (NeurIPS GLFrontiers 2022)
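
For readers unfamiliar with the nodewise baseline that the abstract contrasts against, the standard expected calibration error can be sketched as below. This is the usual binned ECE only. The abstract does not define the edgewise and agree/disagree variants, so they are not reproduced here. The function name, the equal-width binning, and the toy inputs are assumptions for illustration.

```python
def expected_calibration_error(confidences, correct, n_bins=10):
    """Standard binned ECE: group predictions by confidence, then average
    the |accuracy - mean confidence| gap per bin, weighted by bin size."""
    bins = [[] for _ in range(n_bins)]
    for conf, hit in zip(confidences, correct):
        # Equal-width bins [k/n, (k+1)/n); confidence 1.0 falls into the last bin.
        index = min(int(conf * n_bins), n_bins - 1)
        bins[index].append((conf, hit))

    total = len(confidences)
    ece = 0.0
    for members in bins:
        if not members:
            continue
        mean_conf = sum(c for c, _ in members) / len(members)
        accuracy = sum(h for _, h in members) / len(members)
        ece += len(members) / total * abs(accuracy - mean_conf)
    return ece


# Toy nodewise predictions: top-class confidence and whether the prediction was right.
conf = [0.95, 0.95, 0.55, 0.55]
hit = [1, 1, 1, 0]
ece = expected_calibration_error(conf, hit)
print(round(ece, 3))  # 0.05
```

Both occupied bins are off by 0.05 (accuracy 1.0 against confidence 0.95, and accuracy 0.5 against confidence 0.55), so the weighted average is 0.05.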


What Makes Graph Neural Networks Miscalibrated?

Oct 12, 2022

Hans Hao-Hsun Hsu, Yuesong Shen, Christian Tomani, Daniel Cremers


Abstract: Given the importance of getting calibrated predictions and reliable uncertainty estimations, various post-hoc calibration methods have been developed for neural networks on standard multi-class classification tasks. However, these methods are not well suited for calibrating graph neural networks (GNNs), which present unique challenges such as accounting for the graph structure and the graph-induced correlations between the nodes. In this work, we conduct a systematic study on the calibration qualities of GNN node predictions. In particular, we identify five factors which influence the calibration of GNNs: general under-confident tendency, diversity of nodewise predictive distributions, distance to training nodes, relative confidence level, and neighborhood similarity. Furthermore, based on the insights from this study, we design a novel calibration method named Graph Attention Temperature Scaling (GATS), which is tailored for calibrating graph neural networks. GATS incorporates designs that address all the identified influential factors and produces nodewise temperature scaling using an attention-based architecture. GATS is accuracy-preserving, data-efficient, and expressive at the same time. Our experiments empirically verify the effectiveness of GATS, demonstrating that it can consistently achieve state-of-the-art calibration results on various graph datasets for different GNN backbones.

* Accepted to NeurIPS 2022
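
The "nodewise temperature scaling" mentioned in the abstract, and why such scaling is accuracy-preserving, can be illustrated with a minimal sketch. It only shows the scaling step. The temperatures are hard-coded hypothetical values, whereas GATS produces them with an attention-based architecture that is not reproduced here, and the helper names are assumptions.

```python
import math


def softmax(logits):
    """Numerically stable softmax over one node's class logits."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def scale_nodewise(logits_per_node, temperatures):
    """Divide each node's logits by that node's own positive temperature."""
    return [
        softmax([z / t for z in logits])
        for logits, t in zip(logits_per_node, temperatures)
    ]


def argmax(values):
    return values.index(max(values))


logits = [[2.0, 0.5, -1.0], [0.2, 0.1, 0.0]]
temperatures = [2.0, 0.5]  # hypothetical: T > 1 softens, T < 1 sharpens
before = [softmax(z) for z in logits]
after = scale_nodewise(logits, temperatures)

# Dividing by a positive temperature keeps the class ranking, so the
# predicted label (and hence accuracy) is unchanged; only confidence moves.
same_labels = [argmax(b) == argmax(a) for b, a in zip(before, after)]
print(same_labels)  # [True, True]
```

Here the first node becomes less confident and the second more confident, while both predicted classes stay the same.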


Automated Antenna Testing Using Encoder-Decoder-based Anomaly Detection

Nov 27, 2021

Hans Hao-Hsun Hsu, Jiawen Xu, Ravi Sama, Matthias Kovatsch


Abstract: We propose a new method for testing antenna arrays that records the radiating electromagnetic (EM) field using an absorbing material and evaluates the resulting thermal image series with an AI based on a conditional encoder-decoder model. Given the power and phase of the signals fed into each array element, we reconstruct normal sequences with our trained model and compare them to the real sequences observed by a thermal camera. These thermograms contain only low-level patterns such as blobs of various shapes. A contour-based anomaly detector can then map the reconstruction error matrix to an anomaly score to identify faulty antenna arrays and increase the classification F-measure (F-M) by up to 46%. We demonstrate our approach on the time series thermograms collected by our antenna testing system. Conventionally, a variational autoencoder (VAE) that learns the observation noise may yield better results than a VAE with a constant noise assumption. However, we demonstrate that this is not the case for anomaly detection on such low-level patterns, for two reasons. First, the baseline metric, reconstruction probability, which incorporates the learned observation noise, fails to differentiate anomalous patterns. Second, the area under the receiver operating characteristic (ROC) curve of a VAE with a lower observation noise assumption is 11.83% higher than that of a VAE with learned noise.

* 20th IEEE International Conference on Machine Learning and Applications (ICMLA 2021)
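
The pipeline described in the abstract (reconstruct the expected frame, form an error matrix, reduce it to an anomaly score, evaluate with ROC AUC) can be sketched in a few lines. This is a simplified illustration: the mean squared error used as the score is a stand-in for the paper's contour-based detector, the "reconstruction" is a fixed toy frame instead of an encoder-decoder output, and all names and values are hypothetical.

```python
def reconstruction_error(observed, reconstructed):
    """Per-pixel squared error between two equally sized 2D frames."""
    return [
        [(o - r) ** 2 for o, r in zip(obs_row, rec_row)]
        for obs_row, rec_row in zip(observed, reconstructed)
    ]


def anomaly_score(error_matrix):
    """Mean error over the frame; a stand-in for a contour-based detector."""
    values = [v for row in error_matrix for v in row]
    return sum(values) / len(values)


def roc_auc(scores, labels):
    """Probability that a random anomalous sample outscores a random
    normal one, counting ties as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Toy 2x2 "thermograms"; the reconstruction plays the role of the model output.
reconstructed = [[0.0, 1.0], [1.0, 0.0]]
normal_frame = [[0.1, 0.9], [1.0, 0.0]]  # small deviations only
faulty_frame = [[0.0, 1.0], [0.0, 0.0]]  # one hot spot missing

scores = [
    anomaly_score(reconstruction_error(normal_frame, reconstructed)),
    anomaly_score(reconstruction_error(faulty_frame, reconstructed)),
]
auc = roc_auc(scores, labels=[0, 1])
print(auc)  # 1.0
```

The faulty frame scores far higher than the normal one, so the two toy samples are perfectly separated. Real thermogram series would need a score that is robust to noise, which is the role the contour-based detector plays in the paper.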
