Qiuhao Zeng

Calibrated Language Models and How to Find Them with Label Smoothing

Aug 01, 2025

Jerry Huang, Peng Lu, Qiuhao Zeng

Abstract: Recent advances in natural language processing (NLP) have opened up greater opportunities to enable fine-tuned large language models (LLMs) to behave as more powerful interactive agents through improved instruction-following ability. However, how this affects confidence calibration for reliable model output has not been fully studied. In this work, we examine various open-sourced LLMs, identifying significant calibration degradation after instruction tuning in each. Seeking a practical solution, we look towards label smoothing, which has been shown to be an effective method to regularize against overconfident predictions but has yet to be widely adopted in the supervised fine-tuning (SFT) of LLMs. We first provide insight as to why label smoothing is sufficient to maintain calibration throughout the SFT process. However, settings remain where the effectiveness of smoothing is severely diminished, in particular the case of large vocabulary LLMs (LV-LLMs). We posit that the cause stems from the ability to become overconfident, which has a direct relationship with the hidden size and vocabulary size, and justify this theoretically and experimentally. Finally, we address an outstanding issue regarding the memory footprint of the cross-entropy loss computation in the label-smoothed loss setting, designing a customized kernel to dramatically reduce memory consumption without sacrificing speed or performance in comparison to existing solutions for non-smoothed losses.

* Accepted to the Forty-second International Conference on Machine Learning (ICML) 2025. First two authors contributed equally
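
For readers unfamiliar with the label smoothing mentioned in the abstract above, the following is a minimal sketch of the standard formulation for a single token: the one-hot target is mixed with a uniform distribution over the vocabulary. The function names and the default `epsilon=0.1` are illustrative assumptions, not taken from the paper, and this plain-Python version says nothing about the memory-efficient kernel the authors describe.

```python
import math


def log_softmax(logits):
    """Numerically stable log-softmax over a list of floats."""
    m = max(logits)
    lse = m + math.log(sum(math.exp(x - m) for x in logits))
    return [x - lse for x in logits]


def label_smoothed_cross_entropy(logits, target, epsilon=0.1):
    """Cross-entropy against the smoothed target (1 - eps) * one_hot + eps / K.

    `epsilon` is a hypothetical default chosen for illustration.
    """
    k = len(logits)
    log_probs = log_softmax(logits)
    nll = -log_probs[target]          # standard negative log-likelihood term
    uniform = -sum(log_probs) / k     # cross-entropy against the uniform prior
    return (1.0 - epsilon) * nll + epsilon * uniform


# A very confident prediction is penalised once smoothing is switched on.
confident_logits = [10.0, 0.0, 0.0]
plain = label_smoothed_cross_entropy(confident_logits, target=0, epsilon=0.0)
smoothed = label_smoothed_cross_entropy(confident_logits, target=0, epsilon=0.1)
```

With `epsilon=0.0` the function reduces to ordinary cross-entropy, so the gap between `plain` and `smoothed` comes entirely from the uniform term, which is what discourages overconfident outputs.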

Homophily Enhanced Graph Domain Adaptation

May 26, 2025

Ruiyi Fang, Bingheng Li, Jingyu Zhao, Ruizhi Pu, Qiuhao Zeng, Gezheng Xu, Charles Ling, Boyu Wang

Abstract: Graph Domain Adaptation (GDA) transfers knowledge from labeled source graphs to unlabeled target graphs, addressing the challenge of label scarcity. In this paper, we highlight the significance of graph homophily, a pivotal factor for graph domain alignment, which, however, has long been overlooked in existing approaches. Specifically, our analysis first reveals that homophily discrepancies exist in benchmarks. Moreover, we also show that homophily discrepancies degrade GDA performance from both empirical and theoretical aspects, which further underscores the importance of homophily alignment in GDA. Inspired by this finding, we propose a novel homophily alignment algorithm that employs mixed filters to smooth graph signals, thereby effectively capturing and mitigating homophily discrepancies between graphs. Experimental results on a variety of benchmarks verify the effectiveness of our method.
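
To make the notion of a homophily discrepancy in the abstract above concrete, here is a small sketch using the edge homophily ratio, a common definition (the fraction of edges whose endpoints share a label). The abstract does not say which homophily measure the paper uses, so this choice is an assumption, as are the toy graphs. Ground-truth target labels appear here only for illustration; in GDA the target graph is unlabeled.

```python
def edge_homophily(edges, labels):
    """Fraction of edges that connect two nodes with the same label."""
    if not edges:
        raise ValueError("graph has no edges")
    same = sum(1 for u, v in edges if labels[u] == labels[v])
    return same / len(edges)


# Toy source graph: a path whose first three nodes share a class.
source_edges = [(0, 1), (1, 2), (2, 3)]
source_labels = [0, 0, 0, 1]

# Toy target graph: a 4-cycle with alternating classes (fully heterophilous).
target_edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
target_labels = [0, 1, 0, 1]

h_source = edge_homophily(source_edges, source_labels)
h_target = edge_homophily(target_edges, target_labels)
discrepancy = abs(h_source - h_target)
```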

ZETA: Leveraging Z-order Curves for Efficient Top-k Attention

Jan 24, 2025

Qiuhao Zeng, Jerry Huang, Peng Lu, Gezheng Xu, Boxing Chen, Charles Ling, Boyu Wang

Abstract: Over recent years, the Transformer has become a fundamental building block for sequence modeling architectures. Yet at its core is the use of self-attention, whose memory and computational cost grow quadratically with the sequence length $N$, rendering it prohibitively expensive for long sequences. A promising approach is top-$k$ attention, which selects only the $k$ most relevant tokens and achieves performance comparable to vanilla self-attention while significantly reducing space and computational demands. However, causal masks require the current query token to only attend to past tokens, preventing the existing top-$k$ attention method from efficiently searching for the most relevant tokens in parallel, thereby limiting training efficiency. In this work, we propose ZETA, leveraging Z-Order Curves for Efficient Top-$k$ Attention, to enable parallel querying of past tokens for entire sequences. We first theoretically show that the choice of key and query dimensions involves a trade-off between the curse of dimensionality and the preservation of relative distances after projection. In light of this insight, we propose reducing the dimensionality of keys and queries in contrast to values and further leverage Z-order curves to map low-dimensional keys and queries into one-dimensional space, which permits parallel sorting, thereby largely improving the efficiency for top-$k$ token selection. Experimental results demonstrate that ZETA matches the performance of standard attention on the synthetic Multi-Query Associative Recall task and outperforms attention and its variants on Long Range Arena and WikiText-103 language modeling.

* 25 pages, 4 figures, accepted in International Conference on Learning Representations (ICLR) 2025
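
The mapping from low-dimensional points to one dimension described in the abstract above can be illustrated with a Morton (Z-order) code, which interleaves the bits of each coordinate. This is a generic sketch of the curve itself, assuming coordinates have already been quantized to non-negative integers; how ZETA quantizes keys and queries and performs the top-$k$ search is not stated in the abstract, and the helper names here are hypothetical.

```python
def morton_code(coords, bits=8):
    """Interleave the low `bits` bits of each non-negative integer coordinate."""
    dims = len(coords)
    code = 0
    for bit in range(bits):
        for dim, value in enumerate(coords):
            # Bit `bit` of dimension `dim` lands at position bit * dims + dim.
            code |= ((value >> bit) & 1) << (bit * dims + dim)
    return code


def sort_by_z_order(points, bits=8):
    """Return point indices ordered by position along the Z-order curve."""
    return sorted(range(len(points)), key=lambda i: morton_code(points[i], bits))


# Five quantized 2-D points; nearby points tend to receive nearby codes.
points = [(3, 3), (0, 1), (2, 0), (1, 0), (1, 1)]
order = sort_by_z_order(points)
```

Once every point has a scalar code, an ordinary sort puts spatially close points near each other in the sequence, which is the property that makes a one-dimensional search for neighbours possible.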

Generalizing across Temporal Domains with Koopman Operators

Feb 15, 2024

Qiuhao Zeng, Wei Wang, Fan Zhou, Gezheng Xu, Ruizhi Pu, Changjian Shui, Christian Gagne, Shichun Yang, Boyu Wang, Charles X. Ling

Abstract: In the field of domain generalization, the task of constructing a predictive model capable of generalizing to a target domain without access to target data remains challenging. This problem becomes further complicated when considering evolving dynamics between domains. While various approaches have been proposed to address this issue, a comprehensive understanding of the underlying generalization theory is still lacking. In this study, we contribute novel theoretical results showing that aligning conditional distributions reduces the generalization bound. Our analysis serves as a key motivation for solving the Temporal Domain Generalization (TDG) problem through the application of Koopman Neural Operators, resulting in Temporal Koopman Networks (TKNets). By employing Koopman Operators, we effectively address the time-evolving distributions encountered in TDG using the principles of Koopman theory, where measurement functions are sought to establish linear transition relations between evolving domains. Through empirical evaluations conducted on synthetic and real-world datasets, we validate the effectiveness of our proposed approach.

* 15 pages, 7 figures, Accepted by AAAI 2024. arXiv admin note: text overlap with arXiv:2206.00047

Foresee What You Will Learn: Data Augmentation for Domain Generalization in Non-Stationary Environments

Jan 19, 2023

Qiuhao Zeng, Wei Wang, Fan Zhou, Charles Ling, Boyu Wang

Abstract: Existing domain generalization methods aim to learn a generalizable model that performs well even on unseen domains. For many real-world machine learning applications, the data distribution often shifts gradually along domain indices. For example, a self-driving car with a vision system drives from dawn to dusk, with the sky darkening gradually. Therefore, the system must be able to adapt to changes in ambient illumination and continue to drive safely on the road. In this paper, we formulate such problems as Evolving Domain Generalization, where a model aims to generalize well on a target domain by discovering and leveraging the evolving pattern of the environment. We then propose Directional Domain Augmentation (DDA), which simulates the unseen target features by mapping source data as augmentations through a domain transformer. Specifically, we formulate DDA as a bi-level optimization problem and solve it through a novel meta-learning approach in the representation space. We evaluate the proposed method on both synthetic and real-world datasets, and empirical results show that our approach can outperform other existing methods.

* 12 pages, 6 figures, accepted by AAAI 2023

Domain-Augmented Domain Adaptation

Feb 21, 2022

Qiuhao Zeng, Tianze Luo, Boyu Wang

Abstract: Unsupervised domain adaptation (UDA) enables knowledge transfer from the labeled source domain to the unlabeled target domain by reducing the cross-domain discrepancy. However, most studies are based on direct adaptation from the source domain to the target domain and suffer from large domain discrepancies. To overcome this challenge, in this paper, we propose domain-augmented domain adaptation (DADA), which generates pseudo domains that have smaller discrepancies with the target domain and enhances the knowledge transfer process by minimizing the discrepancy between the target domain and the pseudo domains. Furthermore, we design a pseudo-labeling method for DADA by projecting representations from the target domain to multiple pseudo domains and taking the averaged predictions on the classification from the pseudo domains as the pseudo labels. We conduct extensive experiments with the state-of-the-art domain adaptation methods on four benchmark datasets: Office Home, Office-31, VisDA2017, and Digital datasets. The results demonstrate the superiority of our model.

* 12 pages, 5 figures
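
The pseudo-labeling step in the abstract above, averaging classification predictions across pseudo domains, can be sketched as follows. This assumes each pseudo domain yields a class-probability vector for the same target sample; the function name and the example numbers are hypothetical, and the projection into pseudo domains is not modelled.

```python
def average_pseudo_label(predictions):
    """Average per-domain class probabilities and return (label, averaged probs).

    `predictions` holds one probability vector per pseudo domain.
    """
    if not predictions:
        raise ValueError("need at least one prediction")
    n_classes = len(predictions[0])
    averaged = [
        sum(p[c] for p in predictions) / len(predictions)
        for c in range(n_classes)
    ]
    label = max(range(n_classes), key=averaged.__getitem__)
    return label, averaged


# Three hypothetical pseudo domains disagree; the average settles on class 1.
per_domain = [[0.6, 0.4], [0.2, 0.8], [0.3, 0.7]]
pseudo_label, mean_probs = average_pseudo_label(per_domain)
```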

LGGNet: Learning from Local-Global-Graph Representations for Brain-Computer Interface

May 05, 2021

Yi Ding, Neethu Robinson, Qiuhao Zeng, Cuntai Guan

Abstract: In this paper, we propose LGG, a neurologically inspired graph neural network, to learn local-global-graph representations from electroencephalography (EEG) for a Brain-Computer Interface (BCI). A temporal convolutional layer with multi-scale 1D convolutional kernels and kernel-level attention fusion is proposed to learn the temporal dynamics of EEG. Inspired by neurological knowledge of cognitive processes in the brain, we propose local and global graph-filtering layers to learn the brain activities within and between different functional areas of the brain to model the complex relations among them during the cognitive processes. Under the robust nested cross-validation settings, the proposed method is evaluated on the publicly available dataset DEAP, and the classification performance is compared with state-of-the-art methods, such as FBFgMDM, FBTSC, Unsupervised learning, DeepConvNet, ShallowConvNet, EEGNet, and TSception. The results show that the proposed method outperforms all these state-of-the-art methods, and the improvements are statistically significant (p<0.05) in most cases. The source code can be found at: https://github.com/yi-ding-cs/LGG

* This work has been submitted to the IEEE for possible publication. Copyright may be transferred without notice, after which this version may no longer be accessible

TSception: Capturing Temporal Dynamics and Spatial Asymmetry from EEG for Emotion Recognition

Apr 07, 2021

Yi Ding, Neethu Robinson, Qiuhao Zeng, Cuntai Guan

Abstract: In this paper, we propose TSception, a multi-scale convolutional neural network, to learn temporal dynamics and spatial asymmetry from affective electroencephalogram (EEG). TSception consists of dynamic temporal, asymmetric spatial, and high-level fusion layers, which learn discriminative representations in the time and channel dimensions simultaneously. The dynamic temporal layer consists of multi-scale 1D convolutional kernels whose lengths are related to the sampling rate of the EEG signal, which learns its dynamic temporal and frequency representations. The asymmetric spatial layer takes advantage of the asymmetric neural activations underlying emotional responses, learning the discriminative global and hemisphere representations. The learned spatial representations are fused by a high-level fusion layer. With robust nested cross-validation settings, the proposed method is evaluated on two publicly available datasets, DEAP and AMIGOS, and its performance is compared with previously reported methods such as FBFgMDM, FBTSC, Unsupervised learning, DeepConvNet, ShallowConvNet, and EEGNet. The results indicate that the proposed method significantly (p<0.05) outperforms others in terms of classification accuracy. The proposed method can be utilized in emotion regulation therapy for emotion recognition in the future. The source code can be found at: https://github.com/deepBrains/TSception-New

TSception: A Deep Learning Framework for Emotion Detection Using EEG

Apr 08, 2020

Yi Ding, Neethu Robinson, Qiuhao Zeng, Duo Chen, Aung Aung Phyo Wai, Tih-Shih Lee, Cuntai Guan

Abstract: In this paper, we propose a deep learning framework, TSception, for emotion detection from electroencephalogram (EEG). TSception consists of temporal and spatial convolutional layers, which learn discriminative representations in the time and channel domains simultaneously. The temporal learner consists of multi-scale 1D convolutional kernels whose lengths are related to the sampling rate of the EEG signal, which learns multiple temporal and frequency representations. The spatial learner takes advantage of the asymmetry property of emotion responses at the frontal brain area to learn the discriminative representations from the left and right hemispheres of the brain. In our study, a system is designed to study the emotional arousal in an immersive virtual reality (VR) environment. EEG data were collected from 18 healthy subjects using this system to evaluate the performance of the proposed deep learning network for the classification of low and high emotional arousal states. The proposed method is compared with SVM, EEGNet, and LSTM. TSception achieves a high classification accuracy of 86.03%, which outperforms the prior methods significantly (p<0.05). The code is available at https://github.com/deepBrains/TSception

* Authors information updated only. Accepted to be published in: 2020 International Joint Conference on Neural Networks (IJCNN), Glasgow, July 19--24, 2020, part of 2020 IEEE World Congress on Computational Intelligence (IEEE WCCI 2020)
