Lizhong Ding

SchoenbAt: Rethinking Attention with Polynomial basis

May 18, 2025

Yuhan Guo, Lizhong Ding, Yuwan Yang, Xuewei Guo

Abstract: Kernelized attention extends the attention mechanism by modeling sequence correlations through kernel functions, and has made significant progress in optimizing attention. Harmonic analysis guarantees that kernel functions can be expanded in basis functions, which has inspired random feature-based approaches that improve the efficiency of kernelized attention while maintaining predictive performance. However, current random feature-based works are limited to Fourier basis expansions under Bochner's theorem. We propose Schoenberg's theorem-based attention (SchoenbAt), which approximates dot-product kernelized attention with the polynomial basis under Schoenberg's theorem via random Maclaurin features and applies a two-stage regularization to constrain the input space and restore the output scale, acting as a drop-in replacement for dot-product kernelized attention. Our theoretical proof of the unbiasedness and concentration error bound of SchoenbAt supports its efficiency and accuracy as a kernelized attention approximation, which is also empirically validated under various random feature dimensions. Evaluations on real-world datasets demonstrate that SchoenbAt significantly enhances computational speed while preserving competitive performance in terms of precision, outperforming several efficient attention methods.
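As a rough illustration of the random Maclaurin features mentioned in the abstract, the sketch below estimates the exponential dot-product kernel exp(<x, y>) from its Maclaurin coefficients 1/n!. It follows the standard random Maclaurin construction (a degree n drawn with probability 2^-(n+1), then a product of n Rademacher projections) and is not taken from the SchoenbAt code; the two-stage regularization is omitted, and the function names `sample_feature_specs`, `maclaurin_features`, and `exp_coeff` are hypothetical.

```python
import math
import random

def sample_feature_specs(dim, num_features, rng):
    """Draw the random degrees and Rademacher vectors shared by all inputs."""
    specs = []
    for _ in range(num_features):
        # Degree n is drawn with probability 2^-(n+1): keep flipping a fair coin.
        degree = 0
        while rng.random() < 0.5:
            degree += 1
        # One Rademacher (+1/-1) vector per factor of the degree-n monomial.
        vectors = [[rng.choice((-1.0, 1.0)) for _ in range(dim)]
                   for _ in range(degree)]
        specs.append((degree, vectors))
    return specs

def maclaurin_features(x, specs, coeff):
    """Map x to random features whose inner products estimate f(<x, y>)."""
    features = []
    for degree, vectors in specs:
        # The weight a_n * 2^(n+1) undoes the degree sampling probability.
        value = math.sqrt(coeff(degree) * 2.0 ** (degree + 1))
        for w in vectors:
            value *= sum(wi * xi for wi, xi in zip(w, x))
        features.append(value / math.sqrt(len(specs)))
    return features

def exp_coeff(n):
    """n-th Maclaurin coefficient of exp(z)."""
    return 1.0 / math.factorial(n)

rng = random.Random(0)
specs = sample_feature_specs(dim=3, num_features=20000, rng=rng)
x = [0.3, -0.2, 0.1]
y = [0.1, 0.4, -0.2]
zx = maclaurin_features(x, specs, exp_coeff)
zy = maclaurin_features(y, specs, exp_coeff)
estimate = sum(a * b for a, b in zip(zx, zy))
exact = math.exp(sum(a * b for a, b in zip(x, y)))
print(f"estimate={estimate:.3f} exact={exact:.3f}")
```

Because each Rademacher vector w satisfies E[<w, x><w, y>] = <x, y>, the expected product of two features equals the Maclaurin series of the kernel, which is the sense in which the estimate is unbiased. Both inputs must be mapped with the same `specs`.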

Unveiling and Causalizing CoT: A Causal Pespective

Feb 25, 2025

Jiarun Fu, Lizhong Ding, Hao Li, Pengqi Li, Qiuning Wei, Xu Chen

Abstract: Although Chain-of-Thought (CoT) has achieved remarkable success in enhancing the reasoning ability of large language models (LLMs), the mechanism of CoT remains a "black box". Even when correct answers are frequently obtained, existing CoTs struggle to make the reasoning understandable to humans. In this paper, we unveil and causalize CoT from a causal perspective to ensure both the correctness and the understandability of all reasoning steps (to the best of our knowledge, the first such work). We model the causality of CoT via structural causal models (SCM) to unveil the reasoning mechanism of CoT. To measure the causality of CoT, we define the CoT Average Causal Effect (CACE) to test the causal relations between steps. For steps without causality (wrong or unintelligible steps), we design a role-playing causal query algorithm to causalize them, resulting in a causalized CoT in which all steps are correct and understandable. Experimental results on both open-source and closed-source LLMs demonstrate that the causal errors that commonly occur in reasoning steps are effectively corrected and that the reasoning ability of LLMs is significantly improved.

Macformer: Transformer with Random Maclaurin Feature Attention

Aug 21, 2024

Yuhan Guo, Lizhong Ding, Ye Yuan, Guoren Wang

Abstract: Random feature attention (RFA) adopts random Fourier feature (RFF) methods to approximate the softmax function, resulting in a linear time and space attention mechanism that enables the construction of an efficient Transformer. Inspired by RFA, we propose Macformer, a Transformer architecture that employs random Maclaurin features (RMF) to approximate various dot-product kernels, thereby accelerating attention computations for long sequences. Macformer consists of Random Maclaurin Feature Attention (RMFA) and pre-post Scaling Batch Normalization (ppSBN): the former is an unbiased approximation of dot-product kernelized attention, and the latter is a two-stage regularization mechanism that bounds the error of RMFA. We conducted toy experiments to demonstrate the efficiency of RMFA and ppSBN, and experiments on the Long Range Arena (LRA) benchmark to validate the acceleration and accuracy of Macformer with different dot-product kernels. The experimental results for Macformer are consistent with our theoretical analysis.
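The linear time and space cost mentioned in the abstract comes from reordering the attention product once the kernel is written as an inner product of feature maps. The sketch below shows only that reordering. It swaps in the exact feature map of the degree-2 kernel (q . k)^2 in place of random Maclaurin features, leaves out ppSBN, and uses hypothetical names (`phi`, `quadratic_attention`, `linear_attention`), so it should not be read as Macformer's implementation.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def phi(x):
    """Exact feature map of the kernel (x . y)^2: all products x_i * x_j."""
    return [xi * xj for xi in x for xj in x]

def quadratic_attention(queries, keys, values):
    """Reference form: evaluates every query-key pair, O(n^2) in sequence length."""
    outputs = []
    for q in queries:
        weights = [dot(q, k) ** 2 for k in keys]
        total = sum(weights)
        outputs.append([sum(w * v[c] for w, v in zip(weights, values)) / total
                        for c in range(len(values[0]))])
    return outputs

def linear_attention(queries, keys, values):
    """Same result via phi(q)^T (sum_j phi(k_j) v_j^T), O(n) in sequence length."""
    feat_dim, val_dim = len(phi(keys[0])), len(values[0])
    kv = [[0.0] * val_dim for _ in range(feat_dim)]  # sum_j phi(k_j) v_j^T
    k_sum = [0.0] * feat_dim                         # sum_j phi(k_j)
    for k, v in zip(keys, values):
        fk = phi(k)
        for r in range(feat_dim):
            k_sum[r] += fk[r]
            for c in range(val_dim):
                kv[r][c] += fk[r] * v[c]
    outputs = []
    for q in queries:
        fq = phi(q)
        norm = dot(fq, k_sum)  # equals sum_j (q . k_j)^2
        outputs.append([sum(fq[r] * kv[r][c] for r in range(feat_dim)) / norm
                        for c in range(val_dim)])
    return outputs

queries = [[1.0, 0.5], [0.2, -1.0]]
keys = [[0.3, 1.0], [1.0, -0.4], [0.5, 0.5]]
values = [[1.0, 0.0], [0.0, 1.0], [2.0, 2.0]]
slow = quadratic_attention(queries, keys, values)
fast = linear_attention(queries, keys, values)
```

The key-value summaries `kv` and `k_sum` are built once and reused for every query, so the cost grows linearly with the number of keys. With random features the normalizer is only an estimate and may be close to zero, which is one reason a regularization step can be needed in practice.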

Self-supervised Smoothing Graph Neural Networks

Sep 02, 2020

Lu Yu, Shichao Pei, Chuxu Zhang, Lizhong Ding, Jun Zhou, Longfei Li, Xiangliang Zhang

Abstract: This paper studies learning node representations with GNNs in unsupervised scenarios. We provide a theoretical analysis and an empirical demonstration of the unstable performance of GNNs across different graph datasets when the supervision signals are not appropriately defined. The performance of GNNs depends on both the node feature smoothness and the graph locality. To smooth the discrepancy between node proximity measured by graph topology and by node features, we propose KS2L, a novel graph Knowledge distillation regularized Self-Supervised Learning framework with two complementary regularization modules for intra- and cross-model graph knowledge distillation. We demonstrate the competitive performance of KS2L on a variety of benchmarks. Even with a single GCN layer, KS2L has consistently competitive or even better performance on various benchmark datasets.

* 11 pages, 2 figures

Theoretical Analysis of Divide-and-Conquer ERM: Beyond Square Loss and RKHS

Mar 17, 2020

Yong Liu, Lizhong Ding, Weiping Wang

Abstract: Theoretical analysis of divide-and-conquer based distributed learning with least square loss in the reproducing kernel Hilbert space (RKHS) has recently been explored within the framework of learning theory. However, studies on learning theory for general loss functions and hypothesis spaces remain limited. To fill the gap, we study the risk performance of distributed empirical risk minimization (ERM) for general loss functions and hypothesis spaces. The main contributions are two-fold. First, we derive two tight risk bounds under certain basic assumptions on the hypothesis space, as well as the smoothness, Lipschitz continuity, and strong convexity of the loss function. Second, we further develop a more general risk bound for distributed ERM without the restriction of strong convexity.
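The divide-and-conquer scheme the abstract analyzes can be sketched in a few lines: partition the sample, solve ERM on each part, and average the local solutions. The toy below uses the absolute loss |y - theta| on a one-dimensional location problem, where the local minimizer is simply the median. This choice is only an illustrative non-square loss; whether it satisfies the paper's assumptions is not checked here, and the names `local_erm` and `divide_and_conquer_erm` are hypothetical.

```python
import random
import statistics

def local_erm(samples):
    """ERM for the absolute loss |y - theta|: a median minimizes the empirical risk."""
    return statistics.median(samples)

def divide_and_conquer_erm(samples, num_partitions):
    """Split the data evenly, solve ERM on each part, and average the local solutions."""
    size = len(samples) // num_partitions
    parts = [samples[i * size:(i + 1) * size] for i in range(num_partitions)]
    return sum(local_erm(part) for part in parts) / num_partitions

rng = random.Random(0)
data = [rng.gauss(2.0, 1.0) for _ in range(1000)]
global_theta = local_erm(data)
distributed_theta = divide_and_conquer_erm(data, num_partitions=10)
print(f"global={global_theta:.3f} distributed={distributed_theta:.3f}")
```

Each partition can be processed on a separate machine, and only the local solutions need to be communicated. The risk bounds in the paper concern how close such an averaged estimator can be to the one trained on all the data.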

Nearly Optimal Risk Bounds for Kernel K-Means

Mar 09, 2020

Yong Liu, Lizhong Ding, Hua Zhang, Wenqi Ren, Xiao Zhang, Shali Jiang, Xinwang Liu, Weiping Wang

Abstract: In this paper, we study the statistical properties of the kernel $k$-means and obtain a nearly optimal excess risk bound, substantially improving the state-of-the-art bounds in the existing clustering risk analyses. We further analyze the statistical effect of computational approximations of the Nyström kernel $k$-means, and demonstrate that it achieves the same statistical accuracy as the exact kernel $k$-means considering only $\sqrt{nk}$ Nyström landmark points. To the best of our knowledge, such sharp excess risk bounds for kernel (or approximate kernel) $k$-means have never been seen before.

Differentially Private ERM Based on Data Perturbation

Feb 20, 2020

Yilin Kang, Yong Liu, Lizhong Ding, Xinwang Liu, Xinyi Tong, Weiping Wang

Abstract: In this paper, after observing that different training data instances affect the machine learning model to different extents, we attempt to improve the performance of differentially private empirical risk minimization (DP-ERM) from a new perspective. Specifically, we measure the contributions of various training data instances to the final machine learning model, and select some of them to add random noise to. Considering that the key to our method is to measure each data instance separately, we propose a new "Data perturbation" based (DB) paradigm for DP-ERM: adding random noise to the original training data and achieving ($\epsilon,\delta$)-differential privacy on the final machine learning model, while also protecting the original data. By introducing the Influence Function (IF), we quantitatively measure the impact of the training data on the final model. Theoretical and experimental results show that our proposed DBDP-ERM paradigm enhances the model performance significantly.

Dynamically Visual Disambiguation of Keyword-based Image Search

May 27, 2019

Yazhou Yao, Zeren Sun, Fumin Shen, Li Liu, Limin Wang, Fan Zhu, Lizhong Ding, Gangshan Wu, Ling Shao

Abstract: Due to the high cost of manual annotation, learning directly from the web has attracted broad attention. One issue that limits the performance of such methods is the problem of visual polysemy. To address this issue, we present an adaptive multi-model framework that resolves polysemy by visual disambiguation. Compared to existing methods, the primary advantage of our approach is that it can adapt to dynamic changes in the search results. Our proposed framework consists of two major steps: we first discover and dynamically select the text queries according to the image search results, then we employ the proposed saliency-guided deep multi-instance learning network to remove outliers and learn classification models for visual disambiguation. Extensive experiments demonstrate the superiority of our proposed approach.

* Accepted by International Joint Conference on Artificial Intelligence (IJCAI), 2019

Deep learning in bioinformatics: introduction, application, and perspective in big data era

Feb 28, 2019

Yu Li, Chao Huang, Lizhong Ding, Zhongxiao Li, Yijie Pan, Xin Gao

Abstract: Deep learning, which is especially formidable in handling big data, has achieved great success in various fields, including bioinformatics. With the advances of the big data era in biology, it is foreseeable that deep learning will become increasingly important in the field and will be incorporated in the vast majority of analysis pipelines. In this review, we provide both an exoteric introduction to deep learning, and concrete examples and implementations of its representative applications in bioinformatics. We start from the recent achievements of deep learning in the bioinformatics field, pointing out the problems that are well suited to deep learning. After that, we introduce deep learning in an easy-to-understand fashion, from shallow neural networks to legendary convolutional neural networks, legendary recurrent neural networks, graph neural networks, generative adversarial networks, variational autoencoders, and the most recent state-of-the-art architectures. We then provide eight examples, covering five bioinformatics research directions and all four kinds of data types, with the implementations written in TensorFlow and Keras. Finally, we discuss the common issues, such as overfitting and interpretability, that users will encounter when adopting deep learning methods and provide corresponding suggestions. The implementations are freely available at https://github.com/lykaust15/Deep_learning_examples.

Efficient Cross-Validation for Semi-Supervised Learning

Feb 13, 2019

Yong Liu, Jian Li, Guangjun Wu, Lizhong Ding, Weiping Wang

Abstract: Manifold regularization, such as Laplacian regularized least squares (LapRLS) and Laplacian support vector machine (LapSVM), has been widely used in semi-supervised learning, and its performance greatly depends on the choice of some hyper-parameters. Cross-validation (CV) is the most popular approach for selecting the optimal hyper-parameters, but it has high complexity because the learner must be trained multiple times. In this paper, we provide a method to approximate the CV for manifold regularization based on a notion from robust statistics, called the Bouligand influence function (BIF). We first provide a strategy for approximating the CV via the Taylor expansion of the BIF. Then, we show how to calculate the BIF for general loss functions, and further give the approximate CV criteria for model selection in manifold regularization. The proposed approximate CV for manifold regularization requires training only once, and hence can significantly improve the efficiency of traditional CV. Experimental results show that our approximate CV has no statistical discrepancy with the original one, but at a much smaller time cost.
