Abstract:Quantum machine learning is considered one of the flagship applications of quantum computers, where variational quantum circuits could be the leading paradigm both on near-term quantum devices and on early fault-tolerant quantum computers. However, it is not clear how to identify the regime of quantum advantage for these circuits, and there is no explicit theory to guide the practical design of variational ansätze for better performance. We address these challenges with the stabilizer bootstrap, a method that uses stabilizer-based techniques to optimize quantum neural networks before their quantum execution, together with theoretical proofs and high-performance computing with up to 10000 qubits or random datasets of up to 1000 data points. We find that, in a general setup of variational ansätze, the possibility of improvement from the stabilizer bootstrap depends on the structure of the observables and the size of the datasets. The results reveal two distinct behaviors: some configurations maintain a constant probability of circuit improvement, while others show an exponential decay of this probability as the number of qubits increases. We term these patterns strong stabilizer enhancement and weak stabilizer enhancement, respectively, with most situations falling in between. Our work seamlessly bridges techniques from fault-tolerant quantum computing with applications of variational quantum algorithms. Not only does it offer practical insights for designing variational circuits tailored to large-scale machine learning challenges, but it also maps out a clear trajectory for defining the boundaries of feasible and practical quantum advantage.
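The following toy sketch is not the stabilizer bootstrap itself; the two-qubit ansatz, the observable, and all names are illustrative assumptions. It only shows the kind of pre-execution check the abstract alludes to: classically compare a small variational circuit with a "Cliffordized" copy whose rotation angles are rounded to multiples of $\pi/2$, which is exactly the regime a stabilizer simulator could evaluate at scale instead of the dense statevector used here.
```python
# Toy illustration (not the paper's algorithm): compare a small variational
# circuit against its nearest-Clifford version before any quantum execution.
import numpy as np

Z = np.diag([1.0, -1.0])

def ry(theta):
    c, s = np.cos(theta / 2), np.sin(theta / 2)
    return np.array([[c, -s], [s, c]])

CNOT = np.array([[1, 0, 0, 0],
                 [0, 1, 0, 0],
                 [0, 0, 0, 1],
                 [0, 0, 1, 0]], dtype=float)

def expectation_zz(thetas):
    """<Z x Z> for a 2-qubit ansatz: RY on each qubit, then CNOT."""
    state = np.zeros(4); state[0] = 1.0                      # |00>
    state = np.kron(ry(thetas[0]), ry(thetas[1])) @ state
    state = CNOT @ state
    return state @ np.kron(Z, Z) @ state

rng = np.random.default_rng(0)
thetas = rng.uniform(0, 2 * np.pi, size=2)
clifford_thetas = np.round(thetas / (np.pi / 2)) * (np.pi / 2)  # nearest Clifford angles

print("exact        <ZZ>:", expectation_zz(thetas))
print("Cliffordized <ZZ>:", expectation_zz(clifford_thetas))
```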
Abstract:In this paper, we advance the understanding of neural network training dynamics by examining the intricate interplay of the various factors introduced by the weight parameters in the initialization process. Motivated by the foundational work of Luo et al. (J. Mach. Learn. Res., Vol. 22, Iss. 1, No. 71, pp. 3327-3373), we explore the gradient descent dynamics of neural networks through the lens of macroscopic limits, analyzing their behavior as the width $m$ tends to infinity. Our study presents a unified approach with refined techniques designed for multi-layer fully connected neural networks, which can be readily extended to other neural network architectures. Our investigation reveals that gradient descent can rapidly drive deep neural networks to zero training loss, irrespective of the specific initialization scheme employed for the weight parameters, provided that the initial scale of the output function $\kappa$ surpasses a certain threshold. This regime, characterized as the theta-lazy regime, accentuates the predominant influence of the initial scale $\kappa$ over other factors on the training behavior of neural networks. Furthermore, our approach draws inspiration from the Neural Tangent Kernel (NTK) paradigm, and we expand its applicability. Whereas NTK typically assumes that $\lim_{m\to\infty}\frac{\log \kappa}{\log m}=\frac{1}{2}$ and requires each weight parameter to be scaled by a factor of $\frac{1}{\sqrt{m}}$, in our theta-lazy regime we discard this factor and relax the condition to $\lim_{m\to\infty}\frac{\log \kappa}{\log m}>0$. Similar to NTK, the behavior of overparameterized neural networks within the theta-lazy regime trained by gradient descent can be effectively described by a specific kernel. Through rigorous analysis, our investigation illuminates the pivotal role of $\kappa$ in governing the training dynamics of neural networks.
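For quick reference, the two scaling conditions quoted above can be placed side by side (a restatement of the abstract, not an additional result):
\begin{align*}
\text{NTK regime:}\qquad &\lim_{m\to\infty}\frac{\log \kappa}{\log m}=\frac{1}{2},
  \quad \text{with each weight carrying a } \tfrac{1}{\sqrt{m}} \text{ factor},\\
\text{theta-lazy regime:}\qquad &\lim_{m\to\infty}\frac{\log \kappa}{\log m}>0,
  \quad \text{with the } \tfrac{1}{\sqrt{m}} \text{ factor discarded}.
\end{align*}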
Abstract:Conversational Aspect-Based Sentiment Analysis (DiaASQ) aims to detect quadruples \{target, aspect, opinion, sentiment polarity\} from given dialogues. In DiaASQ, the elements constituting these quadruples are not necessarily confined to individual sentences but may span multiple utterances within a dialogue. This necessitates a dual focus on both the syntactic information of individual utterances and the semantic interactions among them. However, previous studies have primarily focused on coarse-grained relationships between utterances, overlooking the potential benefits of detailed intra-utterance syntactic information and the granularity of inter-utterance relationships. This paper introduces the Triple GNNs network to enhance DiaASQ. It employs a Graph Convolutional Network (GCN) to model syntactic dependencies within utterances and Dual Graph Attention Networks (DualGATs) to construct interactions between utterances. Experiments on two standard datasets show that our model significantly outperforms state-of-the-art baselines. The code is available at \url{https://github.com/nlperi2b/Triple-GNNs-}.
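A minimal sketch of how such a two-branch encoder could be wired, assuming PyTorch Geometric; the module names, hidden sizes, and the use of a generic multi-head GATConv in place of the authors' DualGATs component are illustrative assumptions, not the paper's implementation:
```python
# Hypothetical sketch of a Triple-GNNs-style encoder: a GCN over intra-utterance
# dependency edges plus graph attention over inter-utterance edges.
import torch
import torch.nn as nn
from torch_geometric.nn import GCNConv, GATConv

class TripleGNNEncoder(nn.Module):
    def __init__(self, hidden=768, heads=4):
        super().__init__()
        self.syntax_gcn = GCNConv(hidden, hidden)                        # intra-utterance syntax
        self.dialog_gat = GATConv(hidden, hidden // heads, heads=heads)  # inter-utterance interactions
        self.fuse = nn.Linear(2 * hidden, hidden)

    def forward(self, tok_feats, dep_edges, utt_feats, dia_edges, tok2utt):
        syn = torch.relu(self.syntax_gcn(tok_feats, dep_edges))   # [num_tokens, hidden]
        dia = torch.relu(self.dialog_gat(utt_feats, dia_edges))   # [num_utterances, hidden]
        # broadcast each utterance-level vector back to its tokens, then fuse
        return self.fuse(torch.cat([syn, dia[tok2utt]], dim=-1))
```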
Abstract:Given the importance of ancient Chinese in capturing the essence of rich historical and cultural heritage, the rapid advancements in Large Language Models (LLMs) necessitate benchmarks that can effectively evaluate their understanding of ancient contexts. To meet this need, we present AC-EVAL, an innovative benchmark designed to assess the advanced knowledge and reasoning capabilities of LLMs within the context of ancient Chinese. AC-EVAL is structured across three levels of difficulty reflecting different facets of language comprehension: general historical knowledge, short text understanding, and long text comprehension. The benchmark comprises 13 tasks, spanning historical facts, geography, social customs, art, philosophy, and classical poetry and prose, providing a comprehensive assessment framework. Our extensive evaluation of top-performing LLMs, tailored for both English and Chinese, reveals substantial potential for enhancing ancient text comprehension. By highlighting the strengths and weaknesses of LLMs, AC-EVAL aims to advance their development and application in the realms of ancient Chinese language education and scholarly research. The AC-EVAL data and evaluation code are available at https://github.com/yuting-wei/AC-EVAL.
Abstract:Conversational aspect-based sentiment quadruple analysis (DiaASQ) aims to extract target-aspect-opinion-sentiment quadruples from a dialogue. In DiaASQ, the elements of a quadruple often span multiple utterances. This complicates the extraction process and emphasizes the need for an adequate understanding of conversational context and interactions. However, existing work encodes each utterance independently, thereby struggling to capture long-range conversational context and overlooking deep inter-utterance dependencies. In this work, we propose a novel Dynamic Multi-scale Context Aggregation network (DMCA) to address these challenges. Specifically, we first utilize the dialogue structure to generate multi-scale utterance windows that capture rich contextual information. We then design a Dynamic Hierarchical Aggregation (DHA) module to integrate progressive cues across these windows. In addition, we adopt a multi-stage loss strategy to improve model performance and generalization. Extensive experimental results show that DMCA significantly outperforms baselines and achieves state-of-the-art performance.
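As an illustration only (a sliding-window simplification, not the authors' dialogue-structure-aware construction), multi-scale utterance windows could be generated as follows:
```python
# Illustrative sketch: build one context window per scale around each utterance,
# so that progressively larger windows can be aggregated hierarchically.
from typing import List, Tuple

def multiscale_windows(num_utts: int, scales: Tuple[int, ...] = (1, 3, 5)) -> List[List[List[int]]]:
    """For each utterance i, return one index window per scale s,
    covering utterances [i - s//2, i + s//2] clipped to the dialogue."""
    windows = []
    for i in range(num_utts):
        per_utt = []
        for s in scales:
            lo, hi = max(0, i - s // 2), min(num_utts, i + s // 2 + 1)
            per_utt.append(list(range(lo, hi)))
        windows.append(per_utt)
    return windows

print(multiscale_windows(4))  # utterance 0 gets windows [0], [0,1], [0,1,2]
```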
Abstract:Dropout is a widely used regularization technique in the training of neural networks; nevertheless, its underlying mechanism and its impact on achieving good generalization remain poorly understood. In this work, we derive stochastic modified equations for analyzing the dynamics of dropout, approximating its discrete iteration process by a class of stochastic differential equations. To investigate the mechanism by which dropout facilitates the identification of flatter minima, we study the noise structure of the derived stochastic modified equation for dropout. By drawing on the structural resemblance between the Hessian and the covariance through several intuitive approximations, we empirically demonstrate the universal presence of the inverse variance-flatness relation and the Hessian-variance relation throughout the training process of dropout. These theoretical and empirical findings make a substantial contribution to our understanding of the inherent tendency of dropout to locate flatter minima.
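For orientation, a stochastic modified equation generically takes the following SDE form (a standard template under the usual small-learning-rate assumptions; the paper's dropout-specific equation and its noise covariance are derived there, not here):
\begin{equation*}
  d\theta_t \;=\; -\nabla L(\theta_t)\,dt \;+\; \sqrt{\eta}\,\Sigma(\theta_t)^{1/2}\,dW_t,
\end{equation*}
where $\eta$ is the learning rate, $W_t$ is a standard Wiener process, and $\Sigma(\theta)$ denotes the covariance of the noise that dropout injects into the gradient.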
Abstract:Previous research has shown that fully-connected networks with small initialization and gradient-based training methods exhibit a phenomenon known as condensation during training. This phenomenon refers to the input weights of hidden neurons condensing onto isolated orientations during training, revealing an implicit bias towards simple solutions in parameter space. However, the impact of neural network structure on condensation has not yet been investigated. In this study, we focus on convolutional neural networks (CNNs). Our experiments suggest that, under small initialization and gradient-based training, kernel weights within the same CNN layer also cluster together during training, demonstrating a significant degree of condensation. Theoretically, we demonstrate that within a finite training period, the kernels of a two-layer CNN with small initialization converge to one or a few directions. This work represents a step towards a better understanding of the non-linear training behavior exhibited by neural networks with specialized structures.
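A simple diagnostic one might use to observe such condensation empirically (an illustrative measurement, not the paper's proof technique): compute pairwise cosine similarities between the flattened kernels of a layer; entries near $\pm 1$ indicate kernels aligned to one or a few shared directions.
```python
# Track kernel condensation in one conv layer via pairwise cosine similarity.
import torch

def kernel_cosine_matrix(conv: torch.nn.Conv2d) -> torch.Tensor:
    w = conv.weight.detach().flatten(start_dim=1)   # [out_channels, in_channels*kh*kw]
    w = torch.nn.functional.normalize(w, dim=1)
    return w @ w.T                                  # [out_channels, out_channels]

conv = torch.nn.Conv2d(3, 8, kernel_size=3)
torch.nn.init.normal_(conv.weight, std=1e-4)        # small initialization, as in the abstract
sim = kernel_cosine_matrix(conv)
print(sim.shape, sim.abs().mean().item())           # monitor this quantity during training
```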
Abstract:The phenomenon of distinct behaviors exhibited by neural networks under varying scales of initialization remains an enigma in deep learning research. In this paper, based on the earlier work by Luo et al.~\cite{luo2021phase}, we present a phase diagram of initial condensation for two-layer neural networks. Condensation is a phenomenon wherein the weight vectors of neural networks concentrate on isolated orientations during training, and it is a feature of the non-linear learning process that enables neural networks to achieve better generalization. Our phase diagram provides a comprehensive understanding of the dynamical regimes of neural networks and their dependence on the choice of initialization-related hyperparameters. Furthermore, we demonstrate in detail the underlying mechanisms by which small initialization leads to condensation at the initial training stage.
Abstract:We prove a general Embedding Principle of the loss landscape of deep neural networks (NNs) that unravels a hierarchical structure of the loss landscape, i.e., the loss landscape of an NN contains all critical points of all narrower NNs. This result is obtained by constructing a class of critical embeddings that map any critical point of a narrower NN to a critical point of the target NN with the same output function. By discovering a wide class of general compatible critical embeddings, we provide a gross estimate of the dimension of the critical submanifolds embedded from critical points of narrower NNs. We further prove an irreversibility property of critical embeddings: the number of negative/zero/positive eigenvalues of the Hessian matrix at a critical point may increase but never decrease as an NN becomes wider through the embedding. Using a special realization of the general compatible critical embedding, we prove a stringent necessary condition for being a "truly-bad" critical point that never becomes a strict-saddle point through any critical embedding. This result implies that strict-saddle points are commonplace in wide NNs, which may be an important reason underlying the easy optimization of wide NNs widely observed in practice.
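One concrete instance of such a critical embedding, written for a two-layer network for illustration (the paper's construction is more general), is one-step neuron splitting. For $f_\theta(x)=\sum_{k=1}^{m} a_k\,\sigma(w_k^{\top}x)$ and any $\lambda\in[0,1]$, split neuron $j$ into two copies sharing its input weight,
\begin{equation*}
  (a_j,\, w_j) \;\longmapsto\; (\lambda a_j,\, w_j)\ \text{and}\ \bigl((1-\lambda)\,a_j,\, w_j\bigr),
\end{equation*}
which leaves the output function unchanged and maps critical points of the width-$m$ network to critical points of the width-$(m+1)$ network.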
Abstract:In an attempt to better understand the structural benefits and generalization power of deep neural networks, we first present a novel graph-theoretical formulation of neural network models, including fully connected networks, residual networks~(ResNet), and densely connected networks~(DenseNet). Second, we extend the error analysis of the population risk for two-layer networks~\cite{ew2019prioriTwo} and ResNet~\cite{e2019prioriRes} to DenseNet, and further show that similar estimates can be obtained for neural networks satisfying certain mild conditions. These estimates are a priori in nature since they depend solely on information available prior to the training process; in particular, the bounds on the estimation errors are independent of the input dimension.