Yun Liao

Unveiling the Mechanisms of Explicit CoT Training: How Chain-of-Thought Enhances Reasoning Generalization

Feb 07, 2025

Xinhao Yao, Ruifeng Ren, Yun Liao, Yong Liu

Abstract:Training large language models (LLMs) with high-quality Chain-of-Thought (CoT) annotations has become a widely adopted strategy due to its significant enhancement of reasoning capabilities. To fully comprehend this approach, two questions naturally arise: (Q1) What advantages does training with CoT offer compared to training without CoT? (Q2) If there are advantages, what are the underlying mechanisms of explicit CoT training? Analyzing the advantages and mechanisms of CoT training is challenging due to the many factors involved. To address this, we conduct a detailed analysis using clear and controllable data distributions and, for the first time, reveal that CoT training offers the following advantages: (1) Training with CoT markedly improves reasoning generalization, extending it from in-distribution (ID) to both ID and out-of-distribution (OOD) scenarios, while also speeding up convergence; (2) Even when training with CoT includes a certain range of erroneous reasoning steps, it still enables the model to learn reasoning patterns, leading to systematic generalization. We further explore the underlying mechanisms from a circuit perspective: (1) The data distribution (e.g., ratio $\lambda$ and pattern) plays a crucial role in influencing the model's systematic generalization; (2) CoT training (with two-hop facts) internalizes reasoning into a two-stage generalizing circuit, where the number of stages corresponds to the explicit reasoning steps during training. Our findings elucidate the mechanisms underlying explicit CoT training and offer critical insights into tuning strategies for LLMs to achieve robust generalization.

Rethinking External Slow-Thinking: From Snowball Errors to Probability of Correct Reasoning

Jan 26, 2025

Zeyu Gan, Yun Liao, Yong Liu

Abstract:Test-time scaling, which is also often referred to as slow-thinking, has been demonstrated to enhance multi-step reasoning in large language models (LLMs). However, despite its widespread utilization, the mechanisms underlying slow-thinking methods remain poorly understood. This paper explores the mechanisms of external slow-thinking from a theoretical standpoint. We begin by examining the snowball error effect within the LLM reasoning process and connect it to the likelihood of correct reasoning using information theory. Building on this, we show that external slow-thinking methods can be interpreted as strategies to mitigate the error probability. We further provide a comparative analysis of popular external slow-thinking approaches, ranging from simple to complex, highlighting their differences and interrelationships. Our findings suggest that the efficacy of these methods is not primarily determined by the specific framework employed, and that expanding the search scope or the model's internal reasoning capacity may yield more sustained improvements in the long term. We open-source our code at https://github.com/ZyGan1999/Snowball-Errors-and-Probability.
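
As a toy illustration of how errors can compound over reasoning steps, and of why sampling more candidates can lower the error probability, the sketch below assumes independent, identical per-step error rates and a perfect selector over candidates. Both assumptions and all function names are illustrative; they are not taken from the paper's information-theoretic analysis.

```python
def chain_success_probability(step_error: float, num_steps: int) -> float:
    """Probability that every step of a reasoning chain is correct,
    assuming each step fails independently with probability step_error."""
    return (1.0 - step_error) ** num_steps


def best_of_n_success_probability(chain_success: float, num_samples: int) -> float:
    """Probability that at least one of num_samples independent chains is
    correct, assuming a perfect selector picks a correct chain if one exists."""
    return 1.0 - (1.0 - chain_success) ** num_samples


if __name__ == "__main__":
    # Longer chains succeed less often; more samples recover some of the loss.
    for steps in (5, 10, 20):
        p = chain_success_probability(0.05, steps)
        print(steps, round(p, 3), round(best_of_n_success_probability(p, 8), 3))
```

Under these assumptions the single-chain success probability decays geometrically with the number of steps, while widening the search only helps to the extent that a reliable selector is available.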

Ahpatron: A New Budgeted Online Kernel Learning Machine with Tighter Mistake Bound

Dec 12, 2023

Yun Liao, Junfan Li, Shizhong Liao, Qinghua Hu, Jianwu Dang

Abstract:In this paper, we study the mistake bound of online kernel learning on a budget. We propose a new budgeted online kernel learning model, called Ahpatron, which significantly improves the mistake bound of previous work and resolves the open problem posed by Dekel, Shalev-Shwartz, and Singer (2005). We first present an aggressive variant of Perceptron, named AVP, a model without budget, which uses an active updating rule. Then we design a new budget maintenance mechanism, which removes half of the examples and projects the removed examples onto a hypothesis space spanned by the remaining examples. Ahpatron adopts the above mechanism to approximate AVP. Theoretical analyses prove that Ahpatron has tighter mistake bounds, and experimental results show that Ahpatron outperforms the state-of-the-art algorithms on the same or a smaller budget.
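
To make the setting concrete, here is a minimal sketch of a budgeted kernel Perceptron. It is not Ahpatron: the update is the plain mistake-driven Perceptron rule rather than AVP's active rule, and the budget step simply discards the older half of the stored examples instead of projecting them onto the span of the remaining ones. The class name, the RBF kernel, and its `gamma` parameter are assumptions made for illustration.

```python
import math


def rbf_kernel(x, y, gamma=1.0):
    """Gaussian (RBF) kernel between two equal-length sequences of floats."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(x, y)))


class BudgetedKernelPerceptron:
    """Mistake-driven kernel Perceptron that stores at most `budget` examples."""

    def __init__(self, budget, kernel=rbf_kernel):
        self.budget = budget
        self.kernel = kernel
        self.support = []   # stored (example, label) pairs, oldest first
        self.mistakes = 0

    def score(self, x):
        # Kernel expansion over stored examples; each has coefficient y.
        return sum(y * self.kernel(sx, x) for sx, y in self.support)

    def predict(self, x):
        return 1 if self.score(x) >= 0 else -1

    def update(self, x, y):
        """Process one labeled example with label y in {-1, +1}."""
        if self.predict(x) != y:
            self.mistakes += 1
            self.support.append((x, y))
            if len(self.support) > self.budget:
                # Naive budget maintenance (illustrative only): drop the
                # older half of the stored examples, with no projection.
                self.support = self.support[len(self.support) // 2:]
```

Discarding examples outright loses the information they carried; the projection step described in the abstract is meant to retain part of it within the budget.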

TKwinFormer: Top k Window Attention in Vision Transformers for Feature Matching

Aug 29, 2023

Yun Liao, Yide Di, Hao Zhou, Kaijun Zhu, Mingyu Lu, Yijia Zhang, Qing Duan, Junhui Liu

Abstract:Local feature matching remains a challenging task, primarily due to difficulties in matching sparse keypoints and low-texture regions. The key to solving this problem lies in effectively and accurately integrating global and local information. To achieve this goal, we introduce an innovative local feature matching method called TKwinFormer. Our approach employs a multi-stage matching strategy to optimize the efficiency of information interaction. Furthermore, we propose a novel attention mechanism called Top K Window Attention, which facilitates global information interaction through window tokens prior to patch-level matching, resulting in improved matching accuracy. Additionally, we design an attention block to enhance attention between channels. Experimental results demonstrate that TKwinFormer outperforms state-of-the-art methods on various benchmarks. Code is available at: https://github.com/LiaoYun0x0/TKwinFormer.

* 11 pages, 7 figures

Path Integral Based Convolution and Pooling for Heterogeneous Graph Neural Networks

Feb 26, 2023

Lingjie Kong, Yun Liao

Abstract:Graph neural networks (GNNs) extend deep learning to graph-structured datasets. As with convolutional neural networks (CNNs) in image prediction, convolutional and pooling layers are the foundation of GNN success on graph prediction tasks. The initial PAN paper uses a path-integral-based graph neural network for graph prediction. Specifically, it uses a convolution operation that involves every path linking the message sender and receiver, with learnable weights depending on the path length, which corresponds to the maximal entropy random walk. It further generalizes this convolution operation to a new transition matrix called the maximal entropy transition (MET). Because the diagonal entries of the MET matrix are directly related to subgraph centrality, it provides a trial mechanism for pooling based on centrality score. While the initial PAN paper considers only node features, we further extend its capability to handle complex heterogeneous graphs that include both node and edge features.

Scalable Polar Code Construction for Successive Cancellation List Decoding: A Graph Neural Network-Based Approach

Jul 03, 2022

Yun Liao, Seyyed Ali Hashemi, Hengjie Yang, John M. Cioffi

Abstract:While constructing polar codes for successive-cancellation decoding can be implemented efficiently by sorting the bit-channels, finding optimal polar-code constructions for successive-cancellation list (SCL) decoding in an efficient and scalable manner still awaits investigation. This paper proposes a graph neural network (GNN)-based reinforcement learning algorithm, named the iterative message-passing (IMP) algorithm, to solve the polar-code construction problem for SCL decoding. The algorithm operates only on the local structure of the graph induced by the polar code's generator matrix. The size of the IMP model is independent of the blocklength and the code rate, making it scalable to construct polar codes with long blocklengths. Moreover, a single trained IMP model can be directly applied to a wide range of target blocklengths, code rates, and channel conditions, and corresponding polar codes can be generated without separate training. Numerical experiments show that the IMP algorithm finds polar-code constructions that significantly outperform the classical constructions under cyclic-redundancy-check-aided SCL (CA-SCL) decoding. Compared to other learning-based construction methods tailored to SCL/CA-SCL decoding, the IMP algorithm constructs polar codes with comparable or lower frame error rates, while reducing the training complexity significantly by eliminating the need for separate training at each target blocklength, code rate, and channel condition.
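
For context, the classical sorting-based construction for successive-cancellation decoding mentioned in the first sentence can be sketched as follows. This is not the IMP algorithm. The sketch assumes a binary erasure channel (BEC), where the Bhattacharyya parameters of the synthesized bit-channels follow an exact recursion; the function names and the default erasure probability are illustrative.

```python
def bec_bhattacharyya(num_levels, erasure_prob):
    """Bhattacharyya parameters of the 2**num_levels bit-channels of a polar
    code over a BEC, listed in natural bit-channel order."""
    z = [erasure_prob]
    for _ in range(num_levels):
        nxt = []
        for v in z:
            nxt.append(2 * v - v * v)  # degraded ("minus") channel
            nxt.append(v * v)          # upgraded ("plus") channel
        z = nxt
    return z


def construct_polar_code(num_levels, num_info_bits, erasure_prob=0.5):
    """Return the sorted indices of the num_info_bits most reliable
    bit-channels; the remaining indices are frozen."""
    z = bec_bhattacharyya(num_levels, erasure_prob)
    by_reliability = sorted(range(len(z)), key=lambda i: z[i])
    return sorted(by_reliability[:num_info_bits])


if __name__ == "__main__":
    # Blocklength 8, rate 1/2, BEC with erasure probability 0.5.
    print(construct_polar_code(3, 4))  # → [3, 5, 6, 7]
```

The ranking depends only on per-channel reliability, which suits successive-cancellation decoding; the abstract's point is that this ordering need not be optimal under list decoding.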

* 30 pages, 9 figures, submitted to IEEE Transactions on Communications

Construction of Polar Codes with Reinforcement Learning

Sep 19, 2020

Yun Liao, Seyyed Ali Hashemi, John Cioffi, Andrea Goldsmith

Abstract:This paper formulates the polar-code construction problem for the successive-cancellation list (SCL) decoder as a maze-traversing game, which can be solved by reinforcement learning techniques. The proposed method provides a novel technique for polar-code construction that no longer depends on sorting and selecting bit-channels by reliability. Instead, this technique decides whether the input bits should be frozen in a purely sequential manner. The equivalence of optimizing the polar-code construction for the SCL decoder under this technique and maximizing the expected reward of traversing a maze is drawn. Simulation results show that the standard polar-code constructions that are designed for the successive-cancellation decoder are no longer optimal for the SCL decoder with respect to the frame error rate. In contrast, the simulations show that, with a reasonable amount of training, the game-based construction method finds code constructions that have lower frame-error rate for various code lengths and decoders compared to standard constructions.

* To be published in Proceedings of IEEE Globecom 2020

Deep Neural Network Symbol Detection for Millimeter Wave Communications

Jul 25, 2019

Yun Liao, Nariman Farsad, Nir Shlezinger, Yonina C. Eldar, Andrea J. Goldsmith

Abstract:This paper proposes to use a deep neural network (DNN)-based symbol detector for mmWave systems such that CSI acquisition can be bypassed. In particular, we consider a sliding bidirectional recurrent neural network (BRNN) architecture that is suitable for the long memory length of typical mmWave channels. The performance of the DNN detector is evaluated in comparison to that of the Viterbi detector. The results show that the performance of the DNN detector is close to that of the optimal Viterbi detector with perfect CSI, and that it outperforms the Viterbi algorithm with CSI estimation error. Further experiments show that the DNN detector is robust to a wide range of noise levels and varying channel conditions, and that a pretrained detector can be reliably applied to different mmWave channel realizations with minimal overhead.
