Abstract: Frontier models are currently developed and distributed primarily through two channels: centralized proprietary APIs or open-sourcing of pre-trained weights. We identify a third paradigm - Protocol Learning - where models are trained across decentralized networks of incentivized participants. This approach has the potential to aggregate orders of magnitude more computational resources than any single centralized entity, enabling unprecedented model scales and capabilities. However, it also introduces novel challenges: heterogeneous and unreliable nodes, malicious participants, the need for unextractable models to preserve incentives, and complex governance dynamics. To date, no systematic analysis has been conducted to assess the feasibility of Protocol Learning or the associated risks, particularly the 'No-Off Problem' arising from the inability to unilaterally halt a collectively trained model. We survey recent technical advances that suggest decentralized training may be feasible - covering emerging communication-efficient strategies and fault-tolerant methods - while highlighting critical open problems that remain. Contrary to the notion that decentralization inherently amplifies frontier risks, we argue that Protocol Learning's transparency, distributed governance, and democratized access ultimately reduce these risks compared to today's centralized regimes.
Abstract: Implicit Neural Representations (INRs) have gained popularity for encoding signals as compact, differentiable entities. While INRs commonly use techniques such as Fourier positional encodings or non-traditional activation functions (e.g., Gaussian, sinusoidal, or wavelet) to capture high-frequency content, the properties of these choices have not been explored within a unified theoretical framework. Addressing this gap, we conduct a comprehensive analysis of these activations from a sampling theory perspective. Our investigation reveals that sinc activations, previously unused in conjunction with INRs, are theoretically optimal for signal encoding. Additionally, we establish a connection between dynamical systems and INRs, leveraging sampling theory to bridge these two paradigms.
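A minimal sketch of the idea behind the sinc-activation result, not the paper's implementation: an INR-style coordinate MLP whose nonlinearity is the sinc function. The width, depth, and frequency scale w0 are illustrative assumptions.

```python
import torch
import torch.nn as nn

class SincLayer(nn.Module):
    """Linear layer followed by a sinc nonlinearity (sketch, assumptions only)."""
    def __init__(self, in_dim, out_dim, w0=30.0):
        super().__init__()
        self.linear = nn.Linear(in_dim, out_dim)
        self.w0 = w0  # assumed frequency scale, analogous to SIREN's w0

    def forward(self, x):
        # torch.sinc computes sin(pi*x)/(pi*x), with sinc(0) = 1
        return torch.sinc(self.w0 * self.linear(x))

class SincINR(nn.Module):
    """Coordinate MLP mapping (x, y) in [-1, 1]^2 to an RGB value."""
    def __init__(self, in_dim=2, hidden=256, depth=4, out_dim=3):
        super().__init__()
        layers = [SincLayer(in_dim, hidden)]
        layers += [SincLayer(hidden, hidden) for _ in range(depth - 1)]
        self.body = nn.Sequential(*layers)
        self.head = nn.Linear(hidden, out_dim)

    def forward(self, coords):
        # coords: (N, 2) pixel coordinates; returns (N, 3) predicted colors
        return self.head(self.body(coords))
```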
Abstract: We present Nonparametric Approximation of Inter-Trace returns (NAIT), a Reinforcement Learning algorithm for discrete-action, pixel-based environments that is both highly sample-efficient and computation-efficient. NAIT is a lazy-learning approach with an update that is equivalent to episodic Monte-Carlo on episode completion, but that allows the stable incorporation of rewards while an episode is ongoing. We make use of a fixed domain-agnostic representation, simple distance-based exploration, and a proximity graph-based lookup to facilitate extremely fast execution. We empirically evaluate NAIT on both the 26 and 57 game variants of ATARI100k where, despite its simplicity, it achieves competitive performance in the online setting with greater than 100x speedup in wall-time.
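A rough sketch of the lazy-learning flavor described here, under my own assumptions rather than the NAIT reference design: action values are read nonparametrically from stored (embedding, return) pairs computed under a fixed state representation, with a plain k-nearest-neighbour lookup standing in for the paper's proximity-graph search, and Monte-Carlo returns written back on episode completion.

```python
import numpy as np

class EpisodicKNNValues:
    """Nonparametric action-value memory with episodic Monte-Carlo updates (sketch)."""
    def __init__(self, n_actions, k=10, gamma=0.99):
        self.n_actions = n_actions
        self.k = k
        self.gamma = gamma
        # One memory per action: embeddings and their observed returns.
        self.keys = [[] for _ in range(n_actions)]
        self.returns = [[] for _ in range(n_actions)]

    def value(self, embedding, action):
        keys = np.asarray(self.keys[action])
        if len(keys) == 0:
            return 0.0  # neutral default for actions with no stored experience
        dists = np.linalg.norm(keys - embedding, axis=1)
        idx = np.argsort(dists)[: self.k]
        return float(np.mean(np.asarray(self.returns[action])[idx]))

    def act(self, embedding):
        # Greedy action over the nonparametric estimates.
        return int(np.argmax([self.value(embedding, a) for a in range(self.n_actions)]))

    def end_episode(self, trajectory):
        # trajectory: list of (embedding, action, reward) tuples.
        # Compute discounted Monte-Carlo returns and append them to the memory.
        g = 0.0
        for embedding, action, reward in reversed(trajectory):
            g = reward + self.gamma * g
            self.keys[action].append(embedding)
            self.returns[action].append(g)
```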
Abstract: We introduce Retrieval Augmented Classification (RAC), a generic approach to augmenting standard image classification pipelines with an explicit retrieval module. RAC consists of a standard base image encoder fused with a parallel retrieval branch that queries a non-parametric external memory of pre-encoded images and associated text snippets. We apply RAC to the problem of long-tail classification and demonstrate a significant improvement over the previous state-of-the-art on Places365-LT and iNaturalist-2018 (14.5% and 6.7% respectively), despite using only the training datasets themselves as the external information source. We demonstrate that RAC's retrieval module, without prompting, attains a high level of accuracy on tail classes. This, in turn, frees the base encoder to focus on common classes and improve its performance on them. RAC represents an alternative approach to utilizing large, pretrained models without requiring fine-tuning, as well as a first step towards making more effective use of external memory within common computer vision architectures.
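An illustrative sketch of the retrieval-augmented structure the abstract describes, built on my own assumptions rather than the authors' code: a base encoder's logits are fused with a retrieval branch that performs a cosine-similarity k-NN vote over a fixed memory of pre-encoded training images. The fusion by simple logit addition and the value of k are assumptions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RetrievalAugmentedClassifier(nn.Module):
    """Base classifier plus a non-parametric retrieval branch (sketch, assumptions only)."""
    def __init__(self, base_encoder, memory_keys, memory_labels, num_classes, k=32):
        super().__init__()
        self.base_encoder = base_encoder  # maps an image batch to (B, D) features
        self.classifier = nn.Linear(memory_keys.shape[1], num_classes)
        # Pre-encoded external memory: (N, D) features and (N,) integer class labels.
        self.register_buffer("memory_keys", F.normalize(memory_keys, dim=1))
        self.register_buffer("memory_labels", memory_labels)
        self.num_classes = num_classes
        self.k = k

    def retrieval_logits(self, features):
        # Cosine-similarity k-NN over the external memory; neighbours vote for
        # their labels, weighted by similarity.
        q = F.normalize(features, dim=1)                # (B, D)
        sims = q @ self.memory_keys.T                   # (B, N)
        topk_sims, topk_idx = sims.topk(self.k, dim=1)  # (B, k)
        votes = torch.zeros(features.size(0), self.num_classes, device=features.device)
        votes.scatter_add_(1, self.memory_labels[topk_idx], topk_sims)
        return votes

    def forward(self, images):
        features = self.base_encoder(images)
        # Fuse parametric and retrieval branches; summing logits is a stand-in
        # for whatever fusion the full method uses.
        return self.classifier(features) + self.retrieval_logits(features)
```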