Matthew Farrugia-Roberts

The University of Melbourne

You Are What You Eat -- AI Alignment Requires Understanding How Data Shapes Structure and Generalisation

Feb 08, 2025

Simon Pepin Lehalleur, Jesse Hoogland, Matthew Farrugia-Roberts, Susan Wei, Alexander Gietelink Oldenziel, George Wang, Liam Carroll, Daniel Murfet

Abstract: In this position paper, we argue that understanding the relation between structure in the data distribution and structure in trained models is central to AI alignment. First, we discuss how two neural networks can have equivalent performance on the training set but compute their outputs in essentially different ways and thus generalise differently. For this reason, standard testing and evaluation are insufficient for obtaining assurances of safety for widely deployed generally intelligent systems. We argue that to progress beyond evaluation to a robust mathematical science of AI alignment, we need to develop statistical foundations for an understanding of the relation between structure in the data distribution, internal structure in models, and how these structures underlie generalisation.
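
As a toy illustration of the claim that equal training performance does not pin down generalisation, here are two hypothetical "models" that agree on every integer training input but disagree in between. This example is not from the paper, and it uses plain functions rather than trained networks:

```python
import math


def model_a(x: float) -> float:
    return x


def model_b(x: float) -> float:
    # sin(pi * x) vanishes at every integer (up to floating-point error),
    # so this extra term is invisible on integer-valued training inputs.
    return x + math.sin(math.pi * x)


train_inputs = [0.0, 1.0, 2.0, 3.0, 4.0]
train_targets = list(train_inputs)  # target function: identity


def mse(model, xs, ys):
    """Mean squared error of a model on a dataset."""
    return sum((model(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


print(mse(model_a, train_inputs, train_targets))  # 0.0
print(mse(model_b, train_inputs, train_targets))  # also ~0 (rounding noise only)
print(model_a(0.5), model_b(0.5))                 # 0.5 1.5
```

Any evaluation restricted to integer inputs would rate the two models identically, even though they behave differently everywhere else.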

Dynamics of Transient Structure in In-Context Linear Regression Transformers

Jan 31, 2025

Liam Carroll, Jesse Hoogland, Matthew Farrugia-Roberts, Daniel Murfet

Abstract: Modern deep neural networks display striking examples of rich internal computational structure. Uncovering principles governing the development of such structure is a priority for the science of deep learning. In this paper, we explore the transient ridge phenomenon: when transformers are trained on in-context linear regression tasks with intermediate task diversity, they initially behave like ridge regression before specializing to the tasks in their training distribution. This transition from a general solution to a specialized solution is revealed by joint trajectory principal component analysis. Further, we draw on the theory of Bayesian internal model selection to suggest a general explanation for the phenomena of transient structure in transformers, based on an evolving tradeoff between loss and complexity. We empirically validate this explanation by measuring the model complexity of our transformers as defined by the local learning coefficient.
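
The two predictors that the transient ridge phenomenon moves between can be sketched in NumPy under some assumed settings: task vectors drawn from a standard Gaussian prior, Gaussian label noise with known variance, and a finite set of training tasks. With a N(0, I) prior, the posterior-mean predictor is ridge regression whose regularisation strength equals the noise variance. Restricting the prior to the finite set of training tasks gives a specialised posterior-mean predictor (often called dMMSE). The function names and dimensions are hypothetical, and the paper's exact parameterisation may differ.

```python
import numpy as np


def ridge_predict(X, y, x_query, noise_var):
    """General solution: posterior mean under a N(0, I) prior on the task vector.

    X: (k, d) context inputs, y: (k,) context outputs, x_query: (d,).
    The posterior mean of w is the ridge estimate with penalty noise_var.
    """
    d = X.shape[1]
    w_hat = np.linalg.solve(X.T @ X + noise_var * np.eye(d), X.T @ y)
    return x_query @ w_hat


def dmmse_predict(X, y, x_query, tasks, noise_var):
    """Specialised solution: posterior mean under a uniform prior on `tasks`.

    tasks: (M, d) array of the task vectors seen during training.
    """
    residuals = y[None, :] - tasks @ X.T                 # (M, k)
    log_w = -np.sum(residuals**2, axis=1) / (2 * noise_var)
    weights = np.exp(log_w - log_w.max())                # numerically stable softmax
    weights /= weights.sum()
    return x_query @ (weights @ tasks)


# Hypothetical example: a context generated by one of the training tasks.
rng = np.random.default_rng(0)
d, k, M, noise_var = 4, 8, 3, 0.25
tasks = rng.standard_normal((M, d))
w_true = tasks[0]
X = rng.standard_normal((k, d))
y = X @ w_true + np.sqrt(noise_var) * rng.standard_normal(k)
x_q = rng.standard_normal(d)

print("ridge:", ridge_predict(X, y, x_q, noise_var))
print("dMMSE:", dmmse_predict(X, y, x_q, tasks, noise_var))
print("true: ", x_q @ w_true)
```

Ridge regression makes no use of the training task set, which is why it can serve as a general solution. The dMMSE predictor can only output combinations of the training tasks, which is what specialising to the training distribution means here.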

* 37 pages, 27 figures

The Developmental Landscape of In-Context Learning

Feb 04, 2024

Jesse Hoogland, George Wang, Matthew Farrugia-Roberts, Liam Carroll, Susan Wei, Daniel Murfet

Abstract: We show that in-context learning emerges in transformers in discrete developmental stages, when they are trained on either language modeling or linear regression tasks. We introduce two methods for detecting the milestones that separate these stages, by probing the geometry of the population loss in both parameter space and function space. We study the stages revealed by these new methods using a range of behavioral and structural metrics to establish their validity.

Computational Complexity of Detecting Proximity to Losslessly Compressible Neural Network Parameters

Jun 05, 2023

Matthew Farrugia-Roberts

Abstract: To better understand complexity in neural networks, we theoretically investigate the idealised phenomenon of lossless network compressibility, whereby an identical function can be implemented with a smaller network. We give an efficient formal algorithm for optimal lossless compression in the setting of single-hidden-layer hyperbolic tangent networks. To measure lossless compressibility, we define the rank of a parameter as the minimum number of hidden units required to implement the same function. Losslessly compressible parameters are atypical, but their existence has implications for nearby parameters. We define the proximate rank of a parameter as the rank of the most compressible parameter within a small $L^\infty$ neighbourhood. Unfortunately, detecting nearby losslessly compressible parameters is not so easy: we show that bounding the proximate rank is an NP-complete problem, using a reduction from Boolean satisfiability via a geometric problem involving covering points in the plane with small squares. These results underscore the computational complexity of measuring neural network complexity, laying a foundation for future theoretical and empirical work in this direction.
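
Here is a minimal sketch of lossless compression for a network f(x) = c + Σ_i a_i tanh(w_i · x + b_i). It relies on three classical ways such a network can be reducible:

- a unit with zero outgoing weight contributes nothing;
- a unit with zero incoming weight is constant and can be folded into the output bias;
- two units whose incoming weights and biases agree up to sign can be merged, because tanh is odd.

The function names are hypothetical and the sketch uses exact floating-point comparisons. We assume, without verifying it, that the width it returns equals the rank as defined above. The paper's own algorithm may be stated differently.

```python
from collections import defaultdict
from typing import Dict, Tuple

import numpy as np


def compress_tanh_network(W, b, a, c):
    """Losslessly compress f(x) = c + sum_i a[i] * tanh(W[i] @ x + b[i]).

    W: (h, n) incoming weights, b: (h,) biases, a: (h,) outgoing weights,
    c: output bias. Returns (W', b', a', c') implementing the same function.
    """
    c = float(c)
    groups: Dict[Tuple[float, ...], float] = defaultdict(float)
    for w_i, b_i, a_i in zip(W, b, a):
        if not np.any(w_i):
            # Constant unit: tanh(b_i) does not depend on x, so fold it into c.
            c += a_i * np.tanh(b_i)
            continue
        # tanh is odd, so (w, b, a) and (-w, -b, -a) contribute identically.
        # Canonicalise the sign so the first nonzero incoming weight is positive,
        # then merge units with the same canonical (w, b) by summing outgoing weights.
        s = 1.0 if w_i[np.flatnonzero(w_i)[0]] > 0 else -1.0
        key = tuple(s * w_i) + (s * b_i,)
        groups[key] += s * a_i
    # Units whose (merged) outgoing weight is zero contribute nothing.
    kept = [(key, a_k) for key, a_k in groups.items() if a_k != 0.0]
    n = W.shape[1]
    W_new = np.array([key[:n] for key, _ in kept]).reshape(len(kept), n)
    b_new = np.array([key[n] for key, _ in kept])
    a_new = np.array([a_k for _, a_k in kept])
    return W_new, b_new, a_new, c


def forward(W, b, a, c, x):
    """Evaluate the network at a single input x of shape (n,)."""
    return c + a @ np.tanh(W @ x + b)


# Hypothetical example: units 0 and 1 are sign-flipped copies, unit 2 is
# constant (zero incoming weights), and unit 3 has zero outgoing weight.
W = np.array([[1.0, 2.0], [-1.0, -2.0], [0.0, 0.0], [0.5, -1.0]])
b = np.array([0.3, -0.3, 1.0, 0.0])
a = np.array([2.0, 0.5, -1.0, 0.0])
c = 0.1
W2, b2, a2, c2 = compress_tanh_network(W, b, a, c)
print(len(a2))  # 1

x = np.random.default_rng(0).standard_normal(2)
print(forward(W, b, a, c, x), forward(W2, b2, a2, c2, x))  # equal up to rounding
```

In this example only one hidden unit survives: units 0 and 1 merge into a single unit with outgoing weight 1.5, unit 2 is absorbed into the output bias, and unit 3 is dropped.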

* 9 pages main paper, 31 pages total, 9 figures, 3 tables

Functional Equivalence and Path Connectivity of Reducible Hyperbolic Tangent Networks

May 08, 2023

Matthew Farrugia-Roberts

Abstract: Understanding the learning process of artificial neural networks requires clarifying the structure of the parameter space within which learning takes place. A neural network parameter's functional equivalence class is the set of parameters implementing the same input-output function. For many architectures, almost all parameters have a simple and well-documented functional equivalence class. However, there is also a vanishing minority of reducible parameters, with richer functional equivalence classes caused by redundancies among the network's units. In this paper, we give an algorithmic characterisation of unit redundancies and reducible functional equivalence classes for a single-hidden-layer hyperbolic tangent architecture. We show that such functional equivalence classes are piecewise-linear path-connected sets, and that for parameters with a majority of redundant units, the sets have a diameter of at most 7 linear segments.
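
The following NumPy sketch contrasts the two cases. It is a toy illustration, not the paper's construction. For a generic parameter, flipping the signs of one unit's incoming weights, bias and outgoing weight gives a functionally equivalent parameter. For a reducible parameter in which two units share incoming weights and bias, only the sum of their outgoing weights matters, so a whole linear segment of parameters implements the same function. All names are hypothetical, and equality is checked numerically on random probe inputs, which is evidence rather than proof.

```python
import numpy as np


def forward(W, b, a, c, X):
    """Single-hidden-layer tanh network evaluated on a batch X of shape (m, n)."""
    return c + np.tanh(X @ W.T + b) @ a


rng = np.random.default_rng(1)
X = rng.standard_normal((50, 2))  # probe inputs used to compare functions

# Generic parameter: 3 hidden units, 2 inputs.
W = rng.standard_normal((3, 2))
b = rng.standard_normal(3)
a = rng.standard_normal(3)
c = 0.2

# Sign-flip symmetry: negate unit 0's incoming weights, bias and outgoing
# weight. Because tanh(-z) = -tanh(z), the function is unchanged.
W_flip, b_flip, a_flip = W.copy(), b.copy(), a.copy()
W_flip[0] *= -1
b_flip[0] *= -1
a_flip[0] *= -1

# Reducible parameter: units 0 and 1 share incoming weights and bias, so the
# function depends only on a[0] + a[1]. Every point on the straight segment
# from `start` to `end` therefore implements the same function.
W_red, b_red = W.copy(), b.copy()
W_red[1] = W_red[0]
b_red[1] = b_red[0]
total = a[0] + a[1]
start = np.array([total, 0.0, a[2]])
end = np.array([0.0, total, a[2]])
reference = forward(W_red, b_red, start, c, X)
for t in np.linspace(0.0, 1.0, 11):
    a_t = (1 - t) * start + t * end
    assert np.allclose(forward(W_red, b_red, a_t, c, X), reference)

print("sign flip preserves function:",
      np.allclose(forward(W, b, a, c, X), forward(W_flip, b_flip, a_flip, c, X)))
```

A generic parameter's equivalence class consists of isolated points related by such sign flips and unit permutations. The continuous segment only appears once units become redundant.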

* 15 pages, 3 figures

Invariance in Policy Optimisation and Partial Identifiability in Reward Learning

Mar 14, 2022

Joar Skalse, Matthew Farrugia-Roberts, Stuart Russell, Alessandro Abate, Adam Gleave

Abstract: It's challenging to design reward functions for complex, real-world tasks. Reward learning lets one instead infer reward functions from data. However, multiple reward functions often fit the data equally well, even in the infinite-data limit. Prior work often considers reward functions to be uniquely recoverable, by imposing additional assumptions on data sources. By contrast, we formally characterise the partial identifiability of popular data sources, including demonstrations and trajectory preferences, under multiple common sets of assumptions. We analyse the impact of this partial identifiability on downstream tasks such as policy optimisation, including under changes in environment dynamics. We unify our results in a framework for comparing data sources and downstream tasks by their invariances, with implications for the design and selection of data sources for reward learning.
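
Potential-based reward shaping is one classical source of such partial identifiability. Adding γΦ(s') − Φ(s) to the reward, for any potential Φ, changes the reward function but leaves the optimal Q-function shifted by −Φ(s) and the optimal policy unchanged. Data generated by an optimal policy therefore cannot distinguish the two rewards. The sketch below checks this on a small random MDP using value iteration. The MDP, the potential and the function names are hypothetical, and this is one example of an invariance rather than the paper's general framework.

```python
import numpy as np


def optimal_policy(P, R, gamma, iters=2000):
    """Value iteration; returns the greedy policy and the optimal Q-function.

    P: (S, A, S) transition probabilities, R: (S, A, S) rewards R(s, a, s').
    """
    S, _, _ = P.shape
    V = np.zeros(S)
    for _ in range(iters):
        # Q(s, a) = sum_{s'} P(s, a, s') * (R(s, a, s') + gamma * V(s'))
        Q = np.einsum("sat,sat->sa", P, R + gamma * V[None, None, :])
        V = Q.max(axis=1)
    return Q.argmax(axis=1), Q


rng = np.random.default_rng(0)
S, A, gamma = 5, 3, 0.9
P = rng.random((S, A, S))
P /= P.sum(axis=2, keepdims=True)  # normalise rows into distributions
R = rng.standard_normal((S, A, S))

# Shaped reward R'(s, a, s') = R(s, a, s') + gamma * phi(s') - phi(s).
phi = rng.standard_normal(S)  # arbitrary potential function
R_shaped = R + gamma * phi[None, None, :] - phi[:, None, None]

pi, Q = optimal_policy(P, R, gamma)
pi_shaped, Q_shaped = optimal_policy(P, R_shaped, gamma)
print(pi, pi_shaped)  # identical policies from different reward functions
```

The optimal Q-functions differ by exactly Φ(s) in each state. That offset does not depend on the action, so the argmax and hence the optimal policy are the same for both rewards.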

* 8 pages main paper, 24 pages total, 1 figure
