Abhiram Iyer

Permutation Invariant Learning with High-Dimensional Particle Filters

Oct 30, 2024

Akhilan Boopathy, Aneesh Muppidi, Peggy Yang, Abhiram Iyer, William Yue, Ila Fiete

Abstract: Sequential learning in deep models often suffers from challenges such as catastrophic forgetting and loss of plasticity, largely due to the permutation dependence of gradient-based algorithms, where the order of training data impacts the learning outcome. In this work, we introduce a novel permutation-invariant learning framework based on high-dimensional particle filters. We theoretically demonstrate that particle filters are invariant to the sequential ordering of training minibatches or tasks, offering a principled solution to mitigate catastrophic forgetting and loss of plasticity. We develop an efficient particle filter for optimizing high-dimensional models, combining the strengths of Bayesian methods with gradient-based optimization. Through extensive experiments on continual supervised and reinforcement learning benchmarks, including SplitMNIST, SplitCIFAR100, and ProcGen, we empirically show that our method consistently improves performance while reducing variance compared to standard baselines.
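
The order-invariance claim in this abstract can be illustrated with a toy example. The sketch below is not the authors' high-dimensional filter; it assumes fixed particles, a hypothetical unit-variance Gaussian likelihood (`gaussian_log_lik`), and no resampling or gradient steps. Under those assumptions each particle's log-weight is just a sum of per-batch log-likelihoods, so the order in which minibatches arrive cannot change the final weights.

```python
import math
import random


def gaussian_log_lik(theta, x):
    # Hypothetical likelihood: x ~ Normal(theta, 1), up to an additive constant.
    return -0.5 * (x - theta) ** 2


def update_log_weights(log_w, particles, batch):
    # Bayesian reweighting: add each particle's log-likelihood of the batch.
    return [
        lw + sum(gaussian_log_lik(p, x) for x in batch)
        for lw, p in zip(log_w, particles)
    ]


def normalize(log_w):
    # Subtract the max before exponentiating for numerical stability.
    m = max(log_w)
    ws = [math.exp(lw - m) for lw in log_w]
    total = sum(ws)
    return [w / total for w in ws]


def run_filter(particles, batches):
    # Start from uniform weights and process the batches in the given order.
    log_w = [0.0] * len(particles)
    for batch in batches:
        log_w = update_log_weights(log_w, particles, batch)
    return normalize(log_w)


rng = random.Random(0)
particles = [rng.uniform(-3.0, 3.0) for _ in range(50)]
batches = [[rng.gauss(1.0, 1.0) for _ in range(8)] for _ in range(5)]

w_forward = run_filter(particles, batches)
w_reversed = run_filter(particles, list(reversed(batches)))

# The two orderings agree up to floating-point rounding.
max_diff = max(abs(a - b) for a, b in zip(w_forward, w_reversed))
```

Gradient descent on the same batches would generally end at different parameters for different orderings, which is the contrast the abstract draws.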

* Website: https://aneeshers.github.io/PermutationInvariantLearning/

Breaking Neural Network Scaling Laws with Modularity

Sep 09, 2024

Akhilan Boopathy, Sunshine Jiang, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete

Abstract: Modular neural networks outperform nonmodular neural networks on tasks ranging from visual question answering to robotics. These performance improvements are thought to be due to modular networks' superior ability to model the compositional and combinatorial structure of real-world problems. However, a theoretical explanation of how modularity improves generalizability, and of how to leverage task modularity while training networks, remains elusive. Using recent theoretical progress in explaining neural network generalization, we investigate how the amount of training data required to generalize on a task varies with the intrinsic dimensionality of a task's input. We show theoretically that, when applied to modularly structured tasks, nonmodular networks require a number of samples exponential in task dimensionality, whereas modular networks' sample complexity is independent of task dimensionality: modular networks can generalize in high dimensions. We then develop a novel learning rule for modular networks to exploit this advantage and empirically show the improved generalization of the rule, both in- and out-of-distribution, on high-dimensional, modular tasks.

Towards Exact Computation of Inductive Bias

Jun 22, 2024

Akhilan Boopathy, William Yue, Jaedong Hwang, Abhiram Iyer, Ila Fiete

Abstract: Much research in machine learning involves finding appropriate inductive biases (e.g., convolutional neural networks, momentum-based optimizers, transformers) to promote generalization on tasks. However, quantification of the amount of inductive bias associated with these architectures and hyperparameters has been limited. We propose a novel method for efficiently computing the inductive bias required for generalization on a task with a fixed training data budget; formally, this corresponds to the amount of information required to specify well-generalizing models within a specific hypothesis space of models. Our approach involves modeling the loss distribution of random hypotheses drawn from a hypothesis space to estimate the required inductive bias for a task relative to these hypotheses. Unlike prior work, our method provides a direct estimate of inductive bias without using bounds and is applicable to diverse hypothesis spaces. Moreover, we derive approximation error bounds for our estimation approach in terms of the number of sampled hypotheses. Consistent with prior results, our empirical results demonstrate that higher-dimensional tasks require greater inductive bias. We show that relative to other expressive model classes, neural networks as a model class encode large amounts of inductive bias. Furthermore, our measure quantifies the relative difference in inductive bias between different neural network architectures. Our proposed inductive bias metric provides an information-theoretic interpretation of the benefits of specific model architectures for certain tasks and provides a quantitative guide to developing tasks requiring greater inductive bias, thereby encouraging the development of more powerful inductive biases.
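
One way to read "the amount of information required to specify well-generalizing models" is as the number of bits needed to single out hypotheses whose loss falls below a target, that is, the negative base-2 logarithm of the fraction of random hypotheses that reach it. The sketch below is a naive counting estimate on a toy problem, not the paper's estimator: the 1D threshold classifiers, the dataset, the loss targets, and the function `bits_to_specify` are all illustrative assumptions. The abstract describes modeling the loss distribution instead, which is presumably what makes the estimate feasible when good hypotheses are too rare to find by sampling.

```python
import math
import random


def bits_to_specify(losses, target_loss):
    # Information (in bits) needed to pick out hypotheses with loss <= target,
    # estimated as -log2 of the fraction of sampled hypotheses that qualify.
    hits = sum(1 for loss_value in losses if loss_value <= target_loss)
    if hits == 0:
        return float("inf")  # no sampled hypothesis reached the target
    return -math.log2(hits / len(losses))


rng = random.Random(0)

# Toy task: label is 1 when x > 0.3 (the "true" threshold is an assumption).
data = [(x, 1 if x > 0.3 else 0) for x in (rng.uniform(-1.0, 1.0) for _ in range(200))]


def loss(threshold):
    # Error rate of the hypothesis "predict 1 when x > threshold".
    errors = sum(1 for x, y in data if (1 if x > threshold else 0) != y)
    return errors / len(data)


# Random hypotheses: thresholds drawn uniformly from the hypothesis space.
losses = [loss(rng.uniform(-1.0, 1.0)) for _ in range(2000)]

loose_bits = bits_to_specify(losses, 0.20)  # easy target: many hypotheses qualify
tight_bits = bits_to_specify(losses, 0.05)  # strict target: fewer qualify
```

A stricter loss target is met by a smaller fraction of hypotheses, so it costs at least as many bits as a looser one.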

* Published at IJCAI 2024

Uncovering Latent Memories: Assessing Data Leakage and Memorization Patterns in Large Language Models

Jun 20, 2024

Sunny Duan, Mikail Khona, Abhiram Iyer, Rylan Schaeffer, Ila R Fiete

Abstract: The proliferation of large language models has revolutionized natural language processing tasks, yet it raises profound concerns regarding data privacy and security. Language models are trained on extensive corpora including potentially sensitive or proprietary information, and the risk of data leakage, where the model response reveals pieces of such information, remains inadequately understood. This study examines susceptibility to data leakage by quantifying the phenomenon of memorization in machine learning models, focusing on the evolution of memorization patterns over training. We investigate how the statistical characteristics of training data influence the memories encoded within the model by evaluating how repetition influences memorization. We reproduce findings that the probability of memorizing a sequence scales logarithmically with the number of times it is present in the data. Furthermore, we find that sequences which are not apparently memorized after the first encounter can be uncovered throughout the course of training even without subsequent encounters. The presence of these latent memorized sequences presents a challenge for data privacy since they may be hidden at the final checkpoint of the model. To this end, we develop a diagnostic test for uncovering these latent memorized sequences by considering their cross-entropy loss.

Resampling-free Particle Filters in High-dimensions

Apr 21, 2024

Akhilan Boopathy, Aneesh Muppidi, Peggy Yang, Abhiram Iyer, William Yue, Ila Fiete

Abstract: State estimation is crucial for the performance and safety of numerous robotic applications. Among the suite of estimation techniques, particle filters have been identified as a powerful solution due to their non-parametric nature. Yet, in high-dimensional state spaces, these filters face challenges such as 'particle deprivation', which hinders accurate representation of the true posterior distribution. This paper introduces a novel resampling-free particle filter designed to mitigate particle deprivation by forgoing the traditional resampling step. This ensures a broader and more diverse particle set, especially vital in high-dimensional scenarios. Theoretically, our proposed filter is shown to offer a near-accurate representation of the desired posterior distribution in high-dimensional contexts. Empirically, the effectiveness of our approach is underscored through a high-dimensional synthetic state estimation task and a 6D pose estimation derived from videos. We posit that as robotic systems evolve with greater degrees of freedom, particle filters tailored for high-dimensional state spaces will be indispensable.

* Published at ICRA 2024, 7 pages, 5 figures

Avoiding Catastrophe: Active Dendrites Enable Multi-Task Learning in Dynamic Environments

Dec 31, 2021

Abhiram Iyer, Karan Grewal, Akash Velu, Lucas Oliveira Souza, Jeremy Forest, Subutai Ahmad

Abstract: A key challenge for AI is to build embodied systems that operate in dynamically changing environments. Such systems must adapt to changing task contexts and learn continuously. Although standard deep learning systems achieve state-of-the-art results on static benchmarks, they often struggle in dynamic scenarios. In these settings, error signals from multiple contexts can interfere with one another, ultimately leading to a phenomenon known as catastrophic forgetting. In this article we investigate biologically inspired architectures as solutions to these problems. Specifically, we show that the biophysical properties of dendrites and local inhibitory systems enable networks to dynamically restrict and route information in a context-specific manner. Our key contributions are as follows. First, we propose a novel artificial neural network architecture that incorporates active dendrites and sparse representations into the standard deep learning framework. Next, we study the performance of this architecture on two separate benchmarks requiring task-based adaptation: Meta-World, a multi-task reinforcement learning environment where a robotic agent must learn to solve a variety of manipulation tasks simultaneously; and a continual learning benchmark in which the model's prediction task changes throughout training. Analysis on both benchmarks demonstrates the emergence of overlapping but distinct and sparse subnetworks, allowing the system to fluidly learn multiple tasks with minimal forgetting. Our neural implementation marks the first time a single architecture has achieved competitive results on both multi-task and continual learning settings. Our research sheds light on how biological properties of neurons can inform deep learning systems to address dynamic scenarios that are typically impossible for traditional ANNs to solve.

* 31 pages, 17 figures

Collision Avoidance Robotics Via Meta-Learning (CARML)

Jul 16, 2020

Abhiram Iyer, Aravind Mahadevan

Abstract: This paper presents an approach to exploring a multi-objective reinforcement learning problem with Model-Agnostic Meta-Learning. The environment we used consists of a 2D vehicle equipped with a LIDAR sensor. The vehicle's goal is to reach a predetermined target location while avoiding any obstacles along its path. We also compare this approach against a baseline TD3 solution that attempts to solve the same problem.
