Abstract: State space models (SSMs) have shown remarkable empirical performance on many long-sequence modeling tasks, but a theoretical understanding of these models is still lacking. In this work, we study the learning dynamics of linear SSMs to understand how covariance structure in data, latent state size, and initialization affect the evolution of parameters throughout learning with gradient descent. We show that focusing on the learning dynamics in the frequency domain affords analytical solutions under mild assumptions, and we establish a link between one-dimensional SSMs and the dynamics of deep linear feed-forward networks. Finally, we analyze how latent state over-parameterization affects convergence time and describe future work in extending our results to the study of deep SSMs with nonlinear connections. This work is a step toward a theory of learning dynamics in deep state space models.
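As a toy illustration of the setting above (my own sketch, not the paper's derivation), the snippet below fits a one-dimensional linear SSM, whose impulse response is h_k = c·b·a^k, to a fixed teacher kernel by plain gradient descent; all names and hyperparameters are illustrative. With whitened inputs, matching the impulse response is equivalent, by Parseval's theorem, to matching the transfer function at every frequency, and the product c·b evolves much like the weight product of a two-layer linear network.

```python
import numpy as np

# Toy teacher: 1-D linear SSM  x_t = a* x_{t-1} + b* u_t,  y_t = c* x_t,
# whose truncated impulse response is h_k = c* b* (a*)^k.
K = 64
a_star, b_star, c_star = 0.8, 0.7, 0.7
k = np.arange(K)
target = c_star * b_star * a_star**k

# Student 1-D SSM with a small, balanced initialization.
a, b, c = 0.5, 0.3, 0.3
lr, steps = 1e-2, 20_000

for _ in range(steps):
    h = c * b * a**k                                  # student impulse response
    err = h - target
    # analytic gradients of L = 0.5 * ||h - target||^2
    grad_a = np.sum(err[1:] * c * b * k[1:] * a**(k[1:] - 1))
    grad_b = np.sum(err * c * a**k)
    grad_c = np.sum(err * b * a**k)
    a, b, c = a - lr * grad_a, b - lr * grad_b, c - lr * grad_c

h = c * b * a**k
print("max kernel error           :", np.abs(h - target).max())
# Parseval: the same squared error (up to a constant) can be read off in the frequency domain.
H, T = np.fft.rfft(h, 256), np.fft.rfft(target, 256)
print("max transfer-function error:", np.abs(H - T).max())
```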
Abstract: Large AI models trained on audio data may enable rapid patient classification, enhancing medical decision-making and improving outcomes through early detection. Existing technologies depend on limited datasets collected with expensive recording equipment in high-income, English-speaking countries. This hinders deployment in resource-constrained, high-volume settings where audio data could have a profound impact. This report introduces a novel data type and a corresponding collection system that captures health data through guided questions using only a mobile/web application. The application ultimately produces an audio electronic health record (voice EHR), which may contain complex biomarkers of health drawn from conventional voice/respiratory features, speech patterns, and language with semantic meaning, compensating for the typical limitations of unimodal clinical datasets. The report also introduces a consortium of partners for global work, presents the application used for data collection, and showcases the potential of informative voice EHR to advance the scalability and diversity of audio AI.
Abstract: Critical learning periods are periods early in development during which temporary sensory deficits can have a permanent effect on behavior and learned representations. Despite the radical differences between biological and artificial networks, critical learning periods have been empirically observed in both systems. This suggests that critical periods may be fundamental to learning and not an accident of biology. Yet, why exactly critical periods emerge in deep networks is still an open question, and in particular it is unclear whether the critical periods observed in both systems depend on particular architectural or optimization details. To isolate the key underlying factors, we focus on deep linear network models and show that, surprisingly, such networks also display much of the behavior seen in both biological and artificial networks, while being amenable to analytical treatment. We show that critical periods depend on the depth of the model and the structure of the data distribution. We also show, analytically and in simulations, that the learning of features is tied to competition between sources. Finally, we extend our analysis to multi-task learning to show that pre-training on certain tasks can damage transfer performance on new tasks, and we show how this depends on the relationship between the tasks and the duration of the pre-training stage. To the best of our knowledge, our work provides the first analytically tractable model that sheds light on why critical learning periods emerge in biological and artificial networks.
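A minimal simulation in the spirit of the deep linear analysis (my own illustrative setup with made-up dimensions and hyperparameters, not the paper's): a two-layer linear student is trained on a whitened regression task, and one group of input dimensions, standing in for one "source", is deprived by zeroing its variance during a chosen window of training. Comparing early against late deficit windows probes critical-period behavior; as the abstract notes, whether and how strongly a gap appears depends on the depth and the structure of the data distribution, which this sketch keeps deliberately simple.

```python
import numpy as np

def train(deficit_window, d_in=8, d_hidden=8, d_out=8, steps=4000, lr=5e-2, seed=0):
    """Two-layer linear net y = W2 @ W1 @ x on a whitened regression task.
    During `deficit_window` the first half of the inputs (one "source") is
    deprived by zeroing its variance in the input covariance."""
    rng = np.random.default_rng(seed)
    W_star = rng.normal(size=(d_out, d_in))              # teacher map
    W1 = rng.normal(scale=1e-2, size=(d_hidden, d_in))   # small initialization
    W2 = rng.normal(scale=1e-2, size=(d_out, d_hidden))
    full = np.eye(d_in)                                   # whitened input covariance
    deprived = np.diag([0.0] * (d_in // 2) + [1.0] * (d_in - d_in // 2))
    for t in range(steps):
        cov = deprived if t in deficit_window else full
        E = W2 @ W1 - W_star                              # error map
        gW1, gW2 = W2.T @ E @ cov, E @ cov @ W1.T         # exact gradients of 0.5*tr(E cov E^T)
        W1, W2 = W1 - lr * gW1, W2 - lr * gW2
    return 0.5 * np.sum((W2 @ W1 - W_star) ** 2)          # final loss on the full distribution

print("no deficit   :", train(range(0)))
print("early deficit:", train(range(0, 1500)))
print("late deficit :", train(range(2500, 4000)))
```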
Abstract: We show that the ability of a neural network to integrate information from diverse sources hinges critically on being exposed to properly correlated signals during the early phases of training. Interfering with the learning process during this initial stage can permanently impair the development of a skill, both in artificial and biological systems, where the phenomenon is known as a critical learning period. We show that critical periods arise from the complex and unstable early transient dynamics, which are decisive for the final performance of the trained system and its learned representations. This evidence challenges the view, engendered by analysis of wide and shallow networks, that the early learning dynamics of neural networks are simple, akin to those of a linear model. Indeed, we show that even deep linear networks exhibit critical learning periods for multi-source integration, while shallow networks do not. To better understand how internal representations change in response to disturbances or sensory deficits, we introduce a new measure of source sensitivity, which allows us to track the inhibition and integration of sources during training. Our analysis of inhibition suggests cross-source reconstruction as a natural auxiliary training objective, and indeed we show that architectures trained with cross-source reconstruction objectives are remarkably more resilient to critical periods. Our findings suggest that the recent success of self-supervised multi-modal training compared to previous supervised efforts may be due in part to more robust learning dynamics and not solely to better architectures and/or more data.
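One plausible instantiation of the cross-source reconstruction idea (an illustrative sketch; the architecture, names, and loss weights are my own assumptions, not the paper's): each source gets its own encoder, a fusion head solves the supervised task, and auxiliary heads reconstruct each source from the other source's representation, which keeps both pathways under training pressure even if one source is temporarily degraded.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CrossSourceNet(nn.Module):
    """Two-source classifier with cross-source reconstruction heads (illustrative)."""
    def __init__(self, d_a, d_b, d_rep=32, n_classes=10):
        super().__init__()
        self.enc_a = nn.Sequential(nn.Linear(d_a, 64), nn.ReLU(), nn.Linear(64, d_rep))
        self.enc_b = nn.Sequential(nn.Linear(d_b, 64), nn.ReLU(), nn.Linear(64, d_rep))
        self.classifier = nn.Linear(2 * d_rep, n_classes)
        self.rec_b_from_a = nn.Linear(d_rep, d_b)   # predict source B from A's representation
        self.rec_a_from_b = nn.Linear(d_rep, d_a)   # predict source A from B's representation

    def forward(self, x_a, x_b):
        z_a, z_b = self.enc_a(x_a), self.enc_b(x_b)
        logits = self.classifier(torch.cat([z_a, z_b], dim=-1))
        return logits, self.rec_b_from_a(z_a), self.rec_a_from_b(z_b)

def loss_fn(model, x_a, x_b, y, alpha=0.1):
    logits, b_hat, a_hat = model(x_a, x_b)
    task = F.cross_entropy(logits, y)
    recon = F.mse_loss(b_hat, x_b) + F.mse_loss(a_hat, x_a)   # cross-source reconstruction
    return task + alpha * recon

# usage sketch on random data
model = CrossSourceNet(d_a=20, d_b=30)
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
x_a, x_b = torch.randn(64, 20), torch.randn(64, 30)
y = torch.randint(0, 10, (64,))
loss = loss_fn(model, x_a, x_b, y)
opt.zero_grad(); loss.backward(); opt.step()
```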
Abstract: We propose a notion of common information that allows one to quantify and separate the information that is shared between two random variables from the information that is unique to each. Our notion of common information is a variational relaxation of the Gács-Körner common information, which we recover as a special case, but is more amenable to optimization and can be approximated empirically using samples from the underlying distribution. We then provide a method to partition and quantify the common and unique information using a simple modification of a traditional variational auto-encoder. Empirically, we demonstrate that our formulation allows us to learn semantically meaningful common and unique factors of variation even on high-dimensional data such as images and videos. Moreover, on datasets where ground-truth latent factors are known, we show that we can accurately quantify the common information between the random variables. Additionally, we show that the auto-encoder that we learn recovers semantically meaningful disentangled factors of variation, even though we do not explicitly optimize for it.
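A rough sketch of the partitioning idea (a deterministic simplification of my own; the paper uses a modified variational auto-encoder, so the KL/capacity terms are only crudely stood in for here): each variable's code is split into a "common" part and a "unique" part, the common parts are pushed to agree across the two variables, and a small penalty on the unique codes discourages shared information from leaking into them.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class CommonUniqueAE(nn.Module):
    """Auto-encoder splitting each variable's code into common + unique parts (illustrative)."""
    def __init__(self, d_x, d_y, d_common=2, d_unique=4):
        super().__init__()
        self.d_common = d_common
        self.enc_x = nn.Linear(d_x, d_common + d_unique)
        self.enc_y = nn.Linear(d_y, d_common + d_unique)
        self.dec_x = nn.Linear(d_common + d_unique, d_x)
        self.dec_y = nn.Linear(d_common + d_unique, d_y)

    def forward(self, x, y):
        hx, hy = self.enc_x(x), self.enc_y(y)
        cx, ux = hx[:, :self.d_common], hx[:, self.d_common:]
        cy, uy = hy[:, :self.d_common], hy[:, self.d_common:]
        return cx, cy, ux, uy, self.dec_x(hx), self.dec_y(hy)

def loss_fn(model, x, y, lam=1.0, beta=0.01):
    cx, cy, ux, uy, x_hat, y_hat = model(x, y)
    recon = F.mse_loss(x_hat, x) + F.mse_loss(y_hat, y)
    agree = F.mse_loss(cx, cy)                      # common codes must agree across variables
    capacity = ux.pow(2).mean() + uy.pow(2).mean()  # crude stand-in for variational capacity terms
    return recon + lam * agree + beta * capacity

# usage sketch: x and y share one latent factor s; the rest is private noise
s = torch.randn(256, 1)
x = torch.cat([s, torch.randn(256, 3)], dim=1)
y = torch.cat([s, torch.randn(256, 5)], dim=1)
model = CommonUniqueAE(d_x=4, d_y=6)
opt = torch.optim.Adam(model.parameters(), lr=1e-2)
for _ in range(500):
    loss = loss_fn(model, x, y)
    opt.zero_grad(); loss.backward(); opt.step()
```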
Abstract: We introduce a notion of usable information contained in the representation learned by a deep network, and use it to study how optimal representations for the task emerge during training and how they adapt to different tasks. We use this to characterize the transient dynamics of deep neural networks on perceptual decision-making tasks inspired by the neuroscience literature. In particular, we show that both the random initialization and the implicit regularization from Stochastic Gradient Descent play an important role in learning minimal sufficient representations for the task. If the network is not randomly initialized, we show that training may not recover an optimal representation, increasing the chance of overfitting.
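One common way to make "usable information" concrete (shown here with a linear probe as the allowed function family; this is an illustrative estimator, not necessarily the paper's exact one): the usable information a representation Z carries about a label Y is the entropy of the label marginal minus the best held-out cross-entropy a probe from the family achieves on Z.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import log_loss

def usable_information(Z_train, y_train, Z_test, y_test):
    """Estimate usable (V-)information of representation Z about label y, in nats,
    with a logistic-regression probe as the function family V."""
    # H_V(Y): cross-entropy of the best label-marginal predictor = entropy of the marginal
    _, counts = np.unique(y_train, return_counts=True)
    p = counts / counts.sum()
    h_y = -np.sum(p * np.log(p))
    # H_V(Y|Z): held-out cross-entropy of the trained probe
    probe = LogisticRegression(max_iter=2000).fit(Z_train, y_train)
    h_y_given_z = log_loss(y_test, probe.predict_proba(Z_test), labels=probe.classes_)
    return h_y - h_y_given_z

# usage sketch: a representation with one informative and one pure-noise dimension
rng = np.random.default_rng(0)
y = rng.integers(0, 2, size=2000)
Z = np.c_[y + 0.5 * rng.normal(size=2000), rng.normal(size=2000)]
print(usable_information(Z[:1000], y[:1000], Z[1000:], y[1000:]))
```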