Joachim Schaeffer

Transformers Don't Need LayerNorm at Inference Time: Scaling LayerNorm Removal to GPT-2 XL and the Implications for Mechanistic Interpretability

Jul 03, 2025

Luca Baroni, Galvin Khara, Joachim Schaeffer, Marat Subkhankulov, Stefan Heimersheim

Abstract:Layer-wise normalization (LN) is an essential component of virtually all transformer-based large language models. While its effects on training stability are well documented, its role at inference time is poorly understood. Additionally, LN layers hinder mechanistic interpretability by introducing additional nonlinearities and increasing the interconnectedness of individual model components. Here, we show that all LN layers can be removed from every GPT-2 model with only a small increase in validation loss (e.g. +0.03 cross-entropy loss for GPT-2 XL). Thus, LN cannot play a substantial role in language modeling. We find that the amount of fine-tuning data needed for LN removal grows sublinearly with model parameters, suggesting scaling to larger models is feasible. We release a suite of LN-free GPT-2 models on Hugging Face. Furthermore, we test interpretability techniques on LN-free models. Direct logit attribution now gives the exact direct effect of individual components, while the accuracy of attribution patching does not significantly improve. We also confirm that GPT-2's "confidence neurons" are inactive in the LN-free models. Our work clarifies the role of LN layers in language modeling, showing that GPT-2-class models can function without LN layers. We hope that our LN-free analogs of the GPT-2 family of models will enable more precise interpretability research and improve our understanding of language models.
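The claim about direct logit attribution can be illustrated with a minimal sketch in pure Python. It assumes toy three-dimensional vectors, a hypothetical single-token unembedding direction `w_u`, and an LN without learnable scale or bias; none of these values come from the paper. Without LN the logit is linear in the residual stream, so per-component contributions add up exactly. With LN before the unembedding, naively normalizing each component on its own does not reproduce the logit, because the normalization depends on the whole residual stream.

```python
import math

def layer_norm(x, eps=1e-5):
    """LayerNorm without learnable parameters: center, then divide by the std."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def add(a, b):
    return [p + q for p, q in zip(a, b)]

# Hypothetical outputs that two components write to the residual stream.
attn_out = [1.0, -2.0, 0.5]
mlp_out = [0.3, 4.0, -1.0]
resid = add(attn_out, mlp_out)

# Hypothetical unembedding direction for a single token.
w_u = [0.2, -0.1, 0.7]

# Without LN: the logit is linear, so component contributions sum to it.
logit_no_ln = dot(w_u, resid)
parts_no_ln = dot(w_u, attn_out) + dot(w_u, mlp_out)

# With LN: normalizing each component separately does not recover the logit.
logit_ln = dot(w_u, layer_norm(resid))
parts_ln = dot(w_u, layer_norm(attn_out)) + dot(w_u, layer_norm(mlp_out))

print("additive without LN:", math.isclose(logit_no_ln, parts_no_ln))
print("additive with LN:   ", math.isclose(logit_ln, parts_ln, rel_tol=1e-3))
```

In practice, attribution through LN is often approximated by freezing the normalization scale; the sketch only shows why some approximation is needed when LN is present.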

Diagnostic-free onboard battery health assessment

Mar 10, 2025

Yunhong Che, Vivek N. Lam, Jinwook Rhyu, Joachim Schaeffer, Minsu Kim, Martin Z. Bazant, William C. Chueh, Richard D. Braatz

Abstract:Diverse usage patterns induce complex and variable aging behaviors in lithium-ion batteries, complicating accurate health diagnosis and prognosis. Separate diagnostic cycles are often used to untangle the battery's current state of health from prior complex aging patterns. However, these same diagnostic cycles alter the battery's degradation trajectory, are time-intensive, and cannot be practically performed in onboard applications. In this work, we leverage portions of operational measurements in combination with an interpretable machine learning model to enable rapid, onboard battery health diagnostics and prognostics without offline diagnostic testing and the requirement of historical data. We integrate mechanistic constraints within an encoder-decoder architecture to extract electrode states in a physically interpretable latent space and enable improved reconstruction of the degradation path. The health diagnosis model framework can be flexibly applied across diverse application interests with slight fine-tuning. We demonstrate the versatility of this model framework by applying it to three battery-cycling datasets consisting of 422 cells under different operating conditions, highlighting the utility of an interpretable diagnostic-free, onboard battery diagnosis and prognosis model.

* 25 pages

Interpretation of High-Dimensional Regression Coefficients by Comparison with Linearized Compressing Features

Nov 18, 2024

Joachim Schaeffer, Jinwook Rhyu, Robin Droop, Rolf Findeisen, Richard Braatz

Abstract:Linear regression is often deemed inherently interpretable; however, challenges arise for high-dimensional data. We focus on further understanding how linear regression approximates nonlinear responses from high-dimensional functional data, motivated by predicting cycle life for lithium-ion batteries. We develop a linearization method to derive feature coefficients, which we compare with the closest regression coefficients of the path of regression solutions. We showcase the methods on battery data case studies where a single nonlinear compressing feature, $g\colon \mathbb{R}^p \to \mathbb{R}$, is used to construct a synthetic response, $\mathbf{y} \in \mathbb{R}$. This unifying view of linear regression and compressing features for high-dimensional functional data helps to understand (1) how regression coefficients are shaped in the highly regularized domain and how they relate to linearized feature coefficients and (2) how the shape of regression coefficients changes as a function of regularization to approximate nonlinear responses by exploiting local structures.
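One plausible reading of "linearized feature coefficients" is a first-order Taylor expansion of $g$ around a reference point, so that the gradient acts as a coefficient vector comparable to regression coefficients. The sketch below assumes this reading. The feature `g` (population variance), the reference point `x0`, and the finite-difference helper are illustrative choices, not the authors' implementation.

```python
def g(x):
    """Illustrative nonlinear compressing feature: population variance of x."""
    mean = sum(x) / len(x)
    return sum((v - mean) ** 2 for v in x) / len(x)

def linearized_coefficients(f, x0, h=1e-6):
    """Central finite-difference gradient of f at x0."""
    grad = []
    for i in range(len(x0)):
        up = list(x0)
        up[i] += h
        down = list(x0)
        down[i] -= h
        grad.append((f(up) - f(down)) / (2 * h))
    return grad

# Hypothetical reference point (for example, a mean curve sampled at p = 3 points).
x0 = [1.0, 2.0, 4.0]
beta = linearized_coefficients(g, x0)

def g_linear(x):
    """First-order approximation of g around x0 using the coefficients beta."""
    return g(x0) + sum(b * (xi - x0i) for b, xi, x0i in zip(beta, x, x0))

# Near x0 the linear model tracks the nonlinear feature closely.
x_near = [1.01, 2.0, 3.99]
print("coefficients:", [round(b, 4) for b in beta])
print("g:", g(x_near), "linearized:", g_linear(x_near))
```

For the variance, the analytic gradient is $2(x_i - \bar{x})/p$, which gives a quick check on the finite-difference result.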

* This manuscript is a short communication. 9 pages, 4 figures

Systematic Feature Design for Cycle Life Prediction of Lithium-Ion Batteries During Formation

Oct 09, 2024

Jinwook Rhyu, Joachim Schaeffer, Michael L. Li, Xiao Cui, William C. Chueh, Martin Z. Bazant, Richard D. Braatz

Abstract:Optimization of the formation step in lithium-ion battery manufacturing is challenging due to limited physical understanding of solid electrolyte interphase formation and the long testing time (~100 days) for cells to reach the end of life. We propose a systematic feature design framework that requires minimal domain knowledge for accurate cycle life prediction during formation. Two simple Q(V) features designed from our framework, extracted from formation data without any additional diagnostic cycles, achieved a median of 9.20% error for cycle life prediction, outperforming thousands of autoML models using pre-defined features. We attribute the strong performance of our designed features to their physical origins: the voltage ranges identified by our framework capture the effects of formation temperature and microscopic particle resistance heterogeneity. By designing highly interpretable features, our approach can accelerate formation research, leveraging the interplay between data-driven feature design and mechanistic understanding.

* Main: 27 pages, 6 figures. SI: 13 pages, 9 figures

Safe Learning-Based Optimization of Model Predictive Control: Application to Battery Fast-Charging

Oct 07, 2024

Sebastian Hirt, Andreas Höhl, Johannes Pohlodek, Joachim Schaeffer, Maik Pfefferkorn, Richard D. Braatz, Rolf Findeisen

Abstract:Model predictive control (MPC) is a powerful tool for controlling complex nonlinear systems under constraints, but often struggles with model uncertainties and the design of suitable cost functions. To address these challenges, we discuss an approach that integrates MPC with safe Bayesian optimization to optimize long-term closed-loop performance despite significant model-plant mismatches. By parameterizing the MPC stage cost function using a radial basis function network, we employ Bayesian optimization as a multi-episode learning strategy to tune the controller without relying on precise system models. This method mitigates conservativeness introduced by overly cautious soft constraints in the MPC cost function and provides probabilistic safety guarantees during learning, ensuring that safety-critical constraints are met with high probability. As a practical application, we apply our approach to fast charging of lithium-ion batteries, a challenging task due to the complicated battery dynamics and strict safety requirements, subject to the requirement to be implementable in real time. Simulation results demonstrate that, in the context of model-plant mismatch, our method reduces charging times compared to traditional MPC methods while maintaining safety. This work extends previous research by emphasizing closed-loop constraint satisfaction and offers a promising solution for enhancing performance in systems where model uncertainties and safety are critical concerns.

* 7 pages, 4 figures, submitted to ACC 2025

Lithium-Ion Battery System Health Monitoring and Fault Analysis from Field Data Using Gaussian Processes

Jun 27, 2024

Joachim Schaeffer, Eric Lenz, Duncan Gulla, Martin Z. Bazant, Richard D. Braatz, Rolf Findeisen

Abstract:Health monitoring, fault analysis, and detection are critical for the safe and sustainable operation of battery systems. We apply Gaussian process resistance models on lithium iron phosphate battery field data to effectively separate the time-dependent and operating point-dependent resistance. The data set contains 29 battery systems returned to the manufacturer for warranty, each with eight cells in series, totaling 232 cells and 131 million data rows. We develop probabilistic fault detection rules using recursive spatiotemporal Gaussian processes. These processes allow the quick processing of over a million data points, enabling advanced online monitoring and furthering the understanding of battery pack failure in the field. The analysis underlines that often, only a single cell shows abnormal behavior or a knee point, consistent with weakest-link failure for cells connected in series, amplified by local resistive heating. The results further the understanding of how batteries degrade and fail in the field and demonstrate the potential of efficient online monitoring based on data. We open-source the code and publish the large data set upon completion of the review of this article.

Learning Model Predictive Control Parameters via Bayesian Optimization for Battery Fast Charging

Apr 09, 2024

Sebastian Hirt, Andreas Höhl, Joachim Schaeffer, Johannes Pohlodek, Richard D. Braatz, Rolf Findeisen

Abstract:Tuning parameters in model predictive control (MPC) presents significant challenges, particularly when there is a notable discrepancy between the controller's predictions and the actual behavior of the closed-loop plant. This mismatch may stem from factors like substantial model-plant differences, limited prediction horizons that do not cover the entire time of interest, or unforeseen system disturbances. Such mismatches can jeopardize both performance and safety, including constraint satisfaction. Traditional methods address this issue by modifying the finite horizon cost function to better reflect the overall operational cost, learning parts of the prediction model from data, or implementing robust MPC strategies, which might be either computationally intensive or overly cautious. As an alternative, directly optimizing or learning the controller parameters to enhance closed-loop performance has been proposed. We apply Bayesian optimization for efficient learning of unknown model parameters and parameterized constraint backoff terms, aiming to improve closed-loop performance of battery fast charging. This approach establishes a hierarchical control framework where Bayesian optimization directly fine-tunes closed-loop behavior towards a global and long-term objective, while MPC handles lower-level, short-term control tasks. For lithium-ion battery fast charging, we show that the learning approach not only ensures safe operation but also maximizes closed-loop performance. This includes maintaining the battery's operation below its maximum terminal voltage and reducing charging times, all achieved using a standard nominal MPC model with a short horizon and notable initial model-plant mismatch.

* 6 pages, 5 figures, accepted for ADCHEM 2024

Cycle Life Prediction for Lithium-ion Batteries: Machine Learning and More

Apr 05, 2024

Joachim Schaeffer, Giacomo Galuppini, Jinwook Rhyu, Patrick A. Asinger, Robin Droop, Rolf Findeisen, Richard D. Braatz

Abstract:Batteries are dynamic systems with complicated nonlinear aging, highly dependent on cell design, chemistry, manufacturing, and operational conditions. Prediction of battery cycle life and estimation of aging states are important to accelerate battery R&D and testing and to further the understanding of how batteries degrade. Beyond testing, battery management systems rely on real-time models and onboard diagnostics and prognostics for safe operation. Estimating the state of health and remaining useful life of a battery is important to optimize performance and use resources efficiently. This tutorial begins with an overview of first-principles, machine learning, and hybrid battery models. Then, a typical pipeline for the development of interpretable machine learning models is explained and showcased for cycle life prediction from laboratory testing data. We highlight the challenges of machine learning models, motivating the incorporation of physics in hybrid modeling approaches, which are needed to decipher the aging trajectory of batteries but require more data and further work on the physics of battery degradation. The tutorial closes with a discussion on generalization and further research directions.

* 6 pages, 3 figures, accepted for ACC 2024

Interpretation of High-Dimensional Linear Regression: Effects of Nullspace and Regularization Demonstrated on Battery Data

Sep 06, 2023

Joachim Schaeffer, Eric Lenz, William C. Chueh, Martin Z. Bazant, Rolf Findeisen, Richard D. Braatz

Abstract:High-dimensional linear regression is important in many scientific fields. This article considers discrete measured data of underlying smooth latent processes, as is often obtained from chemical or biological systems. Interpretation in high dimensions is challenging because the nullspace and its interplay with regularization shape regression coefficients. The data's nullspace contains all coefficients that satisfy $\mathbf{Xw}=\mathbf{0}$, thus allowing very different coefficients to yield identical predictions. We developed an optimization formulation to compare regression coefficients and coefficients obtained by physical engineering knowledge to understand which part of the coefficient differences is close to the nullspace. This nullspace method is tested on a synthetic example and lithium-ion battery data. The case studies show that regularization and z-scoring are design choices that, if chosen corresponding to prior physical knowledge, lead to interpretable regression results. Otherwise, the combination of the nullspace and regularization hinders interpretability and can make it impossible to obtain regression coefficients close to the true coefficients when there is a true underlying linear model. Furthermore, we demonstrate that regression methods that do not produce coefficients orthogonal to the nullspace, such as fused lasso, can improve interpretability. In conclusion, the insights gained from the nullspace perspective help to make informed design choices for building regression models on high-dimensional data and reasoning about potential underlying linear models, which are important for system optimization and improving scientific understanding.
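The nullspace statement can be checked on a toy example. The sketch below assumes a hypothetical 2×3 data matrix `X` (more features than samples) and builds a nullspace vector from the cross product of its two rows; both are illustrative choices, not taken from the paper or its code. Adding any multiple of that vector to the coefficients leaves the predictions unchanged.

```python
def matvec(X, w):
    """Matrix-vector product for a list-of-rows matrix."""
    return [sum(x * wi for x, wi in zip(row, w)) for row in X]

def cross(a, b):
    """Cross product of two 3-vectors; the result is orthogonal to both."""
    return [
        a[1] * b[2] - a[2] * b[1],
        a[2] * b[0] - a[0] * b[2],
        a[0] * b[1] - a[1] * b[0],
    ]

# Hypothetical data: 2 samples, 3 features, so the nullspace is non-trivial.
X = [[1, 2, 3],
     [4, 5, 6]]

# Some coefficient vector, for example from a regression fit.
w = [1, 0, -1]

# v is orthogonal to every row of X, hence X v = 0.
v = cross(X[0], X[1])

# A very different coefficient vector that differs from w only in the nullspace.
w_alt = [wi + 10 * vi for wi, vi in zip(w, v)]

print("X v     =", matvec(X, v))
print("X w     =", matvec(X, w))
print("X w_alt =", matvec(X, w_alt))
```

Because `w` and `w_alt` predict identically on `X`, the data alone cannot distinguish them; regularization or prior knowledge decides which one a regression method returns.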

* Manuscript: 14 pages, 7 figures; Supplementary Information: 4 pages, 2 figures; Code available: https://github.com/JoachimSchaeffer/HDRegAnalytics

Machine learning benchmarks for the classification of equivalent circuit models from solid-state electrochemical impedance spectra

Feb 07, 2023

Joachim Schaeffer, Paul Gasper, Esteban Garcia-Tamayo, Raymond Gasper, Masaki Adachi, Juan Pablo Gaviria-Cardona, Simon Montoya-Bedoya, Anoushka Bhutani, Andrew Schiek, Rhys Goodall (+3 more)

Abstract:Analysis of Electrochemical Impedance Spectroscopy (EIS) data for electrochemical systems often consists of defining an Equivalent Circuit Model (ECM) using expert knowledge and then optimizing the model parameters to deconvolute various resistance, capacitive, inductive, or diffusion responses. For small data sets, this procedure can be conducted manually; however, it is not feasible to manually define a proper ECM for extensive data sets with a wide range of EIS responses. Automatic identification of an ECM would substantially accelerate the analysis of large sets of EIS data. Here, we showcase machine learning methods developed during the BatteryDEV hackathon to classify the ECMs of 9,300 EIS measurements provided by QuantumScape. The best-performing approach is a gradient-boosted tree model utilizing a library to automatically generate features, followed by a random forest model using the raw spectral data. A convolutional neural network using boolean images of Nyquist representations is presented as an alternative, although it achieves a lower accuracy. We publish the data and open source the associated code. The approaches described in this article can serve as benchmarks for further studies. A key remaining challenge is that the labels contain uncertainty and human bias, underlined by the performance of the trained models.

* Manuscript: 16 pages, 8 figures; Supplementary Information: 7 pages, 3 figures
