Abstract: This work suggests fundamentally rethinking the current practice of pruning large language models (LLMs). The prevailing approach is divide and conquer: split the model into submodels, sequentially prune them, and reconstruct the predictions of their dense counterparts on small calibration data, one submodel at a time; the final model is obtained simply by putting the resulting sparse submodels together. While this approach enables pruning under memory constraints, it generates high reconstruction errors. In this work, we first present an array of reconstruction techniques that can significantly reduce this error, by more than $90\%$. Unexpectedly, however, we discover that minimizing reconstruction error is not always ideal and can overfit the given calibration data, resulting in increased language perplexity and poor performance on downstream tasks. We find that a strategy of self-generating calibration data can mitigate this trade-off between reconstruction and generalization, suggesting new directions for pruning LLMs that account for both the benefits and the pitfalls of reconstruction.
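The reconstruction step described above can be illustrated with a minimal sketch: magnitude-prune a linear layer and then refit the surviving weights by least squares so that the sparse layer reproduces the dense layer's outputs on a small calibration set. The function name, the plain magnitude criterion, and the per-row refit are illustrative assumptions, not the paper's exact techniques.

```python
import numpy as np

def prune_and_reconstruct(W, X, sparsity=0.5):
    """Magnitude-prune a linear layer W (out x in) and refit its surviving
    weights so that the sparse layer reproduces the dense outputs W @ X
    on calibration inputs X (in x n_samples). Illustrative sketch only."""
    Y = W @ X                                                  # dense outputs to reconstruct
    mask = np.abs(W) >= np.quantile(np.abs(W), sparsity)       # keep the largest weights
    W_sparse = np.zeros_like(W)
    for i in range(W.shape[0]):                                # refit each output row separately
        keep = mask[i]
        if keep.any():
            # least-squares fit of the kept weights to the dense row's outputs
            W_sparse[i, keep] = np.linalg.lstsq(X[keep].T, Y[i], rcond=None)[0]
    return W_sparse, mask

# toy usage on random calibration data
rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))
X = rng.normal(size=(32, 128))
W_s, mask = prune_and_reconstruct(W, X, sparsity=0.5)
err = np.linalg.norm(W_s @ X - W @ X) / np.linalg.norm(W @ X)
print(f"relative reconstruction error: {err:.3f}")
```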
Abstract: Training an overparameterized neural network can yield minimizers with the same training loss yet different generalization capabilities. Motivated by evidence of a correlation between the sharpness of minima and their generalization error, increasing effort has gone into developing optimization methods that explicitly seek flat minima as more generalizable solutions. How overparameterization actually affects the behavior of this sharpness-aware minimization (SAM) strategy, however, has received little study. In this work, we analyze SAM under varying degrees of overparameterization and present both empirical and theoretical results that suggest a critical influence of overparameterization on SAM. Specifically, we first use standard techniques in optimization to prove that SAM can achieve a linear convergence rate under overparameterization in a stochastic setting. We also show that the linearly stable minima found by SAM are indeed flatter and have more uniformly distributed Hessian moments compared to those of SGD. These results are corroborated by our experiments, which reveal a consistent trend: the generalization improvement made by SAM continues to grow as the model becomes more overparameterized. We further show that sparsity can open up an avenue for effective overparameterization in practice.
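The two-step structure of a SAM update can be sketched as follows for a generic PyTorch model and loss, assuming every parameter receives a gradient; the hyperparameter names are placeholders rather than the paper's exact settings.

```python
import torch

def sam_step(model, loss_fn, x, y, optimizer, rho=0.05):
    """One sharpness-aware minimization step (illustrative sketch):
    1) ascend to the worst-case point within an L2 ball of radius rho,
    2) compute the gradient there, 3) descend from the original weights."""
    # first pass: gradient at the current weights
    loss = loss_fn(model(x), y)
    loss.backward()
    grads = [p.grad.detach().clone() for p in model.parameters()]
    grad_norm = torch.sqrt(sum(g.pow(2).sum() for g in grads)) + 1e-12

    # perturb weights along the normalized gradient (epsilon = rho * g / ||g||)
    with torch.no_grad():
        eps = [rho * g / grad_norm for g in grads]
        for p, e in zip(model.parameters(), eps):
            p.add_(e)

    # second pass: gradient at the perturbed weights
    optimizer.zero_grad()
    loss_fn(model(x), y).backward()

    # undo the perturbation and update with the perturbed gradient
    with torch.no_grad():
        for p, e in zip(model.parameters(), eps):
            p.sub_(e)
    optimizer.step()
    optimizer.zero_grad()
    return loss.item()
```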
Abstract: In federated learning (FL), clients with limited resources can hamper training efficiency. A potential solution to this problem is to leverage a learning procedure that does not rely on backpropagation (BP). We present a novel approach to FL called FedFwd that employs a recent BP-free method, the Forward-Forward algorithm of Hinton (2022), in the local training process. FedFwd significantly reduces the computation required to update parameters by performing layer-wise local updates and therefore does not need to store intermediate activations during training. We conduct various experiments to evaluate FedFwd on standard datasets including MNIST and CIFAR-10, and show that it performs competitively with BP-based FL methods.
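The layer-wise local update can be sketched with a single Forward-Forward layer step, assuming the squared-activation "goodness" and logistic loss of Hinton (2022); the threshold value and the way positive/negative samples are constructed are simplified assumptions, not FedFwd's exact recipe.

```python
import torch
import torch.nn.functional as F

def ff_layer_update(layer, opt, x_pos, x_neg, theta=2.0):
    """Train one layer with the Forward-Forward objective: goodness
    (sum of squared activations) should be high for positive data and
    low for negative data. The layer is updated locally, so no activations
    need to be stored for a backward pass through the whole network."""
    h_pos = F.relu(layer(x_pos))
    h_neg = F.relu(layer(x_neg))
    g_pos = h_pos.pow(2).sum(dim=1)          # goodness of positive samples
    g_neg = h_neg.pow(2).sum(dim=1)          # goodness of negative samples
    # logistic loss pushing g_pos above and g_neg below the threshold theta
    loss = F.softplus(torch.cat([theta - g_pos, g_neg - theta])).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    # detach so the next layer's update does not backpropagate through this one
    return h_pos.detach(), h_neg.detach(), loss.item()
```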
Abstract: This paper introduces JaxPruner, an open-source, JAX-based pruning and sparse training library for machine learning research. JaxPruner aims to accelerate research on sparse neural networks by providing concise implementations of popular pruning and sparse training algorithms with minimal memory and latency overhead. Algorithms implemented in JaxPruner use a common API and work seamlessly with the popular optimization library Optax, which in turn enables easy integration with existing JAX-based libraries. We demonstrate this ease of integration with examples in four different codebases (Scenic, t5x, Dopamine, and FedJAX) and provide baseline experiments on popular benchmarks.
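The Optax integration pattern can be sketched as a custom gradient transformation that holds a fixed sparsity mask and zeroes the updates of pruned weights. This is a hypothetical illustration of the wrapping idea using only standard Optax primitives; it is not JaxPruner's actual API, and the toy mask is arbitrary.

```python
import jax
import jax.numpy as jnp
import optax

def masked_updates(mask_tree):
    """A toy optax.GradientTransformation that zeroes updates for pruned
    weights (hypothetical sketch of how pruning composes with Optax)."""
    def init_fn(params):
        del params
        return optax.EmptyState()
    def update_fn(updates, state, params=None):
        del params
        updates = jax.tree_util.tree_map(lambda u, m: u * m, updates, mask_tree)
        return updates, state
    return optax.GradientTransformation(init_fn, update_fn)

# usage: chain the masking transform with a standard Optax optimizer
params = {"w": jnp.ones((4, 4)), "b": jnp.zeros(4)}
masks = {"w": (jnp.arange(16).reshape(4, 4) % 2).astype(jnp.float32),  # toy mask
         "b": jnp.ones(4)}
tx = optax.chain(optax.adam(1e-3), masked_updates(masks))
opt_state = tx.init(params)
grads = jax.tree_util.tree_map(jnp.ones_like, params)
updates, opt_state = tx.update(grads, opt_state, params)
params = optax.apply_updates(params, updates)
```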
Abstract: Concept bottleneck models (CBMs) are a class of interpretable neural network models that predict the target response of a given input based on its high-level concepts. Unlike standard end-to-end models, CBMs enable domain experts to intervene on the predicted concepts and rectify mistakes at test time, so that more accurate task predictions can be made at the end. While such intervenability provides a powerful avenue of control, many aspects of the intervention procedure remain largely unexplored. In this work, we develop various ways of selecting intervening concepts to improve intervention effectiveness and conduct an array of in-depth analyses of how they behave under different circumstances. Specifically, we find that an informed intervention strategy can reduce the task error more than tenfold compared to the current baseline for the same number of interventions in realistic settings, and yet this gain can vary substantially with the intervention granularity. We verify our findings through comprehensive evaluations, not only on standard real datasets but also on synthetic datasets that we generate from a set of different causal graphs. We further discover major pitfalls of current practice which, if left unaddressed, raise concerns about the reliability and fairness of the intervention procedure.
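Test-time intervention in a CBM can be sketched as replacing a subset of the predicted concepts with their ground-truth values before the label predictor runs. The uncertainty-based selection rule below is just one illustrative "informed" strategy, and all names are placeholders rather than the paper's exact policies.

```python
import torch

def intervene(concept_probs, true_concepts, label_head, budget=3):
    """Replace the `budget` most uncertain predicted concepts with their
    ground-truth values, then recompute the task prediction.
    Illustrative sketch of an uncertainty-based intervention policy."""
    uncertainty = -(concept_probs - 0.5).abs()          # highest near p = 0.5
    chosen = uncertainty.topk(budget, dim=1).indices    # concepts to intervene on
    corrected = concept_probs.clone()
    corrected.scatter_(1, chosen, true_concepts.gather(1, chosen))
    return label_head(corrected)                        # updated task prediction

# toy usage with a linear label predictor over 8 concepts
label_head = torch.nn.Linear(8, 5)
concept_probs = torch.rand(2, 8)                 # predicted concept probabilities
true_concepts = (torch.rand(2, 8) > 0.5).float() # expert-provided ground truth
logits = intervene(concept_probs, true_concepts, label_head, budget=3)
```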
Abstract: Knowledge distillation is a popular and effective regularization technique for training lightweight models, but it also adds significant overhead to the training cost. The drawback is most pronounced when we use large-scale models as our teachers, such as vision transformers (ViTs). We present MaskedKD, a simple yet effective method for reducing the training cost of ViT distillation. MaskedKD masks a fraction of the image patch tokens fed to the teacher to save teacher inference cost. The tokens to mask are determined from the last-layer attention scores of the student model, which receives the full image. Without requiring any architectural change to the teacher or sacrificing student performance, MaskedKD dramatically reduces the computation and time required for distilling ViTs. We demonstrate that MaskedKD can save up to $50\%$ of the cost of running inference on the teacher model without any performance drop on the student, leading to an approximately $28\%$ reduction in the combined teacher and student compute.
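The core token-selection step can be sketched as scoring each patch by the student's last-layer attention from the [CLS] token, keeping the top-k patches, and passing only those to the teacher. The tensor shapes and the 50% keep ratio below are illustrative assumptions.

```python
import torch

def select_patches_for_teacher(student_attn, patch_tokens, keep_ratio=0.5):
    """Pick the patches the teacher will see, using the student's last-layer
    attention from the [CLS] token (averaged over heads) as the saliency score.
    student_attn: (batch, heads, 1 + n_patches, 1 + n_patches)
    patch_tokens: (batch, n_patches, dim) -- teacher-side patch embeddings
    Returns the kept patch tokens of shape (batch, k, dim). Sketch only."""
    cls_to_patch = student_attn[:, :, 0, 1:].mean(dim=1)   # (batch, n_patches)
    k = max(1, int(keep_ratio * cls_to_patch.shape[1]))
    top = cls_to_patch.topk(k, dim=1).indices              # most attended patches
    idx = top.unsqueeze(-1).expand(-1, -1, patch_tokens.shape[-1])
    return patch_tokens.gather(1, idx)                     # (batch, k, dim)

# toy usage: ViT-style attention with 196 patches plus a [CLS] token
attn = torch.rand(2, 6, 197, 197)
tokens = torch.rand(2, 196, 768)
kept = select_patches_for_teacher(attn, tokens, keep_ratio=0.5)  # (2, 98, 768)
```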
Abstract: Learning dynamical systems is a promising avenue for scientific discovery. However, capturing the governing dynamics across multiple environments remains a challenge: model-based approaches rely on the fidelity of assumptions made for a single environment, whereas data-driven approaches based on neural networks are often fragile when extrapolating into the future. In this work, we develop a sparse regression method dubbed SpReME to discover the major dynamics that underlie multiple environments. Specifically, SpReME shares a sparse structure of ordinary differential equation (ODE) terms across environments while allowing each environment to keep its own coefficients for those terms. We demonstrate that the proposed model captures the correct dynamics from multiple environments on four different dynamical systems with improved prediction performance.
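A minimal sketch of the shared-support idea: candidate ODE terms form a library Theta(X), the binary support over terms is shared across environments, and each environment fits its own coefficients on the kept terms. The simple alternating least-squares-and-thresholding loop here is an illustrative stand-in for the paper's actual optimization.

```python
import numpy as np

def shared_support_regression(Thetas, dXdts, n_iters=10, tau=0.1):
    """Fit dX/dt ~ Theta(X) @ Xi_e for each environment e, with one binary
    support (which library terms are active) shared across environments
    but environment-specific coefficients. Illustrative thresholding sketch."""
    n_terms = Thetas[0].shape[1]
    support = np.ones(n_terms, dtype=bool)
    coefs = [np.zeros(n_terms) for _ in Thetas]
    for _ in range(n_iters):
        if not support.any():
            break
        # per-environment least squares restricted to the shared support
        for e, (Theta, dxdt) in enumerate(zip(Thetas, dXdts)):
            coefs[e][:] = 0.0
            coefs[e][support] = np.linalg.lstsq(Theta[:, support], dxdt,
                                                rcond=None)[0]
        # drop terms that are small in every environment (shared sparsity)
        magnitudes = np.max(np.abs(np.stack(coefs)), axis=0)
        support = magnitudes > tau
    return support, coefs

# toy usage: two environments, library [1, x, x^2], true dynamics dx/dt = a*x
x1, x2 = np.linspace(0.1, 2, 50), np.linspace(0.1, 2, 50)
Thetas = [np.stack([np.ones_like(x1), x1, x1**2], axis=1),
          np.stack([np.ones_like(x2), x2, x2**2], axis=1)]
dXdts = [1.5 * x1, -0.7 * x2]        # different coefficients, same structure
support, coefs = shared_support_regression(Thetas, dXdts)
print(support, [np.round(c, 2) for c in coefs])
```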
Abstract: Implicit neural representations are a promising new avenue for representing general signals by learning a continuous function that, parameterized as a neural network, maps the domain of a signal to its codomain; for example, the mapping from the spatial coordinates of an image to its pixel values. Being capable of conveying fine details of a high-dimensional signal independently of its domain, implicit neural representations offer many advantages over conventional discrete representations. However, the current approach is difficult to scale to a large number of signals or a data set, since learning a neural representation -- which is parameter-heavy by itself -- for each signal individually requires a lot of memory and computation. To address this issue, we propose to leverage a meta-learning approach combined with network compression under a sparsity constraint, so that it yields a well-initialized sparse parameterization that evolves quickly to represent a set of unseen signals in the subsequent training. We empirically demonstrate that meta-learned sparse neural representations achieve a much smaller loss than dense meta-learned models with the same number of parameters, when trained to fit each signal with the same number of optimization steps.
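The adapt-from-a-sparse-init step can be sketched as follows: magnitude-prune a dense initialization and fit a new signal while updating only the surviving weights. Here a randomly initialized MLP stands in for the meta-learned initialization, and the ReLU architecture, pruning criterion, and hyperparameters are illustrative assumptions only.

```python
import torch
import torch.nn as nn

def sparse_finetune(init_model, coords, values, sparsity=0.8, steps=100, lr=1e-3):
    """Magnitude-prune a dense initialization (stand-in for a meta-learned one)
    and fit a new signal while updating only the surviving weights.
    Illustrative sketch of the sparse-init-then-adapt idea."""
    masks = {}
    for name, p in init_model.named_parameters():
        if p.dim() > 1:                              # prune weight matrices only
            thresh = p.abs().flatten().quantile(sparsity)
            masks[name] = (p.abs() >= thresh).float()
            p.data.mul_(masks[name])                 # zero out pruned weights
    opt = torch.optim.Adam(init_model.parameters(), lr=lr)
    for _ in range(steps):
        loss = ((init_model(coords) - values) ** 2).mean()
        opt.zero_grad()
        loss.backward()
        for name, p in init_model.named_parameters():
            if name in masks:
                p.grad.mul_(masks[name])             # keep pruned weights frozen
        opt.step()
    return init_model, loss.item()

# toy usage: adapt a small MLP "INR" to a 1-D signal
inr = nn.Sequential(nn.Linear(1, 64), nn.ReLU(), nn.Linear(64, 64),
                    nn.ReLU(), nn.Linear(64, 1))
coords = torch.linspace(-1, 1, 256).unsqueeze(1)
values = torch.sin(3.14 * coords)
inr, final_loss = sparse_finetune(inr, coords, values, sparsity=0.8)
```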
Abstract: Network pruning is an effective methodology for compressing large neural networks, and the sparse neural networks obtained by pruning benefit from reduced memory and computational costs at deployment. Notably, recent advances have shown that it is possible to find a trainable sparse neural network at random initialization, prior to training; the obtained sparse network then only needs to be trained. While this approach of pruning at initialization has turned out to be highly effective, little has been studied about the training aspects of these sparse neural networks. In this work, we focus on measuring the effects of data parallelism on training sparse neural networks. We find that data parallelism in training sparse neural networks is no worse than in training densely parameterized neural networks, despite the general difficulty of training sparse neural networks. Moreover, when training sparse networks using SGD with momentum, the breakdown of the perfect scaling regime occurs at even larger batch sizes than for their dense counterparts.
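The measurement protocol can be sketched as counting the number of SGD-with-momentum steps needed to reach a fixed training loss at each batch size; the toy data, model, and thresholds below are illustrative assumptions, and a pruned network would be substituted for the dense one in practice.

```python
import torch
import torch.nn as nn

def steps_to_target(model, data, labels, batch_size, target_loss=0.3,
                    lr=0.1, momentum=0.9, max_steps=10_000):
    """Count SGD-with-momentum steps needed to reach a target training loss
    at a given batch size; sweeping batch_size traces the scaling curve
    (perfect scaling: steps halve when the batch size doubles). Sketch only."""
    opt = torch.optim.SGD(model.parameters(), lr=lr, momentum=momentum)
    loss_fn = nn.CrossEntropyLoss()
    n = data.shape[0]
    for step in range(1, max_steps + 1):
        idx = torch.randint(0, n, (batch_size,))
        loss = loss_fn(model(data[idx]), labels[idx])
        opt.zero_grad()
        loss.backward()
        opt.step()
        if loss.item() < target_loss:
            return step
    return max_steps

# toy sweep on a simple, learnable classification task
data = torch.randn(4096, 32)
labels = (data[:, :2].sum(dim=1) > 0).long()
for bs in [32, 64, 128, 256]:
    model = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 2))
    print(bs, steps_to_target(model, data, labels, batch_size=bs))
```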
Abstract: Network pruning is a promising avenue for compressing deep neural networks. A typical approach to pruning starts by training a model and then removing unnecessary parameters while minimizing the impact on what is learned. Alternatively, a recent line of work shows that pruning can be done at initialization, prior to training. However, it remains unclear exactly why pruning an untrained, randomly initialized neural network is effective. In this work, we consider the pruning problem from a signal propagation perspective, formally characterizing initialization conditions that ensure faithful signal propagation throughout a network. Based on the singular values of a network's input-output Jacobian, we find that orthogonal initialization enables more faithful signal propagation than other initialization schemes, thereby enhancing pruning results on a range of modern architectures and datasets. We also empirically study the effect of supervision for pruning at initialization, and show that unsupervised pruning can often be as effective as supervised pruning. Furthermore, we demonstrate that our signal propagation perspective, combined with unsupervised pruning, can indeed be useful in various scenarios where pruning is applied to non-standard, arbitrarily designed architectures.
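The signal-propagation diagnostic can be sketched by computing the singular values of a network's input-output Jacobian under different initializations; a well-conditioned (near-uniform) spectrum indicates faithful signal propagation. The tiny tanh MLP and the particular schemes compared are illustrative assumptions, not the paper's exact experimental setup.

```python
import torch
import torch.nn as nn

def jacobian_singular_values(model, x):
    """Singular values of the input-output Jacobian at a single input x.
    A small max/min ratio indicates faithful signal propagation."""
    J = torch.autograd.functional.jacobian(lambda inp: model(inp), x)
    J = J.reshape(model(x).numel(), x.numel())
    return torch.linalg.svdvals(J)

def make_mlp(init):
    layers, dims = [], [64, 64, 64, 64, 10]
    for d_in, d_out in zip(dims[:-1], dims[1:]):
        lin = nn.Linear(d_in, d_out)
        init(lin.weight)                       # apply the chosen init scheme
        layers += [lin, nn.Tanh()]
    return nn.Sequential(*layers[:-1])         # drop the final nonlinearity

# compare Jacobian conditioning under orthogonal vs. plain normal init
x = torch.randn(64)
for name, init in [("orthogonal", nn.init.orthogonal_),
                   ("normal", lambda w: nn.init.normal_(w, std=0.1))]:
    sv = jacobian_singular_values(make_mlp(init), x)
    print(f"{name:>10}: max/min singular value ratio = {sv.max() / sv.min():.2f}")
```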