Marco Federici

HadaNorm: Diffusion Transformer Quantization through Mean-Centered Transformations

Jun 11, 2025

Marco Federici, Riccardo Del Chiaro, Boris van Breugel, Paul Whatmough, Markus Nagel

Abstract: Diffusion models represent the cutting edge in image generation, but their high memory and computational demands hinder deployment on resource-constrained devices. Post-Training Quantization (PTQ) offers a promising solution by reducing the bitwidth of matrix operations. However, standard PTQ methods struggle with outliers, and achieving higher compression often requires transforming model weights and activations before quantization. In this work, we propose HadaNorm, a novel linear transformation that extends existing approaches and effectively mitigates outliers by normalizing activation feature channels before applying Hadamard transformations, enabling more aggressive activation quantization. We demonstrate that HadaNorm consistently reduces quantization error across the various components of transformer blocks, achieving superior efficiency-performance trade-offs when compared to state-of-the-art methods.

* 4 Pages, 5 Figures
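
The idea of centering feature channels before a Hadamard rotation can be illustrated with a small sketch. This is not the authors' implementation: the function names, the per-tensor 4-bit quantizer, and the choice to fold the channel mean into a bias term are all assumptions made for illustration. It relies only on the identity x W = ((x - mu) H)(H^T W) + mu W for an orthonormal H.

```python
import numpy as np

def hadamard(n):
    """Orthonormal Hadamard matrix (Sylvester construction); n must be a power of two."""
    if n < 1 or n & (n - 1):
        raise ValueError("n must be a power of two")
    h = np.array([[1.0]])
    while h.shape[0] < n:
        h = np.block([[h, h], [h, -h]])
    return h / np.sqrt(n)

def quantize(x, bits=4):
    """Symmetric per-tensor uniform fake quantization (hypothetical helper)."""
    qmax = 2 ** (bits - 1) - 1
    peak = np.abs(x).max()
    if peak == 0:
        return x
    scale = peak / qmax
    return np.round(x / scale).clip(-qmax - 1, qmax) * scale

def center_then_rotate(x, channel_mean):
    """Subtract the per-channel mean, then rotate the features with a Hadamard matrix."""
    h = hadamard(x.shape[-1])
    return (x - channel_mean) @ h, h

def linear_with_centered_rotation(x, w, channel_mean, bits=4):
    """Approximate x @ w by quantizing only the centered, rotated activations."""
    x_rot, h = center_then_rotate(x, channel_mean)
    # h.T @ w and channel_mean @ w can be precomputed offline in this sketch.
    return quantize(x_rot, bits) @ (h.T @ w) + channel_mean @ w
```

In this sketch the channel mean would be estimated on calibration data; without the quantizer the rewritten layer reproduces `x @ w` exactly, so any error comes from quantizing the centered, rotated activations.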

Bridge the Inference Gaps of Neural Processes via Expectation Maximization

Jan 04, 2025

Qi Wang, Marco Federici, Herke van Hoof

Abstract: The neural process (NP) is a family of computationally efficient models for learning distributions over functions. However, it suffers from under-fitting and shows suboptimal performance in practice. Researchers have primarily focused on incorporating diverse structural inductive biases, e.g., attention or convolution, in modeling. The topic of inference suboptimality and an analysis of the NP from the optimization objective perspective has hardly been studied in earlier work. To fix this issue, we propose a surrogate objective of the target log-likelihood of the meta dataset within the expectation maximization framework. The resulting model, referred to as the Self-normalized Importance weighted Neural Process (SI-NP), can learn a more accurate functional prior and has an improvement guarantee concerning the target log-likelihood. Experimental results show the competitive performance of SI-NP over other NP objectives and illustrate that structural inductive biases, such as attention modules, can also augment our method to achieve SOTA performance. Our code is available at https://github.com/hhq123gogogo/SI_NPs.

* ICLR 2023

Efficient LLM Inference using Dynamic Input Pruning and Cache-Aware Masking

Dec 02, 2024

Marco Federici, Davide Belli, Mart van Baalen, Amir Jalalirad, Andrii Skliar, Bence Major, Markus Nagel, Paul Whatmough

Abstract: While mobile devices provide ever more compute power, improvements in DRAM bandwidth are much slower. This is unfortunate for large language model (LLM) token generation, which is heavily memory-bound. Previous work has proposed to leverage natural dynamic activation sparsity in ReLU-activated LLMs to reduce effective DRAM bandwidth per token. However, more recent LLMs use SwiGLU instead of ReLU, which results in little inherent sparsity. While SwiGLU activations can be pruned based on magnitude, the resulting sparsity patterns are difficult to predict, rendering previous approaches ineffective. To circumvent this issue, our work introduces Dynamic Input Pruning (DIP): a predictor-free dynamic sparsification approach, which preserves accuracy with minimal fine-tuning. DIP can further use lightweight LoRA adapters to regain some performance lost during sparsification. Lastly, we describe a novel cache-aware masking strategy, which considers the cache state and activation magnitude to further increase cache hit rate, improving LLM token rate on mobile devices. DIP outperforms other methods in terms of accuracy, memory and throughput trade-offs across simulated hardware settings. On Phi-3-Medium, DIP achieves a 46% reduction in memory and 40% increase in throughput with < 0.1 loss in perplexity.

* Main Text: 10 pages, 11 figures. Appendix: 3 pages, 3 figures
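
The abstract mentions that SwiGLU activations can be pruned by magnitude. The sketch below illustrates only that baseline idea, not DIP itself or its cache-aware masking, whose details are not given here. It assumes a standard gated MLP for a single token; the names `w_gate`, `w_up`, `w_down`, and `keep` are hypothetical.

```python
import numpy as np

def silu(x):
    """SiLU (swish) activation used inside SwiGLU."""
    return x / (1.0 + np.exp(-x))

def swiglu_mlp_pruned(x, w_gate, w_up, w_down, keep):
    """Gated MLP for one token that keeps only the `keep` largest-magnitude activations.

    x: (d,), w_gate and w_up: (d, hidden), w_down: (hidden, d), 1 <= keep <= hidden.
    """
    h = silu(x @ w_gate) * (x @ w_up)                 # intermediate activation, shape (hidden,)
    idx = np.argpartition(np.abs(h), -keep)[-keep:]   # indices of the top-`keep` magnitudes
    # Only the selected rows of w_down take part in the product,
    # so the remaining rows would not need to be read from memory.
    return h[idx] @ w_down[idx], idx
```

With `keep` equal to the hidden size this reduces to the dense MLP; smaller values trade accuracy for fewer rows of `w_down` being touched per token.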

Simulation-based Inference with the Generalized Kullback-Leibler Divergence

Oct 03, 2023

Benjamin Kurt Miller, Marco Federici, Christoph Weniger, Patrick Forré

Abstract: In Simulation-based Inference, the goal is to solve the inverse problem when the likelihood is only known implicitly. Neural Posterior Estimation commonly fits a normalized density estimator as a surrogate model for the posterior. This formulation cannot easily fit unnormalized surrogates because it optimizes the Kullback-Leibler divergence. We propose to optimize a generalized Kullback-Leibler divergence that accounts for the normalization constant in unnormalized distributions. The objective recovers Neural Posterior Estimation when the model class is normalized and unifies it with Neural Ratio Estimation, combining both into a single objective. We investigate a hybrid model that offers the best of both worlds by learning a normalized base distribution and a learned ratio. We also present benchmark results.

* Accepted at Synergy of Scientific and Machine Learning Modeling ICML 2023 Workshop https://syns-ml.github.io/2023/contributions/
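
For orientation, the generalized Kullback-Leibler divergence between two nonnegative, possibly unnormalized densities is commonly written as below. The symbols p and q are placeholders chosen here; the paper's exact objective for posteriors and surrogates may be parameterized differently.

```latex
D_{\mathrm{GKL}}(p \,\|\, q)
  = \int p(x) \log \frac{p(x)}{q(x)} \, dx
  - \int p(x) \, dx
  + \int q(x) \, dx
```

When both p and q integrate to one, the last two terms cancel and the expression reduces to the ordinary Kullback-Leibler divergence.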

Latent Representation and Simulation of Markov Processes via Time-Lagged Information Bottleneck

Sep 13, 2023

Marco Federici, Patrick Forré, Ryota Tomioka, Bastiaan S. Veeling

Abstract: Markov processes are widely used mathematical models for describing dynamic systems in various fields. However, accurately simulating large-scale systems at long time scales is computationally expensive due to the short time steps required for accurate integration. In this paper, we introduce an inference process that maps complex systems into a simplified representational space and models large jumps in time. To achieve this, we propose Time-lagged Information Bottleneck (T-IB), a principled objective rooted in information theory, which aims to capture relevant temporal features while discarding high-frequency information to simplify the simulation task and minimize the inference error. Our experiments demonstrate that T-IB learns information-optimal representations for accurately modeling the statistical properties and dynamics of the original process at a selected time lag, outperforming existing time-lagged dimensionality reduction methods.

* 10 pages, 14 figures

On the Effectiveness of Hybrid Mutual Information Estimation

Jun 02, 2023

Marco Federici, David Ruhe, Patrick Forré

Abstract: Estimating the mutual information from samples from a joint distribution is a challenging problem in both science and engineering. In this work, we realize a variational bound that generalizes both discriminative and generative approaches. Using this bound, we propose a hybrid method to mitigate their respective shortcomings. Further, we propose Predictive Quantization (PQ): a simple generative method that can be easily combined with discriminative estimators for minimal computational overhead. Our propositions yield a tighter bound on the information thanks to the reduced variance of the estimator. We test our methods on a challenging task of correlated high-dimensional Gaussian distributions and a stochastic process involving a system of free particles subjected to a fixed energy landscape. Empirical results show that hybrid methods consistently improve mutual information estimates when compared to the corresponding discriminative counterpart.

Two for One: Diffusion Models and Force Fields for Coarse-Grained Molecular Dynamics

Feb 01, 2023

Marloes Arts, Victor Garcia Satorras, Chin-Wei Huang, Daniel Zuegner, Marco Federici, Cecilia Clementi, Frank Noé, Robert Pinsler, Rianne van den Berg

Abstract: Coarse-grained (CG) molecular dynamics enables the study of biological processes at temporal and spatial scales that would be intractable at an atomistic resolution. However, accurately learning a CG force field remains a challenge. In this work, we leverage connections between score-based generative models, force fields and molecular dynamics to learn a CG force field without requiring any force inputs during training. Specifically, we train a diffusion generative model on protein structures from molecular dynamics simulations, and we show that its score function approximates a force field that can directly be used to simulate CG molecular dynamics. While having a vastly simplified training setup compared to previous work, we demonstrate that our approach leads to improved performance across several small- to medium-sized protein simulations, reproducing the CG equilibrium distribution, and preserving dynamics of all-atom simulations such as protein folding events.

Compositional Mixture Representations for Vision and Text

Jun 13, 2022

Stephan Alaniz, Marco Federici, Zeynep Akata

Abstract: Learning a common representation space between vision and language allows deep networks to relate objects in the image to the corresponding semantic meaning. We present a model that learns a shared Gaussian mixture representation imposing the compositionality of the text onto the visual domain without having explicit location supervision. By combining the spatial transformer with a representation learning approach, we learn to split images into separately encoded patches to associate visual and textual representations in an interpretable manner. On variations of MNIST and CIFAR10, our model is able to perform weakly supervised object detection and demonstrates its ability to extrapolate to unseen combinations of objects.

* Workshop on Learning with Limited Labelled Data for Image and Video Understanding (L3D-IVU), CVPR 2022

Towards Lightweight Controllable Audio Synthesis with Conditional Implicit Neural Representations

Dec 02, 2021

Jan Zuiderveld, Marco Federici, Erik J. Bekkers

Abstract: The high temporal resolution of audio and our perceptual sensitivity to small irregularities in waveforms make synthesizing at high sampling rates a complex and computationally intensive task, prohibiting real-time, controllable synthesis within many approaches. In this work, we aim to shed light on the potential of Conditional Implicit Neural Representations (CINRs) as lightweight backbones in generative frameworks for audio synthesis. Our experiments show that small Periodic Conditional INRs (PCINRs) learn faster and generally produce quantitatively better audio reconstructions than Transposed Convolutional Neural Networks with equal parameter counts. However, their performance is very sensitive to activation scaling hyperparameters. When learning to represent more uniform sets, PCINRs tend to introduce artificial high-frequency components in reconstructions. We validate that this noise can be minimized by applying standard weight regularization during training or decreasing the compositional depth of PCINRs, and suggest directions for future research.

* Accepted to "Deep Generative Models and Downstream Applications" (Oral) and "Machine Learning for Creativity and Design" (Poster) workshops at NeurIPS 2021

A Bayesian Approach to Invariant Deep Neural Networks

Jul 20, 2021

Nikolaos Mourdoukoutas, Marco Federici, Georges Pantalos, Mark van der Wilk, Vincent Fortuin

Abstract: We propose a novel Bayesian neural network architecture that can learn invariances from data alone by inferring a posterior distribution over different weight-sharing schemes. We show that our model outperforms other non-invariant architectures when trained on datasets that contain specific invariances. The same holds true when no data augmentation is performed.

* 8 pages, 3 figures, To be published in ICML UDL 2021
