Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Max Zimmer

Computational Algebra with Attention: Transformer Oracles for Border Basis Algorithms

May 29, 2025

Hiroshi Kera, Nico Pelleriti, Yuki Ishihara, Max Zimmer, Sebastian Pokutta

Abstract:Solving systems of polynomial equations, particularly those with finitely many solutions, is a crucial challenge across many scientific fields. Traditional methods like Gr\"obner and Border bases are fundamental but suffer from high computational costs, which have motivated recent Deep Learning approaches to improve efficiency, albeit at the expense of output correctness. In this work, we introduce the Oracle Border Basis Algorithm, the first Deep Learning approach that accelerates Border basis computation while maintaining output guarantees. To this end, we design and train a Transformer-based oracle that identifies and eliminates computationally expensive reduction steps, which we find to dominate the algorithm's runtime. By selectively invoking this oracle during critical phases of computation, we achieve substantial speedup factors of up to 3.5x compared to the base algorithm, without compromising the correctness of results. To generate the training data, we develop a sampling method and provide the first sampling theorem for border bases. We construct a tokenization and embedding scheme tailored to monomial-centered algebraic computations, resulting in a compact and expressive input representation, which reduces the number of tokens to encode an $n$-variate polynomial by a factor of $O(n)$. Our learning approach is data efficient, stable, and a practical enhancement to traditional computer algebra algorithms and symbolic computation.

* 13+19 pages (3+9 figures, 2+7 tables)

Via

Access Paper or Ask Questions

RECON: Robust symmetry discovery via Explicit Canonical Orientation Normalization

May 19, 2025

Alonso Urbano, David W. Romero, Max Zimmer, Sebastian Pokutta

Abstract:Real-world data often exhibits unknown or approximate symmetries, yet existing equivariant networks must commit to a fixed transformation group prior to training, e.g., continuous $SO(2)$ rotations. This mismatch degrades performance when the actual data symmetries differ from those in the transformation group. We introduce RECON, a framework to discover each input's intrinsic symmetry distribution from unlabeled data. RECON leverages class-pose decompositions and applies a data-driven normalization to align arbitrary reference frames into a common natural pose, yielding directly comparable and interpretable symmetry descriptors. We demonstrate effective symmetry discovery on 2D image benchmarks and -- for the first time -- extend it to 3D transformation groups, paving the way towards more flexible equivariant modeling.

Via

Access Paper or Ask Questions

DUNIA: Pixel-Sized Embeddings via Cross-Modal Alignment for Earth Observation Applications

Feb 24, 2025

Ibrahim Fayad, Max Zimmer, Martin Schwartz, Philippe Ciais, Fabian Gieseke, Gabriel Belouze, Sarah Brood, Aurelien De Truchis, Alexandre d'Aspremont

Abstract:Significant efforts have been directed towards adapting self-supervised multimodal learning for Earth observation applications. However, existing methods produce coarse patch-sized embeddings, limiting their effectiveness and integration with other modalities like LiDAR. To close this gap, we present DUNIA, an approach to learn pixel-sized embeddings through cross-modal alignment between images and full-waveform LiDAR data. As the model is trained in a contrastive manner, the embeddings can be directly leveraged in the context of a variety of environmental monitoring tasks in a zero-shot setting. In our experiments, we demonstrate the effectiveness of the embeddings for seven such tasks (canopy height mapping, fractional canopy cover, land cover mapping, tree species identification, plant area index, crop type classification, and per-pixel waveform-based vertical structure mapping). The results show that the embeddings, along with zero-shot classifiers, often outperform specialized supervised models, even in low data regimes. In the fine-tuning setting, we show strong low-shot capabilities with performances near or better than state-of-the-art on five out of six tasks.

* 26 pages, 8 figures

Via

Access Paper or Ask Questions

Approximating Latent Manifolds in Neural Networks via Vanishing Ideals

Feb 20, 2025

Nico Pelleriti, Max Zimmer, Elias Wirth, Sebastian Pokutta

Abstract:Deep neural networks have reshaped modern machine learning by learning powerful latent representations that often align with the manifold hypothesis: high-dimensional data lie on lower-dimensional manifolds. In this paper, we establish a connection between manifold learning and computational algebra by demonstrating how vanishing ideals can characterize the latent manifolds of deep networks. To that end, we propose a new neural architecture that (i) truncates a pretrained network at an intermediate layer, (ii) approximates each class manifold via polynomial generators of the vanishing ideal, and (iii) transforms the resulting latent space into linearly separable features through a single polynomial layer. The resulting models have significantly fewer layers than their pretrained baselines, while maintaining comparable accuracy, achieving higher throughput, and utilizing fewer parameters. Furthermore, drawing on spectral complexity analysis, we derive sharper theoretical guarantees for generalization, showing that our approach can in principle offer tighter bounds than standard deep networks. Numerical experiments confirm the effectiveness and efficiency of the proposed approach.

* 26 pages (8 main body, rest appendix and references), 12 figures, 3 tables, 3 algorithms

Via

Access Paper or Ask Questions

Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation

Jan 31, 2025

Jan Pauls, Max Zimmer, Berkant Turan, Sassan Saatchi, Philippe Ciais, Sebastian Pokutta, Fabian Gieseke

Figure 1 for Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation

Figure 2 for Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation

Figure 3 for Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation

Figure 4 for Capturing Temporal Dynamics in Large-Scale Canopy Tree Height Estimation

Abstract:With the rise in global greenhouse gas emissions, accurate large-scale tree canopy height maps are essential for understanding forest structure, estimating above-ground biomass, and monitoring ecological disruptions. To this end, we present a novel approach to generate large-scale, high-resolution canopy height maps over time. Our model accurately predicts canopy height over multiple years given Sentinel-2 time series satellite data. Using GEDI LiDAR data as the ground truth for training the model, we present the first 10m resolution temporal canopy height map of the European continent for the period 2019-2022. As part of this product, we also offer a detailed canopy height map for 2020, providing more precise estimates than previous studies. Our pipeline and the resulting temporal height map are publicly available, enabling comprehensive large-scale monitoring of forests and, hence, facilitating future research and ecological analyses. For an interactive viewer, see https://europetreemap.projects.earthengine.app/view/temporalcanopyheight.

* 9 pages main paper, 5 pages references and appendix, 8 figures, 5 tables

Via

Access Paper or Ask Questions

Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?

Jan 30, 2025

Konrad Mundinger, Max Zimmer, Aldo Kiem, Christoph Spiegel, Sebastian Pokutta

Figure 1 for Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?

Figure 2 for Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?

Figure 3 for Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?

Figure 4 for Neural Discovery in Mathematics: Do Machines Dream of Colored Planes?

Abstract:We demonstrate how neural networks can drive mathematical discovery through a case study of the Hadwiger-Nelson problem, a long-standing open problem from discrete geometry and combinatorics about coloring the plane avoiding monochromatic unit-distance pairs. Using neural networks as approximators, we reformulate this mixed discrete-continuous geometric coloring problem as an optimization task with a probabilistic, differentiable loss function. This enables gradient-based exploration of admissible configurations that most significantly led to the discovery of two novel six-colorings, providing the first improvements in thirty years to the off-diagonal variant of the original problem (Mundinger et al., 2024a). Here, we establish the underlying machine learning approach used to obtain these results and demonstrate its broader applicability through additional results and numerical insights.

* 8 pages main paper, 10 pages references and appendix, 17 figures, 1 table

Via

Access Paper or Ask Questions

Estimating Canopy Height at Scale

Jun 03, 2024

Jan Pauls, Max Zimmer, Una M. Kelly, Martin Schwartz, Sassan Saatchi, Philippe Ciais, Sebastian Pokutta, Martin Brandt, Fabian Gieseke

Figure 1 for Estimating Canopy Height at Scale

Figure 2 for Estimating Canopy Height at Scale

Figure 3 for Estimating Canopy Height at Scale

Figure 4 for Estimating Canopy Height at Scale

Abstract:We propose a framework for global-scale canopy height estimation based on satellite data. Our model leverages advanced data preprocessing techniques, resorts to a novel loss function designed to counter geolocation inaccuracies inherent in the ground-truth height measurements, and employs data from the Shuttle Radar Topography Mission to effectively filter out erroneous labels in mountainous regions, enhancing the reliability of our predictions in those areas. A comparison between predictions and ground-truth labels yields an MAE / RMSE of 2.43 / 4.73 (meters) overall and 4.45 / 6.72 (meters) for trees taller than five meters, which depicts a substantial improvement compared to existing global-scale maps. The resulting height map as well as the underlying framework will facilitate and enhance ecological analyses at a global scale, including, but not limited to, large-scale forest and biomass monitoring.

* ICML Camera-Ready, 17 pages, 14 figures, 7 tables

Via

Access Paper or Ask Questions

Neural Parameter Regression for Explicit Representations of PDE Solution Operators

Mar 19, 2024

Konrad Mundinger, Max Zimmer, Sebastian Pokutta

Abstract:We introduce Neural Parameter Regression (NPR), a novel framework specifically developed for learning solution operators in Partial Differential Equations (PDEs). Tailored for operator learning, this approach surpasses traditional DeepONets (Lu et al., 2021) by employing Physics-Informed Neural Network (PINN, Raissi et al., 2019) techniques to regress Neural Network (NN) parameters. By parametrizing each solution based on specific initial conditions, it effectively approximates a mapping between function spaces. Our method enhances parameter efficiency by incorporating low-rank matrices, thereby boosting computational efficiency and scalability. The framework shows remarkable adaptability to new initial and boundary conditions, allowing for rapid fine-tuning and inference, even in cases of out-of-distribution examples.

* ICLR24 Workshop AI4Differential Equations In Science, 15 pages, 4 figures, 2 tables, 1 algorithm

Via

Access Paper or Ask Questions

On the Byzantine-Resilience of Distillation-Based Federated Learning

Feb 19, 2024

Christophe Roux, Max Zimmer, Sebastian Pokutta

Figure 1 for On the Byzantine-Resilience of Distillation-Based Federated Learning

Figure 2 for On the Byzantine-Resilience of Distillation-Based Federated Learning

Figure 3 for On the Byzantine-Resilience of Distillation-Based Federated Learning

Figure 4 for On the Byzantine-Resilience of Distillation-Based Federated Learning

Abstract:Federated Learning (FL) algorithms using Knowledge Distillation (KD) have received increasing attention due to their favorable properties with respect to privacy, non-i.i.d. data and communication cost. These methods depart from transmitting model parameters and, instead, communicate information about a learning task by sharing predictions on a public dataset. In this work, we study the performance of such approaches in the byzantine setting, where a subset of the clients act in an adversarial manner aiming to disrupt the learning process. We show that KD-based FL algorithms are remarkably resilient and analyze how byzantine clients can influence the learning process compared to Federated Averaging. Based on these insights, we introduce two new byzantine attacks and demonstrate that they are effective against prior byzantine-resilient methods. Additionally, we propose FilterExp, a novel method designed to enhance the byzantine resilience of KD-based FL algorithms and demonstrate its efficacy. Finally, we provide a general method to make attacks harder to detect, improving their effectiveness.

Via

Access Paper or Ask Questions

PERP: Rethinking the Prune-Retrain Paradigm in the Era of LLMs

Dec 23, 2023

Max Zimmer, Megi Andoni, Christoph Spiegel, Sebastian Pokutta

Abstract:Neural Networks can be efficiently compressed through pruning, significantly reducing storage and computational demands while maintaining predictive performance. Simple yet effective methods like Iterative Magnitude Pruning (IMP, Han et al., 2015) remove less important parameters and require a costly retraining procedure to recover performance after pruning. However, with the rise of Large Language Models (LLMs), full retraining has become infeasible due to memory and compute constraints. In this study, we challenge the practice of retraining all parameters by demonstrating that updating only a small subset of highly expressive parameters is often sufficient to recover or even improve performance compared to full retraining. Surprisingly, retraining as little as 0.27%-0.35% of the parameters of GPT-architectures (OPT-2.7B/6.7B/13B/30B) achieves comparable performance to One Shot IMP across various sparsity levels. Our method, Parameter-Efficient Retraining after Pruning (PERP), drastically reduces compute and memory demands, enabling pruning and retraining of up to 30 billion parameter models on a single NVIDIA A100 GPU within minutes. Despite magnitude pruning being considered as unsuited for pruning LLMs, our findings show that PERP positions it as a strong contender against state-of-the-art retraining-free approaches such as Wanda (Sun et al., 2023) and SparseGPT (Frantar & Alistarh, 2023), opening up a promising alternative to avoiding retraining.

* 15 pages, 3 figures,

Via

Access Paper or Ask Questions