Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Daniel Klotz

TiRex: Zero-Shot Forecasting Across Long and Short Horizons with Enhanced In-Context Learning

May 29, 2025

Andreas Auer, Patrick Podest, Daniel Klotz, Sebastian Böck, Günter Klambauer, Sepp Hochreiter

Abstract:In-context learning, the ability of large language models to perform tasks using only examples provided in the prompt, has recently been adapted for time series forecasting. This paradigm enables zero-shot prediction, where past values serve as context for forecasting future values, making powerful forecasting tools accessible to non-experts and increasing the performance when training data are scarce. Most existing zero-shot forecasting approaches rely on transformer architectures, which, despite their success in language, often fall short of expectations in time series forecasting, where recurrent models like LSTMs frequently have the edge. Conversely, while LSTMs are well-suited for time series modeling due to their state-tracking capabilities, they lack strong in-context learning abilities. We introduce TiRex that closes this gap by leveraging xLSTM, an enhanced LSTM with competitive in-context learning skills. Unlike transformers, state-space models, or parallelizable RNNs such as RWKV, TiRex retains state-tracking, a critical property for long-horizon forecasting. To further facilitate its state-tracking ability, we propose a training-time masking strategy called CPM. TiRex sets a new state of the art in zero-shot time series forecasting on the HuggingFace benchmarks GiftEval and Chronos-ZS, outperforming significantly larger models including TabPFN-TS (Prior Labs), Chronos Bolt (Amazon), TimesFM (Google), and Moirai (Salesforce) across both short- and long-term forecasts.

Via

Access Paper or Ask Questions

Simplified priors for Object-Centric Learning

Oct 01, 2024

Vihang Patil, Andreas Radler, Daniel Klotz, Sepp Hochreiter

Abstract:Humans excel at abstracting data and constructing \emph{reusable} concepts, a capability lacking in current continual learning systems. The field of object-centric learning addresses this by developing abstract representations, or slots, from data without human supervision. Different methods have been proposed to tackle this task for images, whereas most are overly complex, non-differentiable, or poorly scalable. In this paper, we introduce a conceptually simple, fully-differentiable, non-iterative, and scalable method called SAMP Simplified Slot Attention with Max Pool Priors). It is implementable using only Convolution and MaxPool layers and an Attention layer. Our method encodes the input image with a Convolutional Neural Network and then uses a branch of alternating Convolution and MaxPool layers to create specialized sub-networks and extract primitive slots. These primitive slots are then used as queries for a Simplified Slot Attention over the encoded image. Despite its simplicity, our method is competitive or outperforms previous methods on standard benchmarks.

Via

Access Paper or Ask Questions

Energy-based Hopfield Boosting for Out-of-Distribution Detection

May 14, 2024

Claus Hofmann, Simon Schmid, Bernhard Lehner, Daniel Klotz, Sepp Hochreiter

Abstract:Out-of-distribution (OOD) detection is critical when deploying machine learning models in the real world. Outlier exposure methods, which incorporate auxiliary outlier data in the training process, can drastically improve OOD detection performance compared to approaches without advanced training strategies. We introduce Hopfield Boosting, a boosting approach, which leverages modern Hopfield energy (MHE) to sharpen the decision boundary between the in-distribution and OOD data. Hopfield Boosting encourages the model to concentrate on hard-to-distinguish auxiliary outlier examples that lie close to the decision boundary between in-distribution and auxiliary outlier data. Our method achieves a new state-of-the-art in OOD detection with outlier exposure, improving the FPR95 metric from 2.28 to 0.92 on CIFAR-10 and from 11.76 to 7.94 on CIFAR-100.

Via

Access Paper or Ask Questions

Conformal Prediction for Time Series with Modern Hopfield Networks

Mar 22, 2023

Andreas Auer, Martin Gauch, Daniel Klotz, Sepp Hochreiter

Abstract:To quantify uncertainty, conformal prediction methods are gaining continuously more interest and have already been successfully applied to various domains. However, they are difficult to apply to time series as the autocorrelative structure of time series violates basic assumptions required by conformal prediction. We propose HopCPT, a novel conformal prediction approach for time series that not only copes with temporal structures but leverages them. We show that our approach is theoretically well justified for time series where temporal dependencies are present. In experiments, we demonstrate that our new approach outperforms state-of-the-art conformal prediction methods on multiple real-world time series datasets from four different domains.

* under Review

Via

Access Paper or Ask Questions

Few-Shot Learning by Dimensionality Reduction in Gradient Space

Jun 07, 2022

Martin Gauch, Maximilian Beck, Thomas Adler, Dmytro Kotsur, Stefan Fiel, Hamid Eghbal-zadeh, Johannes Brandstetter, Johannes Kofler, Markus Holzleitner, Werner Zellinger(+3 more)

Figure 1 for Few-Shot Learning by Dimensionality Reduction in Gradient Space

Figure 2 for Few-Shot Learning by Dimensionality Reduction in Gradient Space

Figure 3 for Few-Shot Learning by Dimensionality Reduction in Gradient Space

Figure 4 for Few-Shot Learning by Dimensionality Reduction in Gradient Space

Abstract:We introduce SubGD, a novel few-shot learning method which is based on the recent finding that stochastic gradient descent updates tend to live in a low-dimensional parameter subspace. In experimental and theoretical analyses, we show that models confined to a suitable predefined subspace generalize well for few-shot learning. A suitable subspace fulfills three criteria across the given tasks: it (a) allows to reduce the training error by gradient flow, (b) leads to models that generalize well, and (c) can be identified by stochastic gradient descent. SubGD identifies these subspaces from an eigendecomposition of the auto-correlation matrix of update directions across different tasks. Demonstrably, we can identify low-dimensional suitable subspaces for few-shot learning of dynamical systems, which have varying properties described by one or few parameters of the analytical system description. Such systems are ubiquitous among real-world applications in science and engineering. We experimentally corroborate the advantages of SubGD on three distinct dynamical systems problem settings, significantly outperforming popular few-shot learning methods both in terms of sample efficiency and performance.

* Accepted at Conference on Lifelong Learning Agents (CoLLAs) 2022. Code: https://github.com/ml-jku/subgd Blog post: https://ml-jku.github.io/subgd

Via

Access Paper or Ask Questions

MC-LSTM: Mass-Conserving LSTM

Feb 08, 2021

Pieter-Jan Hoedt, Frederik Kratzert, Daniel Klotz, Christina Halmich, Markus Holzleitner, Grey Nearing, Sepp Hochreiter, Günter Klambauer

Figure 1 for MC-LSTM: Mass-Conserving LSTM

Figure 2 for MC-LSTM: Mass-Conserving LSTM

Figure 3 for MC-LSTM: Mass-Conserving LSTM

Figure 4 for MC-LSTM: Mass-Conserving LSTM

Abstract:The success of Convolutional Neural Networks (CNNs) in computer vision is mainly driven by their strong inductive bias, which is strong enough to allow CNNs to solve vision-related tasks with random weights, meaning without learning. Similarly, Long Short-Term Memory (LSTM) has a strong inductive bias towards storing information over time. However, many real-world systems are governed by conservation laws, which lead to the redistribution of particular quantities -- e.g. in physical and economical systems. Our novel Mass-Conserving LSTM (MC-LSTM) adheres to these conservation laws by extending the inductive bias of LSTM to model the redistribution of those stored quantities. MC-LSTMs set a new state-of-the-art for neural arithmetic units at learning arithmetic operations, such as addition tasks, which have a strong conservation law, as the sum is constant over time. Further, MC-LSTM is applied to traffic forecasting, modelling a pendulum, and a large benchmark dataset in hydrology, where it sets a new state-of-the-art for predicting peak flows. In the hydrology example, we show that MC-LSTM states correlate with real-world processes and are therefore interpretable.

* 12 pages (8 without references) + 16 pages appendix

Via

Access Paper or Ask Questions

Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling

Dec 15, 2020

Daniel Klotz, Frederik Kratzert, Martin Gauch, Alden Keefe Sampson, Günter Klambauer, Sepp Hochreiter, Grey Nearing

Figure 1 for Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling

Figure 2 for Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling

Figure 3 for Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling

Figure 4 for Uncertainty Estimation with Deep Learning for Rainfall-Runoff Modelling

Abstract:Deep Learning is becoming an increasingly important way to produce accurate hydrological predictions across a wide range of spatial and temporal scales. Uncertainty estimations are critical for actionable hydrological forecasting, and while standardized community benchmarks are becoming an increasingly important part of hydrological model development and research, similar tools for benchmarking uncertainty estimation are lacking. We establish an uncertainty estimation benchmarking procedure and present four Deep Learning baselines, out of which three are based on Mixture Density Networks and one is based on Monte Carlo dropout. Additionally, we provide a post-hoc model analysis to put forward some qualitative understanding of the resulting models. Most importantly however, we show that accurate, precise, and reliable uncertainty estimation can be achieved with Deep Learning.

* 32 pages, 11 figures

Via

Access Paper or Ask Questions

Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network

Oct 15, 2020

Martin Gauch, Frederik Kratzert, Daniel Klotz, Grey Nearing, Jimmy Lin, Sepp Hochreiter

Figure 1 for Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network

Figure 2 for Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network

Figure 3 for Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network

Figure 4 for Rainfall-Runoff Prediction at Multiple Timescales with a Single Long Short-Term Memory Network

Abstract:Long Short-Term Memory Networks (LSTMs) have been applied to daily discharge prediction with remarkable success. Many practical scenarios, however, require predictions at more granular timescales. For instance, accurate prediction of short but extreme flood peaks can make a life-saving difference, yet such peaks may escape the coarse temporal resolution of daily predictions. Naively training an LSTM on hourly data, however, entails very long input sequences that make learning hard and computationally expensive. In this study, we propose two Multi-Timescale LSTM (MTS-LSTM) architectures that jointly predict multiple timescales within one model, as they process long-past inputs at a single temporal resolution and branch out into each individual timescale for more recent input steps. We test these models on 516 basins across the continental United States and benchmark against the US National Water Model. Compared to naive prediction with a distinct LSTM per timescale, the multi-timescale architectures are computationally more efficient with no loss in accuracy. Beyond prediction quality, the multi-timescale LSTM can process different input variables at different timescales, which is especially relevant to operational applications where the lead time of meteorological forcings depends on their temporal resolution.

Via

Access Paper or Ask Questions

Using LSTMs for climate change assessment studies on droughts and floods

Nov 28, 2019

Frederik Kratzert, Daniel Klotz, Johannes Brandstetter, Pieter-Jan Hoedt, Grey Nearing, Sepp Hochreiter

Figure 1 for Using LSTMs for climate change assessment studies on droughts and floods

Abstract:Climate change affects occurrences of floods and droughts worldwide. However, predicting climate impacts over individual watersheds is difficult, primarily because accurate hydrological forecasts require models that are calibrated to past data. In this work we present a large-scale LSTM-based modeling approach that -- by training on large data sets -- learns a diversity of hydrological behaviors. Previous work shows that this model is more accurate than current state-of-the-art models, even when the LSTM-based approach operates out-of-sample and the latter in-sample. In this work, we show how this model can assess the sensitivity of the underlying systems with regard to extreme (high and low) flows in individual watersheds over the continental US.

Via

Access Paper or Ask Questions

Accurate Hydrologic Modeling Using Less Information

Nov 21, 2019

Guy Shalev, Ran El-Yaniv, Daniel Klotz, Frederik Kratzert, Asher Metzger, Sella Nevo

Figure 1 for Accurate Hydrologic Modeling Using Less Information

Abstract:Joint models are a common and important tool in the intersection of machine learning and the physical sciences, particularly in contexts where real-world measurements are scarce. Recent developments in rainfall-runoff modeling, one of the prime challenges in hydrology, show the value of a joint model with shared representation in this important context. However, current state-of-the-art models depend on detailed and reliable attributes characterizing each site to help the model differentiate correctly between the behavior of different sites. This dependency can present a challenge in data-poor regions. In this paper, we show that we can replace the need for such location-specific attributes with a completely data-driven learned embedding, and match previous state-of-the-art results with less information.

Via

Access Paper or Ask Questions