Abstract: Distribution shifts between training and test data are all but inevitable over the lifecycle of a deployed model and lead to performance decay. Adapting the model can mitigate this drop in performance. Yet, adaptation is challenging since it must be unsupervised: we usually do not have access to any labeled data at test time. In this paper, we propose a probabilistic state-space model that can adapt a deployed model subjected to distribution drift. Our model learns the dynamics induced by distribution shifts on the last set of hidden features. Without requiring labels, we infer time-evolving class prototypes that serve as a dynamic classification head. Moreover, our approach is lightweight, modifying only the model's last linear layer. In experiments on real-world distribution shifts and synthetic corruptions, we demonstrate that our approach performs competitively with methods that require back-propagation and access to the model backbone. Our model especially excels in the case of small test batches, the most difficult setting.
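To make the mechanism concrete, here is a minimal sketch of the idea behind a state-space prototype tracker: class prototypes drift under simple random-walk dynamics and are updated from unlabeled feature batches via pseudo-labels and a Kalman-style gain. All names and hyperparameters (PrototypeTracker, q_drift, r_obs) are illustrative assumptions, not the paper's exact model.

```python
# Minimal sketch (not the paper's exact model): tracking class prototypes
# with a random-walk state-space model and pseudo-labeled updates.
import numpy as np

class PrototypeTracker:
    def __init__(self, init_prototypes, q_drift=1e-3, r_obs=1e-1):
        self.mu = init_prototypes.copy()          # (C, D) class prototypes
        self.var = np.ones(len(init_prototypes))  # scalar variance per class
        self.q = q_drift                          # process noise (drift speed)
        self.r = r_obs                            # observation noise

    def adapt(self, feats):
        """One unsupervised step on a batch of features of shape (B, D)."""
        self.var += self.q                        # predict: prototypes drift
        # pseudo-label each feature by its nearest current prototype
        d = ((feats[:, None, :] - self.mu[None, :, :]) ** 2).sum(-1)
        labels = d.argmin(1)
        for c in np.unique(labels):
            z = feats[labels == c].mean(0)            # batch observation for class c
            k = self.var[c] / (self.var[c] + self.r)  # Kalman-style gain
            self.mu[c] += k * (z - self.mu[c])        # update prototype
            self.var[c] *= (1 - k)
        return labels                             # predictions double as output
```

Under this sketch, the adapted prototypes play the role of the last linear layer's class weights; the backbone is never touched, matching the lightweight spirit of the abstract.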
Abstract: Knowing whether a model will generalize to data 'in the wild' is crucial for safe deployment. To this end, we study model disagreement notions that consider the full predictive distribution, specifically disagreement based on the Hellinger distance, the Jensen-Shannon divergence, and the Kullback-Leibler divergence. We find that divergence-based scores provide better test error estimates and detection rates on out-of-distribution data than their top-1 counterparts. Experiments involve standard vision and foundation models.
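For concreteness, here is a minimal sketch of the disagreement scores the abstract compares, computed between the softmax outputs of two models on a single input; the implementation details are assumptions rather than the authors' exact code.

```python
# Disagreement scores between two predictive distributions p and q (shape (C,)).
import numpy as np

def kl(p, q, eps=1e-12):
    """Kullback-Leibler divergence KL(p || q) for discrete distributions."""
    return float(np.sum(p * np.log((p + eps) / (q + eps))))

def js(p, q):
    """Jensen-Shannon divergence: symmetrized KL against the mixture."""
    m = 0.5 * (p + q)
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)

def hellinger(p, q):
    """Hellinger distance: (1/sqrt(2)) * L2 norm of sqrt(p) - sqrt(q)."""
    return float(np.linalg.norm(np.sqrt(p) - np.sqrt(q)) / np.sqrt(2.0))

def top1_disagreement(p, q):
    """Top-1 baseline: do the two models predict different classes?"""
    return float(np.argmax(p) != np.argmax(q))

p = np.array([0.7, 0.2, 0.1])
q = np.array([0.5, 0.4, 0.1])
print(hellinger(p, q), js(p, q), kl(p, q), top1_disagreement(p, q))
```

Note that in this example the two models agree on the top-1 class, so top-1 disagreement is zero, while all three divergence-based scores still register the difference between the full distributions.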
Abstract: Recurrent neural networks (RNNs) like long short-term memory networks (LSTMs) and gated recurrent units (GRUs) are a popular choice for modeling sequential data. Their gating mechanism permits weighting previous history encoded in a hidden state against new information from incoming observations. In many applications, such as medical records, observation times are irregular and carry important information. However, LSTMs and GRUs assume constant time intervals between observations. To address this challenge, we propose continuous recurrent units (CRUs), a neural architecture that can naturally handle irregular time intervals between observations. The gating mechanism of the CRU employs the continuous formulation of a Kalman filter and alternates between (1) continuous latent state propagation according to a linear stochastic differential equation (SDE) and (2) latent state updates whenever a new observation comes in. In an empirical study, we show that the CRU can better interpolate irregular time series than neural ordinary differential equation (neural ODE)-based models. We also show that our model can infer dynamics from images and that the Kalman gain efficiently singles out candidates for valuable state updates from noisy observations.
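A minimal sketch of the continuous-discrete Kalman filter that the CRU's gating builds on: the latent mean and covariance are propagated through a linear SDE over each (possibly irregular) gap between observations, and a discrete Kalman update folds in each new observation. The drift, diffusion, and noise matrices below are illustrative assumptions, not the learned parameters of a CRU.

```python
# Continuous-discrete Kalman filtering over irregularly spaced observations.
import numpy as np

A = np.array([[0.0, 1.0], [-1.0, -0.1]])  # latent drift of the linear SDE
Q = 0.01 * np.eye(2)                      # diffusion covariance
H = np.array([[1.0, 0.0]])                # observation model
R = np.array([[0.1]])                     # observation noise

def propagate(mu, P, dt, n_steps=20):
    """Integrate d(mu) = A mu dt and dP = (A P + P A^T + Q) dt with Euler steps."""
    h = dt / n_steps
    for _ in range(n_steps):
        mu = mu + h * (A @ mu)
        P = P + h * (A @ P + P @ A.T + Q)
    return mu, P

def update(mu, P, y):
    """Discrete Kalman update when an observation y arrives."""
    S = H @ P @ H.T + R
    K = P @ H.T @ np.linalg.inv(S)        # Kalman gain weighs new information
    mu = mu + K @ (y - H @ mu)
    P = (np.eye(len(mu)) - K @ H) @ P
    return mu, P

# Irregularly spaced observations: propagate over each gap, then update.
mu, P = np.zeros(2), np.eye(2)
for dt, y in [(0.3, np.array([0.9])), (1.7, np.array([0.1])), (0.2, np.array([-0.4]))]:
    mu, P = propagate(mu, P, dt)
    mu, P = update(mu, P, y)
print(mu)
```

The Kalman gain K explicitly trades off the propagated state against each new observation, which is the mechanism the abstract credits with singling out candidates for valuable state updates from noisy observations.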