Abstract:Self-supervised learning for time-series data holds potential similar to that recently unleashed in Natural Language Processing and Computer Vision. While most existing works in this area focus on contrastive learning, we propose a conceptually simple yet powerful non-contrastive approach, based on the data2vec self-distillation framework. The core of our method is a student-teacher scheme that predicts the latent representation of an input time series from masked views of the same time series. This strategy avoids strong modality-specific assumptions and biases typically introduced by the design of contrastive sample pairs. We demonstrate the competitiveness of our approach for classification and forecasting as downstream tasks, comparing with state-of-the-art self-supervised learning methods on the UCR and UEA archives as well as the ETT and Electricity datasets.
Abstract:Generalized Additive Models (GAMs) have recently experienced a resurgence in popularity due to their interpretability, which arises from expressing the target value as a sum of non-linear transformations of the features. Despite the current enthusiasm for GAMs, their susceptibility to concurvity - i.e., (possibly non-linear) dependencies between the features - has hitherto been largely overlooked. Here, we demonstrate how concurvity can severly impair the interpretability of GAMs and propose a remedy: a conceptually simple, yet effective regularizer which penalizes pairwise correlations of the non-linearly transformed feature variables. This procedure is applicable to any differentiable additive model, such as Neural Additive Models or NeuralProphet, and enhances interpretability by eliminating ambiguities due to self-canceling feature contributions. We validate the effectiveness of our regularizer in experiments on synthetic as well as real-world datasets for time-series and tabular data. Our experiments show that concurvity in GAMs can be reduced without significantly compromising prediction quality, improving interpretability and reducing variance in the feature importances.
Abstract:State-of-the-art semantic segmentation models are characterized by high parameter counts and slow inference times, making them unsuitable for deployment in resource-constrained environments. To address this challenge, we propose \textsc{Auto-Compressing Subset Pruning}, \acosp, as a new online compression method. The core of \acosp consists of learning a channel selection mechanism for individual channels of each convolution in the segmentation model based on an effective temperature annealing schedule. We show a crucial interplay between providing a high-capacity model at the beginning of training and the compression pressure forcing the model to compress concepts into retained channels. We apply \acosp to \segnet and \pspnet architectures and show its success when trained on the \camvid, \city, \voc, and \ade datasets. The results are competitive with existing baselines for compression of segmentation models at low compression ratios and outperform them significantly at high compression ratios, yielding acceptable results even when removing more than $93\%$ of the parameters. In addition, \acosp is conceptually simple, easy to implement, and can readily be generalized to other data modalities, tasks, and architectures. Our code is available at \url{https://github.com/merantix/acosp}.