Abstract:Hidden Markov models (HMMs) are characterized by an unobservable (hidden) Markov chain and an observable process, which is a noisy version of the hidden chain. Decoding the original signal (i.e., the hidden chain) from the noisy observations is one of the main goals in nearly all HMM-based data analyses. Existing decoding algorithms such as the Viterbi algorithm have computational complexity at best linear in the length of the observed sequence, and sub-quadratic in the size of the state space of the Markov chain. We present Quick Adaptive Ternary Segmentation (QATS), a divide-and-conquer procedure which decodes the hidden sequence in polylogarithmic computational complexity in the length of the sequence, and cubic in the size of the state space, and is hence particularly suited for large-scale HMMs with relatively few states. The procedure also suggests an effective way of storing the data as specific cumulative sums. In essence, the estimated sequence of states sequentially maximizes local likelihood scores among all local paths with at most three segments. The maximization is performed only approximately using an adaptive search procedure. The resulting sequence is admissible in the sense that all transitions occur with positive probability. To complement formal results justifying our approach, we present Monte Carlo simulations which demonstrate the speedups provided by QATS in comparison to Viterbi, along with a precision analysis of the returned sequences. An implementation of QATS in C++ is provided in the R package QATS, which is available from GitHub.
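For context, the following is a minimal sketch of the classical Viterbi baseline referred to above, i.e. log-domain dynamic programming with $O(T S^2)$ cost for $T$ observations and $S$ states. It is not an implementation of QATS, and the toy transition and emission matrices are invented for illustration only.

```python
import numpy as np

def viterbi(obs, log_pi, log_A, log_B):
    """Classical Viterbi decoding: O(T * S^2) time for T observations, S states.

    log_pi : (S,)   log initial state probabilities
    log_A  : (S, S) log transition matrix, log_A[i, j] = log P(s_{t+1}=j | s_t=i)
    log_B  : (S, M) log emission matrix,  log_B[j, o] = log P(obs=o | state=j)
    obs    : (T,)   integer-coded observations
    """
    T, S = len(obs), len(log_pi)
    delta = np.empty((T, S))            # best log-probability of a path ending in state j at time t
    back = np.empty((T, S), dtype=int)  # backpointers
    delta[0] = log_pi + log_B[:, obs[0]]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_A   # (S, S): previous state x current state
        back[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_B[:, obs[t]]
    path = np.empty(T, dtype=int)
    path[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):      # backtrack along the stored pointers
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy example: 2 hidden states, 2 observation symbols.
rng = np.random.default_rng(0)
log_pi = np.log([0.5, 0.5])
log_A = np.log([[0.95, 0.05], [0.05, 0.95]])
log_B = np.log([[0.8, 0.2], [0.2, 0.8]])
obs = rng.integers(0, 2, size=50)
print(viterbi(obs, log_pi, log_A, log_B))
```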
Abstract:As a classical and ever-reviving topic, change point detection is often formulated as a search for the maximum of a gain function describing improved fits when segmenting the data. Searching through all candidate split points on the grid to find the best one requires $O(T)$ evaluations of the gain function for an interval with $T$ observations. If each evaluation is computationally demanding (e.g. in high-dimensional models), this can become infeasible. Instead, we propose optimistic search strategies with $O(\log T)$ evaluations that exploit specific structure of the gain function. Towards a solid understanding of our strategies, we investigate in detail the classical univariate Gaussian change-in-mean setup. For some of our proposals we prove asymptotic minimax optimality for single and multiple change point scenarios. Our search strategies generalize far beyond the theoretically analyzed univariate setup. As an example, we illustrate massive computational speedup in change point detection for high-dimensional Gaussian graphical models. More generally, we demonstrate empirically that our optimistic search methods lead to competitive estimation performance while heavily reducing run-time.
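To illustrate the idea, here is a simplified narrowing search over the classical change-in-mean gain function that needs only $O(\log T)$ gain evaluations, assuming the gain curve is (approximately) unimodal. It is in the spirit of, but not identical to, the optimistic search rules proposed in the paper.

```python
import numpy as np

def gain(csum, n, s):
    """Gain of splitting x[0:n] at s (1 <= s < n), computed from cumulative sums.
    Equivalent to the squared CUSUM statistic for a change in mean, up to a
    term that does not depend on s."""
    left, right = csum[s], csum[n] - csum[s]
    return left**2 / s + right**2 / (n - s) - csum[n]**2 / n

def optimistic_split(x):
    """Locate a split maximizing the gain with ~O(log T) gain evaluations,
    assuming approximate unimodality.  Simplified ternary-style narrowing,
    not the exact search rules of the paper."""
    n = len(x)
    csum = np.concatenate(([0.0], np.cumsum(x)))
    lo, hi = 1, n - 1
    while hi - lo > 2:
        m1 = lo + (hi - lo) // 3
        m2 = hi - (hi - lo) // 3
        if gain(csum, n, m1) < gain(csum, n, m2):
            lo = m1 + 1
        else:
            hi = m2
    return max(range(lo, hi + 1), key=lambda s: gain(csum, n, s))

# Toy example: change in mean at t = 700 in a sequence of length 1000.
rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(0, 1, 700), rng.normal(1.5, 1, 300)])
print(optimistic_split(x))   # typically close to 700
```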
Abstract:In this paper we present a spatially adaptive method for image reconstruction that is based on the concept of statistical multiresolution estimation as introduced in [Frick K, Marnitz P, and Munk A. "Statistical multiresolution Dantzig estimation in imaging: Fundamental concepts and algorithmic framework". Electron. J. Stat., 6:231-268, 2012]. It constitutes a variational regularization technique that uses a supremum-type distance measure as data fidelity combined with a convex cost functional. The resulting convex optimization problem is approached by a combination of an inexact alternating direction method of multipliers and Dykstra's projection algorithm. We describe a novel method for balancing data fit and regularity that is fully automatic and allows for a sound statistical interpretation. The performance of our estimation approach is studied for various problems in imaging. Among others, this includes deconvolution problems that arise in Poisson nanoscale fluorescence microscopy.
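As an illustration of one building block of the optimization scheme, the following is a minimal sketch of Dykstra's projection algorithm for projecting onto an intersection of convex sets. The specific sets used here (a box and a Euclidean ball) are purely illustrative and not the multiresolution constraints of the paper.

```python
import numpy as np

def dykstra(y, projections, n_iter=200):
    """Dykstra's algorithm: project y onto the intersection of convex sets,
    each given by its Euclidean projection operator.  Unlike plain alternating
    projections, the correction terms p[i] make the iterates converge to the
    actual projection of y onto the intersection (for closed convex sets with
    nonempty intersection)."""
    x = y.copy()
    p = [np.zeros_like(y) for _ in projections]
    for _ in range(n_iter):
        for i, proj in enumerate(projections):
            z = x + p[i]        # add back the correction for set i
            x = proj(z)         # project onto set i
            p[i] = z - x        # update the correction
    return x

# Illustrative sets: the box [0, 1]^d and the Euclidean ball of radius 0.8.
proj_box = lambda z: np.clip(z, 0.0, 1.0)
proj_ball = lambda z: z if np.linalg.norm(z) <= 0.8 else 0.8 * z / np.linalg.norm(z)

y = np.array([1.5, -0.3, 0.9])
print(dykstra(y, [proj_box, proj_ball]))
```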
Abstract:In this paper we are concerned with fully automatic and locally adaptive estimation of functions in a "signal + noise" model where the regression function may additionally be blurred by a linear operator, e.g. by a convolution. To this end, we introduce a general class of statistical multiresolution estimators and develop an algorithmic framework for computing them. By this we mean estimators that are defined as solutions of convex optimization problems with supremum-type constraints. We employ a combination of the alternating direction method of multipliers with Dykstra's algorithm for computing orthogonal projections onto intersections of convex sets, and prove numerical convergence. The capability of the proposed method is illustrated by various examples from imaging and signal detection.
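For readers unfamiliar with the algorithmic pattern, here is a generic scaled-form ADMM sketch applied to a standard lasso problem. It only illustrates the splitting and dual update that ADMM performs, not the actual multiresolution estimator, whose supremum-type constraints are handled via Dykstra projections as described above.

```python
import numpy as np

def admm_lasso(A, b, lam, rho=1.0, n_iter=500):
    """Generic scaled-form ADMM for min_x 0.5*||Ax - b||^2 + lam*||x||_1,
    written as min f(x) + g(z) subject to x = z.  Illustrates the ADMM
    splitting pattern only; it is not the estimator of the paper."""
    n = A.shape[1]
    x, z, u = np.zeros(n), np.zeros(n), np.zeros(n)     # u is the scaled dual variable
    AtA, Atb = A.T @ A, A.T @ b
    L = np.linalg.cholesky(AtA + rho * np.eye(n))       # factor once, reuse every iteration
    for _ in range(n_iter):
        rhs = Atb + rho * (z - u)
        x = np.linalg.solve(L.T, np.linalg.solve(L, rhs))                  # x-update (quadratic)
        z = np.sign(x + u) * np.maximum(np.abs(x + u) - lam / rho, 0.0)    # z-update (soft threshold)
        u = u + x - z                                                      # dual update
    return z

# Small synthetic example with a sparse ground truth.
rng = np.random.default_rng(2)
A = rng.normal(size=(60, 20))
x_true = np.zeros(20); x_true[:3] = [2.0, -1.0, 1.5]
b = A @ x_true + 0.1 * rng.normal(size=60)
print(np.round(admm_lasso(A, b, lam=1.0), 2))
```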
Abstract:We study the effect of growth on the fingerprints of adolescents, based on which we suggest a simple method to adjust for growth when trying to recover a juvenile's fingerprint in a database years later. Based on longitudinal data sets in juveniles' criminal records, we show that growth essentially leads to an isotropic rescaling, so that we can use the strong correlation between growth in stature and limbs to model the growth of fingerprints proportional to stature growth as documented in growth charts. The proposed rescaling leads to a 72% reduction of the distances between corresponding minutiae for the data set analyzed. These findings were corroborated by several verification tests. In an identification test on a database containing 3.25 million right index fingers at the Federal Criminal Police Office of Germany, the identification error rate of 20.8% was reduced to 2.1% by rescaling. The presented method is of striking simplicity and can easily be integrated into existing automated fingerprint identification systems.
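A minimal sketch of the kind of isotropic rescaling described above, assuming the scale factor is taken as the ratio of statures (e.g. obtained from growth charts). Function names and numbers are hypothetical, and a real AFIS pipeline would additionally handle rotation and translation.

```python
import numpy as np

def rescale_minutiae(minutiae_xy, stature_at_capture_cm, stature_now_cm, center=None):
    """Isotropically rescale juvenile minutiae coordinates, using the ratio of
    statures as a proxy for fingerprint growth.  Purely illustrative sketch."""
    pts = np.asarray(minutiae_xy, dtype=float)
    if center is None:
        center = pts.mean(axis=0)            # rescale about the minutiae centroid
    scale = stature_now_cm / stature_at_capture_cm
    return center + scale * (pts - center)

# Hypothetical example: print captured at stature 150 cm, matched at 175 cm.
juvenile_minutiae = [(120.0, 80.0), (140.5, 95.2), (131.0, 110.7)]
print(rescale_minutiae(juvenile_minutiae, 150.0, 175.0))
```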
Abstract:Let $(X,Y)$ be a random variable consisting of an observed feature vector $X\in \mathcal{X}$ and an unobserved class label $Y\in \{1,2,\ldots,L\}$ with unknown joint distribution. In addition, let $\mathcal{D}$ be a training data set consisting of $n$ completely observed independent copies of $(X,Y)$. Usual classification procedures provide point predictors (classifiers) $\widehat{Y}(X,\mathcal{D})$ of $Y$ or estimate the conditional distribution of $Y$ given $X$. In order to quantify the certainty of classifying $X$ we propose to construct, for each $\theta = 1,2,\ldots,L$, a p-value $\pi_{\theta}(X,\mathcal{D})$ for the null hypothesis that $Y=\theta$, treating $Y$ temporarily as a fixed parameter. In other words, the point predictor $\widehat{Y}(X,\mathcal{D})$ is replaced with a prediction region for $Y$ with a certain confidence. We argue that (i) this approach is advantageous over traditional approaches and (ii) any reasonable classifier can be modified to yield nonparametric p-values. We discuss issues such as optimality, single-use and multiple-use validity, as well as computational and graphical aspects.
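One simple way to obtain such nonparametric p-values is a conformal-style rank construction based on a nonconformity score, sketched below. This is in the spirit of the abstract but not necessarily the construction analyzed there, and the distance-to-class-mean score is an arbitrary illustrative choice.

```python
import numpy as np

def class_pvalues(X_train, y_train, x_new, classes):
    """For each candidate label theta, rank the new point's nonconformity score
    (here: Euclidean distance to the class mean) among the scores of the
    training points with that label, and turn the rank into a p-value."""
    pvals = {}
    for theta in classes:
        X_theta = X_train[y_train == theta]
        mu = X_theta.mean(axis=0)
        scores = np.linalg.norm(X_theta - mu, axis=1)   # training nonconformity scores
        score_new = np.linalg.norm(x_new - mu)
        pvals[theta] = (1 + np.sum(scores >= score_new)) / (len(X_theta) + 1)
    return pvals

# Toy data: two Gaussian classes in the plane.
rng = np.random.default_rng(3)
X_train = np.vstack([rng.normal(0, 1, (50, 2)), rng.normal(3, 1, (50, 2))])
y_train = np.array([1] * 50 + [2] * 50)
pv = class_pvalues(X_train, y_train, x_new=np.array([2.5, 2.8]), classes=[1, 2])
print(pv)                                                # p-value per candidate label
print([theta for theta, p in pv.items() if p > 0.05])    # 95% prediction region for Y
```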