Abstract:We introduce Hades, an unsupervised algorithm to detect singularities in data. This algorithm employs a kernel goodness-of-fit test, and as a consequence it is much faster and far more scaleable than the existing topology-based alternatives. Using tools from differential geometry and optimal transport theory, we prove that Hades correctly detects singularities with high probability when the data sample lives on a transverse intersection of equidimensional manifolds. In computational experiments, Hades recovers singularities in synthetically generated data, branching points in road network data, intersection rings in molecular conformation space, and anomalies in image data.
Abstract:We provide explicit bounds on the number of sample points required to estimate tangent spaces and intrinsic dimensions of (smooth, compact) Euclidean submanifolds via local principal component analysis. Our approach directly estimates covariance matrices locally, which simultaneously allows estimating both the tangent spaces and the intrinsic dimension of a manifold. The key arguments involve a matrix concentration inequality, a Wasserstein bound for flattening a manifold, and a Lipschitz relation for the covariance matrix with respect to the Wasserstein distance.