Abstract: Phase retrieval is the classical problem of recovering a signal $x^* \in \mathbb{R}^n$ from its noisy phaseless measurements $y_i = \langle a_i, x^* \rangle^2 + \zeta_i$ (where $\zeta_i$ denotes noise and $a_i$ is the sensing vector) for $i \in [m]$. The problem of phase retrieval has a rich history, with applications in optics, crystallography, heteroscedastic regression, and astrophysics, among others. A major consideration in algorithms for phase retrieval is robustness against measurement errors. In recent breakthroughs in algorithmic robust statistics, efficient algorithms have been developed for several parameter estimation tasks, such as mean estimation, covariance estimation, and robust principal component analysis (PCA), in the presence of heavy-tailed noise and adversarial corruptions. In this paper, we study efficient algorithms for robust phase retrieval with heavy-tailed noise when a constant fraction of both the measurements $y_i$ and the sensing vectors $a_i$ may be arbitrarily adversarially corrupted. For this problem, Buna and Rebeschini (AISTATS 2025) very recently gave an exponential-time algorithm with sample complexity $O(n \log n)$. Their algorithm needs a robust spectral initialization, specifically a robust estimate of the top eigenvector of a covariance matrix, which they deemed to be beyond known efficient algorithmic techniques (similar spectral initializations are a key ingredient of a large family of phase retrieval algorithms). In this work, we make a connection between robust spectral initialization and recent algorithmic advances in robust PCA, yielding the first polynomial-time algorithms for robust phase retrieval with both heavy-tailed noise and adversarial corruptions, in fact with near-linear (in $n$) sample complexity.
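As a rough illustration of the measurement model, here is a minimal numpy sketch that generates phaseless measurements with heavy-tailed noise and adversarial corruptions, and runs the standard (non-robust) spectral initialization, i.e., the top eigenvector of $\frac{1}{m}\sum_i y_i a_i a_i^\top$. All parameter choices (dimension, corruption fraction, noise scale) are illustrative assumptions, and the robust PCA-based initialization from the paper is not shown.

```python
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 2000                      # signal dimension, number of measurements
eps = 0.05                           # illustrative fraction of corrupted samples

x_star = rng.normal(size=n)
x_star /= np.linalg.norm(x_star)     # ground-truth signal on the unit sphere

A = rng.normal(size=(m, n))          # sensing vectors a_i as rows
zeta = 0.1 * rng.standard_t(df=3, size=m)   # heavy-tailed noise (Student-t)
y = (A @ x_star) ** 2 + zeta         # y_i = <a_i, x*>^2 + zeta_i

# Adversary overwrites an eps-fraction of both measurements and sensing vectors.
bad = rng.choice(m, size=int(eps * m), replace=False)
y[bad] = 100.0 * rng.random(size=bad.size)
A[bad] = 10.0 * rng.normal(size=(bad.size, n))

# Standard spectral initialization (not the robust one studied in the paper):
# top eigenvector of the weighted covariance matrix (1/m) * sum_i y_i a_i a_i^T.
M = (A.T * y) @ A / m
eigvals, eigvecs = np.linalg.eigh(M)
x0 = eigvecs[:, -1]

print("|<x0, x*>| =", abs(x0 @ x_star))   # correlation degrades as corruptions grow
```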
Abstract: In this article, we present a novel redundancy scheme to realize a fault-tolerant IoT structure for use in high-reliability systems. The proposed fault-tolerant structure uses a centralized data fusion block and triplicated IoT devices, along with software-based "digital twins" that duplicate the function of each sensor. In case of a fault in one of the IoT devices, the pertinent digital twin takes over the function of the actual IoT device within the triplicated structure until the faulty device is repaired, when possible, or replaced. The use of software-based digital twins as duplicates of the physical sensors improves operational reliability with minimal increase in overall system cost.
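A minimal sketch of how the centralized fusion block might combine the triplicated readings, assuming median voting and a simple threshold-based fault test; the function names, tolerance, and twin model are illustrative assumptions, not the paper's implementation.

```python
import statistics
from typing import Callable, List, Optional

def fuse(readings: List[Optional[float]],
         twin: Callable[[], float],
         tol: float = 2.0) -> float:
    """Fuse triplicated sensor readings, substituting the digital twin
    for any reading that is missing or disagrees with the other two."""
    twin_value = twin()
    # Replace missing readings with the digital twin's estimate.
    values = [r if r is not None else twin_value for r in readings]
    med = statistics.median(values)
    # A reading far from the median is treated as a fault and covered by the
    # digital twin until the faulty device is repaired or replaced.
    cleaned = [v if abs(v - med) <= tol else twin_value for v in values]
    return statistics.median(cleaned)

# Example: the second sensor is stuck at an implausible value; the twin (here a
# trivial model returning the last known good value) covers for it.
print(fuse([21.3, 98.6, 21.1], twin=lambda: 21.2))   # prints 21.2
```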
Abstract: The growing adoption of IoT systems in industries such as transportation, banking, healthcare, and smart energy has increased reliance on sensor networks. However, anomalies in sensor readings can undermine system reliability, making real-time anomaly detection essential. While a large body of research addresses anomaly detection in IoT networks, few studies focus on correlated sensor data streams, such as temperature and pressure within a shared space, especially in resource-constrained environments. To address this, we propose a novel hybrid machine learning approach combining Principal Component Analysis (PCA) and Autoencoders. In this method, PCA continuously monitors the sensor data and triggers the Autoencoder only when significant variations are detected. This hybrid approach, validated on real-world and simulated data, shows faster response times and fewer false positives. The F1 score of the hybrid method is comparable to that of the Autoencoder, with a much faster response time driven by the PCA stage.
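A minimal sketch of the gating logic described above, under illustrative assumptions: a lightweight PCA stage scores every window of correlated sensor readings, and only windows whose PCA reconstruction error exceeds a threshold are passed to the heavier Autoencoder. The helper names are ours, and the Autoencoder is a stand-in scoring function rather than a trained network.

```python
import numpy as np

def fit_pca(X: np.ndarray, k: int):
    mu = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mu, full_matrices=False)
    return mu, Vt[:k]                       # mean and top-k principal directions

def pca_error(x: np.ndarray, mu: np.ndarray, Vk: np.ndarray) -> float:
    r = x - mu
    return float(np.linalg.norm(r - Vk.T @ (Vk @ r)))   # residual off the PCA subspace

def autoencoder_score(x: np.ndarray) -> float:
    # Stand-in for a trained Autoencoder's reconstruction error.
    return float(np.linalg.norm(x - np.clip(x, -3.0, 3.0)))

rng = np.random.default_rng(1)
latent = rng.normal(size=(500, 2))
W = rng.normal(size=(2, 4))
train = latent @ W + 0.05 * rng.normal(size=(500, 4))   # correlated sensor readings
mu, Vk = fit_pca(train, k=2)
pca_threshold = np.quantile([pca_error(x, mu, Vk) for x in train], 0.99)

def is_anomaly(x: np.ndarray, ae_threshold: float = 1.0) -> bool:
    if pca_error(x, mu, Vk) <= pca_threshold:
        return False                        # PCA sees nothing unusual: fast path
    return autoencoder_score(x) > ae_threshold   # confirm with the Autoencoder

print(is_anomaly(rng.normal(size=2) @ W))             # normal reading -> False
print(is_anomaly(np.array([9.0, -9.0, 9.0, 0.0])))    # off-pattern reading -> True
```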
Abstract: In contemporary deep learning practice, models are often trained to near-zero loss, i.e., to nearly interpolate the training data. However, the number of parameters in the model is usually far greater than the number of data points $n$, the theoretical minimum needed for interpolation: a phenomenon referred to as overparameterization. In an interesting piece of work contributing to the considerable research devoted to understanding overparameterization, Bubeck and Sellke showed that for a broad class of covariate distributions (specifically, those satisfying a natural notion of concentration of measure), overparameterization is necessary for robust interpolation, i.e., if the interpolating function is required to be Lipschitz. However, their robustness results were proved only in the setting of regression with the square loss. In practice, however, many other kinds of losses are used, e.g., the cross-entropy loss for classification. In this work, we generalize the result of Bubeck and Sellke to Bregman divergence losses, which form a common generalization of the square loss and the cross-entropy loss. Our generalization relies on identifying a bias-variance-type decomposition that lies at the heart of the proof of Bubeck and Sellke.
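For concreteness (standard definition; notation ours), the Bregman divergence generated by a differentiable convex potential $\phi$ is
$$D_\phi(x, y) = \phi(x) - \phi(y) - \langle \nabla \phi(y),\, x - y \rangle,$$
and the two losses named above are recovered as special cases: $\phi(x) = \|x\|_2^2$ gives $D_\phi(x, y) = \|x - y\|_2^2$ (the square loss), while taking $\phi$ to be the negative entropy $\phi(p) = \sum_i p_i \log p_i$ on the probability simplex gives the KL divergence $\sum_i p_i \log(p_i / q_i)$, whose minimization over the model output $q$ is equivalent to minimizing the cross-entropy loss.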