Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Thomas Strohmer

University of California, Davis

FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration

Mar 14, 2025

Xue Feng, M. Paul Laiu, Thomas Strohmer

Figure 1 for FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration

Figure 2 for FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration

Figure 3 for FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration

Figure 4 for FedOSAA: Improving Federated Learning with One-Step Anderson Acceleration

Abstract:Federated learning (FL) is a distributed machine learning approach that enables multiple local clients and a central server to collaboratively train a model while keeping the data on their own devices. First-order methods, particularly those incorporating variance reduction techniques, are the most widely used FL algorithms due to their simple implementation and stable performance. However, these methods tend to be slow and require a large number of communication rounds to reach the global minimizer. We propose FedOSAA, a novel approach that preserves the simplicity of first-order methods while achieving the rapid convergence typically associated with second-order methods. Our approach applies one Anderson acceleration (AA) step following classical local updates based on first-order methods with variance reduction, such as FedSVRG and SCAFFOLD, during local training. This AA step is able to leverage curvature information from the history points and gives a new update that approximates the Newton-GMRES direction, thereby significantly improving the convergence. We establish a local linear convergence rate to the global minimizer of FedOSAA for smooth and strongly convex loss functions. Numerical comparisons show that FedOSAA substantially improves the communication and computation efficiency of the original first-order methods, achieving performance comparable to second-order methods like GIANT.

Via

Access Paper or Ask Questions

Machine Unlearning via Information Theoretic Regularization

Feb 08, 2025

Shizhou Xu, Thomas Strohmer

Abstract:How can we effectively remove or "unlearn" undesirable information, such as specific features or individual data points, from a learning outcome while minimizing utility loss and ensuring rigorous guarantees? We introduce a mathematical framework based on information-theoretic regularization to address both feature and data point unlearning. For feature unlearning, we derive a unified solution that simultaneously optimizes diverse learning objectives, including entropy, conditional entropy, KL-divergence, and the energy of conditional probability. For data point unlearning, we first propose a novel definition that serves as a practical condition for unlearning via retraining, is easy to verify, and aligns with the principles of differential privacy from an inference perspective. Then, we provide provable guarantees for our framework on data point unlearning. By combining flexibility in learning objectives with simplicity in regularization design, our approach is highly adaptable and practical for a wide range of machine learning and AI applications.

* 31 pages, 2 figures

Via

Access Paper or Ask Questions

WHOMP: Optimizing Randomized Controlled Trials via Wasserstein Homogeneity

Sep 27, 2024

Shizhou Xu, Thomas Strohmer

Figure 1 for WHOMP: Optimizing Randomized Controlled Trials via Wasserstein Homogeneity

Figure 2 for WHOMP: Optimizing Randomized Controlled Trials via Wasserstein Homogeneity

Figure 3 for WHOMP: Optimizing Randomized Controlled Trials via Wasserstein Homogeneity

Figure 4 for WHOMP: Optimizing Randomized Controlled Trials via Wasserstein Homogeneity

Abstract:We investigate methods for partitioning datasets into subgroups that maximize diversity within each subgroup while minimizing dissimilarity across subgroups. We introduce a novel partitioning method called the $\textit{Wasserstein Homogeneity Partition}$ (WHOMP), which optimally minimizes type I and type II errors that often result from imbalanced group splitting or partitioning, commonly referred to as accidental bias, in comparative and controlled trials. We conduct an analytical comparison of WHOMP against existing partitioning methods, such as random subsampling, covariate-adaptive randomization, rerandomization, and anti-clustering, demonstrating its advantages. Moreover, we characterize the optimal solutions to the WHOMP problem and reveal an inherent trade-off between the stability of subgroup means and variances among these solutions. Based on our theoretical insights, we design algorithms that not only obtain these optimal solutions but also equip practitioners with tools to select the desired trade-off. Finally, we validate the effectiveness of WHOMP through numerical experiments, highlighting its superiority over traditional methods.

* 46 pages, 3 figures

Via

Access Paper or Ask Questions

Differentially Private Synthetic High-dimensional Tabular Stream

Aug 31, 2024

Girish Kumar, Thomas Strohmer, Roman Vershynin

Figure 1 for Differentially Private Synthetic High-dimensional Tabular Stream

Figure 2 for Differentially Private Synthetic High-dimensional Tabular Stream

Figure 3 for Differentially Private Synthetic High-dimensional Tabular Stream

Figure 4 for Differentially Private Synthetic High-dimensional Tabular Stream

Abstract:While differentially private synthetic data generation has been explored extensively in the literature, how to update this data in the future if the underlying private data changes is much less understood. We propose an algorithmic framework for streaming data that generates multiple synthetic datasets over time, tracking changes in the underlying private data. Our algorithm satisfies differential privacy for the entire input stream (continual differential privacy) and can be used for high-dimensional tabular data. Furthermore, we show the utility of our method via experiments on real-world datasets. The proposed algorithm builds upon a popular select, measure, fit, and iterate paradigm (used by offline synthetic data generation algorithms) and private counters for streams.

Via

Access Paper or Ask Questions

An Algorithm for Streaming Differentially Private Data

Jan 31, 2024

Girish Kumar, Thomas Strohmer, Roman Vershynin

Abstract:Much of the research in differential privacy has focused on offline applications with the assumption that all data is available at once. When these algorithms are applied in practice to streams where data is collected over time, this either violates the privacy guarantees or results in poor utility. We derive an algorithm for differentially private synthetic streaming data generation, especially curated towards spatial datasets. Furthermore, we provide a general framework for online selective counting among a collection of queries which forms a basis for many tasks such as query answering and synthetic data generation. The utility of our algorithm is verified on both real-world and simulated datasets.

Via

Access Paper or Ask Questions

On the (In)Compatibility between Group Fairness and Individual Fairness

Jan 13, 2024

Shizhou Xu, Thomas Strohmer

Abstract:We study the compatibility between the optimal statistical parity solutions and individual fairness. While individual fairness seeks to treat similar individuals similarly, optimal statistical parity aims to provide similar treatment to individuals who share relative similarity within their respective sensitive groups. The two fairness perspectives, while both desirable from a fairness perspective, often come into conflict in applications. Our goal in this work is to analyze the existence of this conflict and its potential solution. In particular, we establish sufficient (sharp) conditions for the compatibility between the optimal (post-processing) statistical parity $L^2$ learning and the ($K$-Lipschitz or $(\epsilon,\delta)$) individual fairness requirements. Furthermore, when there exists a conflict between the two, we first relax the former to the Pareto frontier (or equivalently the optimal trade-off) between $L^2$ error and statistical disparity, and then analyze the compatibility between the frontier and the individual fairness requirements. Our analysis identifies regions along the Pareto frontier that satisfy individual fairness requirements. (Lastly, we provide individual fairness guarantees for the composition of a trained model and the optimal post-processing step so that one can determine the compatibility of the post-processed model.) This provides practitioners with a valuable approach to attain Pareto optimality for statistical parity while adhering to the constraints of individual fairness.

* 32 pages, 3 figures

Via

Access Paper or Ask Questions

Differentially private low-dimensional representation of high-dimensional data

May 26, 2023

Yiyun He, Thomas Strohmer, Roman Vershynin, Yizhe Zhu

Abstract:Differentially private synthetic data provide a powerful mechanism to enable data analysis while protecting sensitive information about individuals. However, when the data lie in a high-dimensional space, the accuracy of the synthetic data suffers from the curse of dimensionality. In this paper, we propose a differentially private algorithm to generate low-dimensional synthetic data efficiently from a high-dimensional dataset with a utility guarantee with respect to the Wasserstein distance. A key step of our algorithm is a private principal component analysis (PCA) procedure with a near-optimal accuracy bound that circumvents the curse of dimensionality. Different from the standard perturbation analysis using the Davis-Kahan theorem, our analysis of private PCA works without assuming the spectral gap for the sample covariance matrix.

* 21 pages

Via

Access Paper or Ask Questions

Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold

May 24, 2022

Junda Sheng, Thomas Strohmer

Figure 1 for Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold

Figure 2 for Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold

Figure 3 for Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold

Figure 4 for Semi-Supervised Clustering of Sparse Graphs: Crossing the Information-Theoretic Threshold

Abstract:The stochastic block model is a canonical random graph model for clustering and community detection on network-structured data. Decades of extensive study on the problem have established many profound results, among which the phase transition at the Kesten-Stigum threshold is particularly interesting both from a mathematical and an applied standpoint. It states that no estimator based on the network topology can perform substantially better than chance on sparse graphs if the model parameter is below certain threshold. Nevertheless, if we slightly extend the horizon to the ubiquitous semi-supervised setting, such a fundamental limitation will disappear completely. We prove that with arbitrary fraction of the labels revealed, the detection problem is feasible throughout the parameter domain. Moreover, we introduce two efficient algorithms, one combinatorial and one based on optimization, to integrate label information with graph structures. Our work brings a new perspective to stochastic model of networks and semidefinite program research.

* 40 pages, 8 figures

Via

Access Paper or Ask Questions

Fair Data Representation for Machine Learning at the Pareto Frontier

Jan 02, 2022

Shizhou Xu, Thomas Strohmer

Figure 1 for Fair Data Representation for Machine Learning at the Pareto Frontier

Figure 2 for Fair Data Representation for Machine Learning at the Pareto Frontier

Figure 3 for Fair Data Representation for Machine Learning at the Pareto Frontier

Figure 4 for Fair Data Representation for Machine Learning at the Pareto Frontier

Abstract:As machine learning powered decision making is playing an increasingly important role in our daily lives, it is imperative to strive for fairness of the underlying data processing and algorithms. We propose a pre-processing algorithm for fair data representation via which L2- objective supervised learning algorithms result in an estimation of the Pareto frontier between prediction error and statistical disparity. In particular, the present work applies the optimal positive definite affine transport maps to approach the post-processing Wasserstein barycenter characterization of the optimal fair L2-objective supervised learning via a pre-processing data deformation. We call the resulting data Wasserstein pseudo-barycenter. Furthermore, we show that the Wasserstein geodesics from the learning outcome marginals to the barycenter characterizes the Pareto frontier between L2-loss and total Wasserstein distance among learning outcome marginals. Thereby, an application of McCann interpolation generalizes the pseudo-barycenter to a family of data representations via which L2-objective supervised learning algorithms result in the Pareto frontier. Numerical simulations underscore the advantages of the proposed data representation: (1) the pre-processing step is compositive with arbitrary L2-objective supervised learning methods and unseen data; (2) the fair representation protects data privacy by preventing the training machine from direct or indirect access to the sensitive information of the data; (3) the optimal affine map results in efficient computation of fair supervised learning on high-dimensional data; (4) experimental results shed light on the fairness of L2-objective unsupervised learning via the proposed fair data representation.

* 40 pages, 5 figures

Via

Access Paper or Ask Questions

A Performance Guarantee for Spectral Clustering

Jul 10, 2020

March Boedihardjo, Shaofeng Deng, Thomas Strohmer

Figure 1 for A Performance Guarantee for Spectral Clustering

Figure 2 for A Performance Guarantee for Spectral Clustering

Abstract:The two-step spectral clustering method, which consists of the Laplacian eigenmap and a rounding step, is a widely used method for graph partitioning. It can be seen as a natural relaxation to the NP-hard minimum ratio cut problem. In this paper we study the central question: when is spectral clustering able to find the global solution to the minimum ratio cut problem? First we provide a condition that naturally depends on the intra- and inter-cluster connectivities of a given partition under which we may certify that this partition is the solution to the minimum ratio cut problem. Then we develop a deterministic two-to-infinity norm perturbation bound for the the invariant subspace of the graph Laplacian that corresponds to the $k$ smallest eigenvalues. Finally by combining these two results we give a condition under which spectral clustering is guaranteed to output the global solution to the minimum ratio cut problem, which serves as a performance guarantee for spectral clustering.

Via

Access Paper or Ask Questions