Jakub Marecek

Sample Complexity of Bias Detection with Subsampled Point-to-Subspace Distances

Feb 04, 2025

German Martinez Matilla, Jakub Marecek

Abstract: Sample complexity of bias estimation is a lower bound on the runtime of any bias detection method. Many regulatory frameworks require the bias to be tested for all subgroups, whose number grows exponentially with the number of protected attributes. Unless one wishes to run bias detection with a doubly exponential runtime, one would like to have polynomial complexity of bias detection for a single subgroup. At the same time, the reference data may be based on surveys, and thus come with non-trivial uncertainty. Here, we reformulate bias detection as a point-to-subspace problem on the space of measures and show that, for the supremum norm, it can be subsampled efficiently. In particular, our probably approximately correct (PAC) results are corroborated by tests on well-known instances.

Empirical Bayes for Dynamic Bayesian Networks Using Generalized Variational Inference

Jun 25, 2024

Vyacheslav Kungurtsev, Apaar Garg, Aarya Khandelwal, Parth Sandeep Ratogi, Bapi Chatterjee, Jakub Marecek

Abstract: In this work, we demonstrate the Empirical Bayes approach to learning a Dynamic Bayesian Network. By starting with several point estimates of structure and weights, we can use a data-driven prior to subsequently obtain a model to quantify uncertainty. This approach uses a recent development of Generalized Variational Inference, and indicates the potential of sampling the uncertainty of a mixture of DAG structures as well as a parameter posterior.

Fairness in Ranking: Robustness through Randomization without the Protected Attribute

Mar 28, 2024

Andrii Kliachkin, Eleni Psaroudaki, Jakub Marecek, Dimitris Fotakis

Abstract: There has been great interest in fairness in machine learning, especially in relation to classification problems. In ranking-related problems, such as in online advertising, recommender systems, and HR automation, much work on fairness remains to be done. Two complications arise: first, the protected attribute may not be available in many applications. Second, there are multiple measures of fairness of rankings, and optimization-based methods utilizing a single measure of fairness of rankings may produce rankings that are unfair with respect to other measures. In this work, we propose a randomized method for post-processing rankings, which does not require the availability of the protected attribute. In an extensive numerical study, we show the robustness of our methods with respect to P-Fairness and effectiveness with respect to Normalized Discounted Cumulative Gain (NDCG) from the baseline ranking, improving on previously proposed methods.
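
The NDCG measure named in the abstract can be illustrated with a minimal sketch. It assumes the common textbook definition with linear gain and a log2 position discount; the function names `dcg` and `ndcg` are hypothetical, and the paper may use a different variant (for example, exponential gain or a cutoff at rank k).

```python
import math


def dcg(relevances):
    """Discounted cumulative gain: each relevance is divided by log2(position + 1),
    where positions start at 1 (so enumerate's 0-based rank needs +2)."""
    return sum(rel / math.log2(rank + 2) for rank, rel in enumerate(relevances))


def ndcg(relevances):
    """Normalize DCG by the DCG of the ideal (descending-relevance) ordering."""
    ideal = dcg(sorted(relevances, reverse=True))
    # Convention assumed here: a ranking with no relevant items scores 0.
    return dcg(relevances) / ideal if ideal > 0 else 0.0


# Relevance scores listed in the order the items were ranked.
baseline = [3, 2, 1, 0]
perturbed = [2, 3, 0, 1]  # e.g. after a randomized post-processing step
print(ndcg(baseline), ndcg(perturbed))
```

Under this definition, a post-processed ranking that reorders items away from the descending-relevance order scores below 1, which is one way to quantify how much effectiveness a fairness-oriented perturbation costs.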

Learning quantum Hamiltonians at any temperature in polynomial time with Chebyshev and bit complexity

Feb 08, 2024

Ales Wodecki, Jakub Marecek

Abstract: We consider the problem of learning local quantum Hamiltonians given copies of their Gibbs state at a known inverse temperature, following Haah et al. [arXiv:2108.04842] and Bakshi et al. [arXiv:2310.02243]. Our main technical contribution is a new flat polynomial approximation of the exponential function based on the Chebyshev expansion, which enables the formulation of learning quantum Hamiltonians as a polynomial optimization problem. This, in turn, can benefit from the use of moment/SOS relaxations, whose polynomial bit complexity requires careful analysis [O'Donnell, ITCS 2017]. Finally, we show that learning a $k$-local Hamiltonian, whose dual interaction graph is of bounded degree, can be done in polynomial time under mild assumptions.
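
As background for the Chebyshev expansion mentioned in the abstract, the following is a minimal sketch of ordinary Chebyshev interpolation of the scalar exponential on [-1, 1]. It is not the paper's "flat" approximation, whose construction is not given here; the function names `chebyshev_coefficients` and `chebyshev_eval` and the degree are assumptions chosen for illustration.

```python
import math


def chebyshev_coefficients(f, degree):
    """Coefficients c_k of the interpolant sum_k c_k T_k(x) through f
    at the degree+1 Chebyshev nodes on [-1, 1]."""
    n = degree + 1
    # Chebyshev nodes x_j = cos(pi (j + 1/2) / n); note T_k(x_j) = cos(k pi (j + 1/2) / n).
    values = [f(math.cos(math.pi * (j + 0.5) / n)) for j in range(n)]
    coeffs = []
    for k in range(n):
        s = sum(values[j] * math.cos(math.pi * k * (j + 0.5) / n) for j in range(n))
        coeffs.append((1.0 if k == 0 else 2.0) * s / n)
    return coeffs


def chebyshev_eval(coeffs, x):
    """Evaluate sum_k c_k T_k(x) with the Clenshaw recurrence."""
    b1 = b2 = 0.0
    for c in reversed(coeffs[1:]):
        b1, b2 = 2.0 * x * b1 - b2 + c, b1
    return x * b1 - b2 + coeffs[0]


coeffs = chebyshev_coefficients(math.exp, 10)
grid = [-1.0 + 2.0 * i / 200 for i in range(201)]
max_error = max(abs(chebyshev_eval(coeffs, x) - math.exp(x)) for x in grid)
print(max_error)
```

The rapid decay of the coefficients is what makes a low-degree polynomial a good stand-in for the exponential, which in turn is the kind of property that lets an exponential constraint be replaced by a polynomial one.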

* 16 pages

Generating Likely Counterfactuals Using Sum-Product Networks

Jan 25, 2024

Jiri Nemecek, Tomas Pevny, Jakub Marecek

Abstract: Due to user demand and recent regulation (GDPR, AI Act), decisions made by AI systems need to be explained. These decisions are often explainable only post hoc, where counterfactual explanations are popular. The question of what constitutes the best counterfactual explanation must consider multiple aspects, where "distance from the sample" is the most common. We argue that this requirement frequently leads to explanations that are unlikely and, therefore, of limited value. Here, we present a system that provides high-likelihood explanations. We show that the search for the most likely explanations satisfying many common desiderata for counterfactual explanations can be modeled using mixed-integer optimization (MIO). In the process, we propose an MIO formulation of a Sum-Product Network (SPN) and use the SPN to estimate the likelihood of a counterfactual, which can be of independent interest. A numerical comparison against several methods for generating counterfactual explanations is provided.

Joint Problems in Learning Multiple Dynamical Systems

Nov 03, 2023

Mengjia Niu, Xiaoyu He, Petr Rysavy, Quan Zhou, Jakub Marecek

Abstract: Clustering of time series is a well-studied problem, with applications ranging from quantitative, personalized models of metabolism obtained from metabolite concentrations to state discrimination in quantum information theory. We consider a variant where, given a set of trajectories and a number of parts, we jointly partition the set of trajectories and learn a linear dynamical system (LDS) model for each part, so as to minimize the maximum error across all the models. We present globally convergent methods and EM heuristics, accompanied by promising computational results.

Group-blind optimal transport to group parity and its constrained variants

Oct 17, 2023

Quan Zhou, Jakub Marecek

Abstract: Fairness holds a pivotal role in the realm of machine learning, particularly when it comes to addressing groups categorised by sensitive attributes, e.g., gender, race. Prevailing algorithms in fair learning predominantly hinge on access to, or estimates of, these sensitive attributes, at least in the training process. We design a single group-blind projection map that aligns the feature distributions of both groups in the source data, achieving (demographic) group parity, without requiring values of the protected attribute for individual samples either in the computation of the map or in its use. Instead, our approach utilises the feature distributions of the privileged and unprivileged groups in a broader population and the essential assumption that the source data are an unbiased representation of the population. We present numerical results on synthetic data and real data.

Taming Binarized Neural Networks and Mixed-Integer Programs

Oct 05, 2023

Johannes Aspman, Georgios Korpas, Jakub Marecek

Abstract: There has been a great deal of recent interest in binarized neural networks, especially because of their explainability. At the same time, automatic differentiation algorithms such as backpropagation fail for binarized neural networks, which limits their applicability. By reformulating the problem of training binarized neural networks as a subadditive dual of a mixed-integer program, we show that binarized neural networks admit a tame representation. This, in turn, makes it possible to use the framework of Bolte et al. for implicit differentiation, which offers the possibility for practical implementation of backpropagation in the context of binarized neural networks. This approach could also be used for a broader class of mixed-integer programs, beyond the training of binarized neural networks, as encountered in symbolic approaches to AI and beyond.

* 19 pages, 4 figures

Improving the Validity of Decision Trees as Explanations

Jun 13, 2023

Jiri Nemecek, Tomas Pevny, Jakub Marecek

Abstract: In classification and forecasting with tabular data, one often utilizes tree-based models. These can be competitive with deep neural networks on tabular data [cf. Grinsztajn et al., NeurIPS 2022, arXiv:2207.08815] and, under some conditions, explainable. The explainability depends on the depth of the tree and the accuracy in each leaf of the tree. Here, we train a low-depth tree with the objective of minimising the maximum misclassification error across all leaf nodes, and then "suspend" further tree-based models (e.g., trees of unlimited depth) from each leaf of the low-depth tree. The low-depth tree is easily explainable, while the overall statistical performance of the combined low-depth and suspended tree-based models improves upon decision trees of unlimited depth trained using classical methods (e.g., CART) and is comparable to state-of-the-art methods (e.g., well-tuned XGBoost).

Fairness in Forecasting of Observations of Linear Dynamical Systems

Sep 16, 2022

Quan Zhou, Jakub Marecek, Robert N. Shorten

Abstract: In machine learning, training data often capture the behaviour of multiple subgroups of some underlying human population. When the nature of training data for subgroups is not controlled carefully, under-representation bias arises. To counter this effect, we introduce two natural notions, subgroup fairness and instantaneous fairness, to address such under-representation bias in time-series forecasting problems. Here we show globally convergent methods for the fairness-constrained learning problems using hierarchies of convexifications of non-commutative polynomial optimisation problems. Our empirical results on a biased data set motivated by insurance applications and the well-known COMPAS data set demonstrate the efficacy of our methods. We also show that by exploiting sparsity in the convexifications, we can reduce the run time of our methods considerably.

* Journal version of Zhou et al. [arXiv:2006.07315, AAAI 2021]
