Abstract: Machine learning models are vulnerable to biases that result in unfair treatment of individuals from different populations. Recent work that aims to test a model's fairness at the individual level either relies on domain knowledge to choose metrics, or on input transformations that risk generating out-of-domain samples. We describe a new approach for testing individual fairness that has neither requirement. We propose a novel criterion for evaluating individual fairness and develop a practical testing method based on this criterion, which we call fAux (pronounced fox). The method compares the derivatives of the predictions of the model under test with those of an auxiliary model that predicts the protected variable from the observed data. We show that the proposed method effectively identifies discrimination on both synthetic and real-world datasets, and that it has quantitative and qualitative advantages over contemporary methods.
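To make the gradient-comparison idea concrete, here is a minimal sketch (not the authors' fAux implementation): two logistic models stand in for the model under test and the auxiliary model that predicts the protected variable, and an input is flagged when their input gradients are strongly aligned. The cosine-similarity criterion, the threshold, and all names are illustrative assumptions.

```python
# Hypothetical sketch: flag inputs where the target model's local sensitivity
# aligns with an auxiliary model that predicts the protected attribute.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def input_gradient(w, b, x):
    """Gradient of a logistic model's prediction sigmoid(w.x + b) w.r.t. x."""
    p = sigmoid(w @ x + b)
    return p * (1.0 - p) * w

def fairness_flag(w_target, b_target, w_aux, b_aux, x, threshold=0.8):
    """True if the two input gradients are strongly aligned, i.e. the target
    model is locally sensitive in the same direction as the auxiliary model."""
    g_t = input_gradient(w_target, b_target, x)
    g_a = input_gradient(w_aux, b_aux, x)
    cos = g_t @ g_a / (np.linalg.norm(g_t) * np.linalg.norm(g_a) + 1e-12)
    return abs(cos) > threshold

# Toy usage: the target model leans on the same feature as the auxiliary model.
x = np.array([0.5, -1.2, 0.3])
print(fairness_flag(np.array([2.0, 0.1, 0.0]), 0.0,
                    np.array([1.8, 0.0, 0.1]), 0.0, x))  # True -> potential violation
```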
Abstract: We explore training Binary Neural Networks (BNNs) as a discrete variable inference problem over a factor graph. We study the behaviour of this conversion in an under-parameterized BNN setting and propose stochastic versions of the Belief Propagation (BP) and Survey Propagation (SP) message passing algorithms to overcome the intractability of their current formulation. Our results indicate that both stochastic BP and SP find better configurations of the BNN parameters than traditional gradient methods.
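For background, the sketch below runs plain (non-stochastic) sum-product Belief Propagation over binary variables on a toy factor graph; the stochastic BP/SP variants the abstract proposes are not reproduced, and the factors are made up for illustration.

```python
# Minimal sketch of standard sum-product BP over binary (+1/-1) variables.
import itertools
import numpy as np

VALUES = [-1, +1]

# Each factor is (list of variable ids, table mapping an assignment to a weight).
factors = [
    ([0],    lambda a: 1.5 if a[0] == +1 else 0.5),    # unary prior on x0
    ([1],    lambda a: 1.0),                           # flat prior on x1
    ([0, 1], lambda a: 2.0 if a[0] == a[1] else 1.0),  # coupling: prefer agreement
]

# Messages, initialised uniformly.
msgs = {}
for fi, (scope, _) in enumerate(factors):
    for v in scope:
        msgs[('f', fi, v)] = np.ones(2) / 2  # factor -> variable
        msgs[('v', v, fi)] = np.ones(2) / 2  # variable -> factor

for _ in range(20):  # iterate updates until (approximate) convergence
    # Variable -> factor messages: product of incoming messages from other factors.
    for fi, (scope, _) in enumerate(factors):
        for v in scope:
            m = np.ones(2)
            for fj, (scope_j, _) in enumerate(factors):
                if fj != fi and v in scope_j:
                    m *= msgs[('f', fj, v)]
            msgs[('v', v, fi)] = m / m.sum()
    # Factor -> variable messages: marginalise the factor over the other variables.
    for fi, (scope, table) in enumerate(factors):
        for v in scope:
            m = np.zeros(2)
            for assign in itertools.product(VALUES, repeat=len(scope)):
                idx = VALUES.index(assign[scope.index(v)])
                w = table(assign)
                for u, val in zip(scope, assign):
                    if u != v:
                        w *= msgs[('v', u, fi)][VALUES.index(val)]
                m[idx] += w
            msgs[('f', fi, v)] = m / m.sum()

# Belief (marginal) of x0; exact here because the toy graph is a tree.
belief = np.ones(2)
for fi, (scope, _) in enumerate(factors):
    if 0 in scope:
        belief *= msgs[('f', fi, 0)]
print('P(x0 = -1), P(x0 = +1):', belief / belief.sum())
```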
Abstract: Adversarial robustness is one of the essential safety criteria for guaranteeing the reliability of machine learning models. While various adversarial robustness testing approaches have been introduced over the last decade, we note that most of them are incompatible with non-differentiable models such as tree ensembles. Since tree ensembles are widely used in industry, this reveals a crucial gap between adversarial robustness research and practical applications. This paper proposes a novel whitebox adversarial robustness testing approach for tree ensemble models. Concretely, the proposed approach smooths the tree ensembles through temperature-controlled sigmoid functions, which enables gradient descent-based adversarial attacks. By leveraging sampling and the log-derivative trick, the proposed approach can scale up to testing tasks that were previously unmanageable. We compare the approach against both random perturbations and blackbox approaches on multiple public datasets (and corresponding models). Our results show that the proposed method can 1) successfully reveal the adversarial vulnerability of tree ensemble models without imposing excessive computational cost on testing, and 2) flexibly balance search performance and time complexity to meet various testing criteria.
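A minimal sketch of the smoothing idea, assuming a toy ensemble of decision stumps: each hard split 1[x[f] > t] is replaced by a temperature-controlled sigmoid, which makes the score differentiable and allows a gradient-based attack. The sampling and log-derivative components from the abstract are omitted, and all parameters are hypothetical.

```python
# Illustrative sketch, not the paper's implementation.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Tiny ensemble of depth-1 trees (stumps): (feature, threshold, left_value, right_value)
stumps = [(0, 0.5, -1.0, 1.0), (1, -0.2, -0.5, 1.5), (0, 1.0, 0.2, -0.8)]

def soft_score(x, T=0.1):
    """Smoothed ensemble score: hard indicator 1[x[f] > t] -> sigmoid((x[f]-t)/T)."""
    s = 0.0
    for f, t, vl, vr in stumps:
        p_right = sigmoid((x[f] - t) / T)
        s += vl * (1 - p_right) + vr * p_right
    return s

def soft_grad(x, T=0.1):
    """Analytic gradient of the smoothed score w.r.t. the input."""
    g = np.zeros_like(x)
    for f, t, vl, vr in stumps:
        p = sigmoid((x[f] - t) / T)
        g[f] += (vr - vl) * p * (1 - p) / T
    return g

# Gradient-based attack: push the smoothed score across the decision boundary.
x = np.array([0.2, -0.5])
for _ in range(50):
    x = x + 0.05 * np.sign(soft_grad(x))  # FGSM-style step on the smoothed model
print(soft_score(x), x)
```

Lowering the temperature T makes the smoothed score approach the original hard ensemble, at the cost of flatter gradients away from the thresholds; this trade-off is one reason a sampling-based estimator can help in practice.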
Abstract: The problem of efficiently training and evaluating image classifiers that can distinguish between a large number of object categories is considered. A novel metric, sharpness, is proposed, defined as the fraction of object categories whose accuracy exceeds a given threshold. To estimate sharpness (along with a confidence value), a technique called fraction-accurate estimation is introduced, which samples categories and then samples instances from those categories. In addition, a technique called parity partition coding, a special type of error-correcting output code, is introduced; it increases sharpness while reducing the multi-class problem to a multi-label one with exponentially fewer outputs. We demonstrate that this approach outperforms the baseline model on both MultiMNIST and CelebA, while requiring fewer parameters and exceeding state-of-the-art accuracy on individual labels.
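As a rough illustration of the sharpness metric and its sampled estimate (leaving out the confidence value), the following sketch assumes a hypothetical predict() function, dataset layout, and sample sizes.

```python
# Hypothetical sketch: sharpness = fraction of categories whose per-class accuracy
# exceeds a threshold, estimated by sampling categories and then instances.
import random

def estimate_sharpness(predict, data_by_category, threshold=0.9,
                       n_categories=50, n_instances=20, seed=0):
    """data_by_category: dict mapping category label -> list of instances."""
    rng = random.Random(seed)
    cats = rng.sample(list(data_by_category), min(n_categories, len(data_by_category)))
    above = 0
    for cat in cats:
        sample = rng.sample(data_by_category[cat], min(n_instances, len(data_by_category[cat])))
        acc = sum(predict(x) == cat for x in sample) / len(sample)
        above += acc >= threshold
    return above / len(cats)

# Toy usage with a "classifier" that is right about 95% of the time.
toy_data = {c: [c] * 100 for c in range(200)}
noisy_predict = lambda x: x if random.random() < 0.95 else (x + 1) % 200
print(estimate_sharpness(noisy_predict, toy_data))
```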