Abstract: We propose to learn invariant representations, in the data domain, to achieve interpretability in algorithmic fairness. Invariance implies a selectivity for high-level, relevant correlations w.r.t. the class label annotations, and a robustness to irrelevant correlations with protected characteristics such as race or gender. We introduce a non-trivial setup in which the training set exhibits a strong bias, such that the class label annotations alone do not suffice to distinguish relevant from spurious correlations. To address this problem, we introduce an adversarially trained model with a null-sampling procedure that produces invariant representations in the data domain. To enable disentanglement, a partially labelled representative set is used. Because the representations lie in the data domain, the changes made by the model are easily examinable by human auditors. We show the effectiveness of our method on both image and tabular datasets: Coloured MNIST, CelebA, and the Adult dataset.
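As a rough illustration of the idea described above (not the authors' implementation; the architecture, dimensions, loss weighting, and training scheme below are assumptions made only for this sketch), an encoder can split its latent code into a protected part and a remainder, "null-sample" by zeroing the protected part before decoding back into the data domain, and be trained against an adversary that tries to recover the protected attribute from the decoded, data-domain output:

# Minimal sketch (illustrative, not the paper's released code): an autoencoder
# whose latent code is split into a protected part z_s and a remainder z_y.
# "Null-sampling" is approximated here by zeroing z_s before decoding, so the
# decoded output lives in the data domain but should carry no protected
# information; an adversary tries to recover the protected attribute s from it.
import torch
import torch.nn as nn

class SplitAutoencoder(nn.Module):
    def __init__(self, x_dim: int, zs_dim: int, zy_dim: int, hidden: int = 64):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(x_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, zs_dim + zy_dim))
        self.decoder = nn.Sequential(nn.Linear(zs_dim + zy_dim, hidden), nn.ReLU(),
                                     nn.Linear(hidden, x_dim))
        self.zs_dim = zs_dim

    def forward(self, x):
        z = self.encoder(x)
        z_s, z_y = z[:, :self.zs_dim], z[:, self.zs_dim:]
        recon = self.decoder(z)                       # full reconstruction
        z_null = torch.cat([torch.zeros_like(z_s), z_y], dim=1)
        x_invariant = self.decoder(z_null)            # data-domain, s-invariant output
        return recon, x_invariant

# Adversary predicting the (binary) protected attribute from the null-sampled output.
adversary = nn.Sequential(nn.Linear(10, 32), nn.ReLU(), nn.Linear(32, 2))

model = SplitAutoencoder(x_dim=10, zs_dim=2, zy_dim=8)
x = torch.randn(16, 10)                 # toy batch of tabular inputs
s = torch.randint(0, 2, (16,))          # protected attribute labels

recon, x_inv = model(x)
recon_loss = nn.functional.mse_loss(recon, x)
adv_loss = nn.functional.cross_entropy(adversary(x_inv), s)
# The encoder/decoder minimise reconstruction while maximising the adversary's
# loss (via gradient reversal or alternating updates in practice); the adversary
# separately minimises adv_loss.
enc_dec_loss = recon_loss - adv_loss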
Abstract: We observe a rapid increase in machine learning models for learning data representations that remove the semantics of protected characteristics and are therefore able to mitigate unfair prediction outcomes. This proliferation is welcome. All available models, however, learn latent embeddings, so the produced representations do not retain the semantic meaning of the input. Our aim here is to learn fair representations that are directly interpretable in the original input domain. We cast this problem as data-to-data translation: learning a mapping from data in a source domain to a target domain such that data in the target domain enforces fairness definitions, such as statistical parity or equality of opportunity. The unavailability of fair data in the target domain is the crux of the problem. This paper provides the first approach to learning a highly unconstrained mapping from source to target by maximizing the (conditional) dependence between the residuals (the difference between the data and its translated version) and the protected characteristics. The use of residual statistics ensures that the generated fair data are only an adjustment of the input data, and that this adjustment reveals the main differences between protected-characteristic groups. When applied to the CelebA face image dataset with gender as the protected characteristic, our model enforces equality of opportunity by adjusting the eye and lip regions. On the Adult income dataset, also with gender as the protected characteristic, our model achieves equality of opportunity by, among other changes, obfuscating the wife and husband relationship attributes. Visualizing these systematic changes allows us to scrutinize the interplay between the fairness criterion, the chosen protected characteristic, and prediction performance.
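To make the residual idea concrete, here is a minimal sketch of one way such an objective could be instantiated. The abstract does not fix the dependence measure or the model, so HSIC with Gaussian kernels, the network shapes, and the penalty weight below are all assumptions of this illustration rather than the paper's method:

# Minimal sketch (an assumed instantiation, not the paper's released code):
# translate x to x - r(x) and *maximise* a kernel dependence measure (HSIC with
# Gaussian kernels) between the residual r(x) and the protected characteristic s,
# so protected information is pushed into the residual and the translated data
# can be used to satisfy the chosen fairness criterion.
import torch
import torch.nn as nn

def gaussian_kernel(a: torch.Tensor, sigma: float = 1.0) -> torch.Tensor:
    d = torch.cdist(a, a) ** 2
    return torch.exp(-d / (2 * sigma ** 2))

def hsic(x: torch.Tensor, y: torch.Tensor) -> torch.Tensor:
    """Biased HSIC estimate: trace(K H L H) / (n - 1)^2."""
    n = x.shape[0]
    h = torch.eye(n) - torch.ones(n, n) / n
    k, l = gaussian_kernel(x), gaussian_kernel(y)
    return torch.trace(k @ h @ l @ h) / (n - 1) ** 2

x_dim = 10
residual_net = nn.Sequential(nn.Linear(x_dim, 32), nn.ReLU(), nn.Linear(32, x_dim))

x = torch.randn(64, x_dim)                      # toy batch of tabular inputs
s = torch.randint(0, 2, (64, 1)).float()        # protected characteristic

r = residual_net(x)          # residual: difference between data and its translation
x_fair = x - r               # translated ("fair") data in the input domain
# Encourage the residual to absorb protected information while staying small;
# a downstream task/fairness loss on x_fair would be added here in a full model.
loss = -hsic(r, s) + 0.1 * r.pow(2).mean()
loss.backward()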