Abstract:With growing applications of Machine Learning (ML) techniques in the real world, it is highly important to ensure that these models work in an equitable manner. One main step in ensuring fairness is to effectively measure fairness, and to this end, various metrics have been proposed in the past literature. While the computation of those metrics are straightforward in the classification set-up, it is computationally intractable in the regression domain. To address the challenge of computational intractability, past literature proposed various methods to approximate such metrics. However, they did not verify the extent to which the output of such approximation algorithms are consistent with each other. To fill this gap, this paper comprehensively studies the consistency of the output of various fairness measurement methods through conducting an extensive set of experiments on various regression tasks. As a result, it finds that while some fairness measurement approaches show strong consistency across various regression tasks, certain methods show a relatively poor consistency in certain regression tasks. This, in turn, calls for a more principled approach for measuring fairness in the regression domain.