Abstract:AI fairness seeks to improve the transparency and explainability of AI systems by ensuring that their outcomes genuinely reflect the best interests of users. Data augmentation, which involves generating synthetic data from existing datasets, has gained significant attention as a solution to data scarcity. In particular, diffusion models have become a powerful technique for generating synthetic data, especially in fields like computer vision. This paper explores the potential of diffusion models to generate synthetic tabular data to improve AI fairness. The Tabular Denoising Diffusion Probabilistic Model (Tab-DDPM), a diffusion model adaptable to any tabular dataset and capable of handling various feature types, was utilized with different amounts of generated data for data augmentation. Additionally, reweighting samples from AIF360 was employed to further enhance AI fairness. Five traditional machine learning models-Decision Tree (DT), Gaussian Naive Bayes (GNB), K-Nearest Neighbors (KNN), Logistic Regression (LR), and Random Forest (RF)-were used to validate the proposed approach. Experimental results demonstrate that the synthetic data generated by Tab-DDPM improves fairness in binary classification.
Abstract:Fairness AI aims to detect and alleviate bias across the entire AI development life cycle, encompassing data curation, modeling, evaluation, and deployment-a pivotal aspect of ethical AI implementation. Addressing data bias, particularly concerning sensitive attributes like gender and race, reweighting samples proves efficient for fairness AI. This paper contributes a systematic examination of reweighting samples for traditional machine learning (ML) models, employing five models for binary classification on the Adult Income and COMPUS datasets with various protected attributes. The study evaluates prediction results using five fairness metrics, uncovering the nuanced and model-specific nature of reweighting sample effectiveness in achieving fairness in traditional ML models, as well as revealing the complexity of bias dynamics.