Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Gemma Catolino

Data Preparation for Fairness-Performance Trade-Offs: A Practitioner-Friendly Alternative?

Dec 20, 2024

Gianmario Voria, Rebecca Di Matteo, Giammaria Giordano, Gemma Catolino, Fabio Palomba

Abstract:As machine learning (ML) systems are increasingly adopted across industries, addressing fairness and bias has become essential. While many solutions focus on ethical challenges in ML, recent studies highlight that data itself is a major source of bias. Pre-processing techniques, which mitigate bias before training, are effective but may impact model performance and pose integration difficulties. In contrast, fairness-aware Data Preparation practices are both familiar to practitioners and easier to implement, providing a more accessible approach to reducing bias. Objective. This registered report proposes an empirical evaluation of how optimally selected fairness-aware practices, applied in early ML lifecycle stages, can enhance both fairness and performance, potentially outperforming standard pre-processing bias mitigation methods. Method. To this end, we will introduce FATE, an optimization technique for selecting 'Data Preparation' pipelines that optimize fairness and performance. Using FATE, we will analyze the fairness-performance trade-off, comparing pipelines selected by FATE with results by pre-processing bias mitigation techniques.

* Accepted as Registered Report at SANER'25

Via

Access Paper or Ask Questions

From Expectation to Habit: Why Do Software Practitioners Adopt Fairness Toolkits?

Dec 19, 2024

Gianmario Voria, Stefano Lambiase, Maria Concetta Schiavone, Gemma Catolino, Fabio Palomba

Figure 1 for From Expectation to Habit: Why Do Software Practitioners Adopt Fairness Toolkits?

Figure 2 for From Expectation to Habit: Why Do Software Practitioners Adopt Fairness Toolkits?

Figure 3 for From Expectation to Habit: Why Do Software Practitioners Adopt Fairness Toolkits?

Figure 4 for From Expectation to Habit: Why Do Software Practitioners Adopt Fairness Toolkits?

Abstract:As the adoption of machine learning (ML) systems continues to grow across industries, concerns about fairness and bias in these systems have taken center stage. Fairness toolkits, designed to mitigate bias in ML models, serve as critical tools for addressing these ethical concerns. However, their adoption in the context of software development remains underexplored, especially regarding the cognitive and behavioral factors driving their usage. As a deeper understanding of these factors could be pivotal in refining tool designs and promoting broader adoption, this study investigates the factors influencing the adoption of fairness toolkits from an individual perspective. Guided by the Unified Theory of Acceptance and Use of Technology (UTAUT2), we examined the factors shaping the intention to adopt and actual use of fairness toolkits. Specifically, we employed Partial Least Squares Structural Equation Modeling (PLS-SEM) to analyze data from a survey study involving practitioners in the software industry. Our findings reveal that performance expectancy and habit are the primary drivers of fairness toolkit adoption. These insights suggest that by emphasizing the effectiveness of these tools in mitigating bias and fostering habitual use, organizations can encourage wider adoption. Practical recommendations include improving toolkit usability, integrating bias mitigation processes into routine development workflows, and providing ongoing support to ensure professionals see clear benefits from regular use.

Via

Access Paper or Ask Questions

A Catalog of Fairness-Aware Practices in Machine Learning Engineering

Aug 29, 2024

Gianmario Voria, Giulia Sellitto, Carmine Ferrara, Francesco Abate, Andrea De Lucia, Filomena Ferrucci, Gemma Catolino, Fabio Palomba

Figure 1 for A Catalog of Fairness-Aware Practices in Machine Learning Engineering

Figure 2 for A Catalog of Fairness-Aware Practices in Machine Learning Engineering

Figure 3 for A Catalog of Fairness-Aware Practices in Machine Learning Engineering

Figure 4 for A Catalog of Fairness-Aware Practices in Machine Learning Engineering

Abstract:Machine learning's widespread adoption in decision-making processes raises concerns about fairness, particularly regarding the treatment of sensitive features and potential discrimination against minorities. The software engineering community has responded by developing fairness-oriented metrics, empirical studies, and approaches. However, there remains a gap in understanding and categorizing practices for engineering fairness throughout the machine learning lifecycle. This paper presents a novel catalog of practices for addressing fairness in machine learning derived from a systematic mapping study. The study identifies and categorizes 28 practices from existing literature, mapping them onto different stages of the machine learning lifecycle. From this catalog, the authors extract actionable items and implications for both researchers and practitioners in software engineering. This work aims to provide a comprehensive resource for integrating fairness considerations into the development and deployment of machine learning systems, enhancing their reliability, accountability, and credibility.

Via

Access Paper or Ask Questions

Illicit Darkweb Classification via Natural-language Processing: Classifying Illicit Content of Webpages based on Textual Information

Dec 08, 2023

Giuseppe Cascavilla, Gemma Catolino, Mirella Sangiovanni

Abstract:This work aims at expanding previous works done in the context of illegal activities classification, performing three different steps. First, we created a heterogeneous dataset of 113995 onion sites and dark marketplaces. Then, we compared pre-trained transferable models, i.e., ULMFit (Universal Language Model Fine-tuning), Bert (Bidirectional Encoder Representations from Transformers), and RoBERTa (Robustly optimized BERT approach) with a traditional text classification approach like LSTM (Long short-term memory) neural networks. Finally, we developed two illegal activities classification approaches, one for illicit content on the Dark Web and one for identifying the specific types of drugs. Results show that Bert obtained the best approach, classifying the dark web's general content and the types of Drugs with 96.08% and 91.98% of accuracy.

* In Proceedings of the 19th International Conference on Security and Cryptography - Volume 1: SECRYPT, 2022, ISBN 978-989-758-590-6, pages 620-626

Via

Access Paper or Ask Questions