Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Carlos Castillo

Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Dec 27, 2024

Ioannis Bilionis, Ricardo C. Berrios, Luis Fernandez-Luque, Carlos Castillo

Figure 1 for Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Figure 2 for Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Figure 3 for Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Figure 4 for Disparate Model Performance and Stability in Machine Learning Clinical Support for Diabetes and Heart Diseases

Abstract:Machine Learning (ML) algorithms are vital for supporting clinical decision-making in biomedical informatics. However, their predictive performance can vary across demographic groups, often due to the underrepresentation of historically marginalized populations in training datasets. The investigation reveals widespread sex- and age-related inequities in chronic disease datasets and their derived ML models. Thus, a novel analytical framework is introduced, combining systematic arbitrariness with traditional metrics like accuracy and data complexity. The analysis of data from over 25,000 individuals with chronic diseases revealed mild sex-related disparities, favoring predictive accuracy for males, and significant age-related differences, with better accuracy for younger patients. Notably, older patients showed inconsistent predictive accuracy across seven datasets, linked to higher data complexity and lower model performance. This highlights that representativeness in training data alone does not guarantee equitable outcomes, and model arbitrariness must be addressed before deploying models in clinical settings.

* This paper will be presented in American Medical Informatics Association (AMIA) Informatics Summit Conference 2025 (Pittsburgh, PA). 10 pages, 2 figures, 5 tables

Via

Access Paper or Ask Questions

Understanding and Addressing Gender Bias in Expert Finding Task

Jul 07, 2024

Maddalena Amendola, Carlos Castillo, Andrea Passarella, Raffaele Perego

Figure 1 for Understanding and Addressing Gender Bias in Expert Finding Task

Figure 2 for Understanding and Addressing Gender Bias in Expert Finding Task

Figure 3 for Understanding and Addressing Gender Bias in Expert Finding Task

Figure 4 for Understanding and Addressing Gender Bias in Expert Finding Task

Abstract:The Expert Finding (EF) task is critical in community Question&Answer (CQ&A) platforms, significantly enhancing user engagement by improving answer quality and reducing response times. However, biases, especially gender biases, have been identified in these platforms. This study investigates gender bias in state-of-the-art EF models and explores methods to mitigate it. Utilizing a comprehensive dataset from StackOverflow, the largest community in the StackExchange network, we conduct extensive experiments to analyze how EF models' candidate identification processes influence gender representation. Our findings reveal that models relying on reputation metrics and activity levels disproportionately favor male users, who are more active on the platform. This bias results in the underrepresentation of female experts in the ranking process. We propose adjustments to EF models that incorporate a more balanced preprocessing strategy and leverage content-based and social network-based information, with the aim to provide a fairer representation of genders among identified experts. Our analysis shows that integrating these methods can significantly enhance gender balance without compromising model accuracy. To the best of our knowledge, this study is the first to focus on detecting and mitigating gender bias in EF methods.

Via

Access Paper or Ask Questions

Responsible AI Research Needs Impact Statements Too

Nov 20, 2023

Alexandra Olteanu, Michael Ekstrand, Carlos Castillo, Jina Suh

Abstract:All types of research, development, and policy work can have unintended, adverse consequences - work in responsible artificial intelligence (RAI), ethical AI, or ethics in AI is no exception.

Via

Access Paper or Ask Questions

Disparity, Inequality, and Accuracy Tradeoffs in Graph Neural Networks for Node Classification

Aug 18, 2023

Arpit Merchant, Carlos Castillo

Figure 1 for Disparity, Inequality, and Accuracy Tradeoffs in Graph Neural Networks for Node Classification

Figure 2 for Disparity, Inequality, and Accuracy Tradeoffs in Graph Neural Networks for Node Classification

Figure 3 for Disparity, Inequality, and Accuracy Tradeoffs in Graph Neural Networks for Node Classification

Figure 4 for Disparity, Inequality, and Accuracy Tradeoffs in Graph Neural Networks for Node Classification

Abstract:Graph neural networks (GNNs) are increasingly used in critical human applications for predicting node labels in attributed graphs. Their ability to aggregate features from nodes' neighbors for accurate classification also has the capacity to exacerbate existing biases in data or to introduce new ones towards members from protected demographic groups. Thus, it is imperative to quantify how GNNs may be biased and to what extent their harmful effects may be mitigated. To this end, we propose two new GNN-agnostic interventions namely, (i) PFR-AX which decreases the separability between nodes in protected and non-protected groups, and (ii) PostProcess which updates model predictions based on a blackbox policy to minimize differences between error rates across demographic groups. Through a large set of experiments on four datasets, we frame the efficacies of our approaches (and three variants) in terms of their algorithmic fairness-accuracy tradeoff and benchmark our results against three strong baseline interventions on three state-of-the-art GNN models. Our results show that no single intervention offers a universally optimal tradeoff, but PFR-AX and PostProcess provide granular control and improve model confidence when correctly predicting positive outcomes for nodes in protected groups.

* Accepted to CIKM 2023

Via

Access Paper or Ask Questions

Fairness and Diversity in Information Access Systems

May 16, 2023

Lorenzo Porcaro, Carlos Castillo, Emilia Gómez, João Vinagre

Abstract:Among the seven key requirements to achieve trustworthy AI proposed by the High-Level Expert Group on Artificial Intelligence (AI-HLEG) established by the European Commission (EC), the fifth requirement ("Diversity, non-discrimination and fairness") declares: "In order to achieve Trustworthy AI, we must enable inclusion and diversity throughout the entire AI system's life cycle. [...] This requirement is closely linked with the principle of fairness". In this paper, we try to shed light on how closely these two distinct concepts, diversity and fairness, may be treated by focusing on information access systems and ranking literature. These concepts should not be used interchangeably because they do represent two different values, but what we argue is that they also cannot be considered totally unrelated or divergent. Having diversity does not imply fairness, but fostering diversity can effectively lead to fair outcomes, an intuition behind several methods proposed to mitigate the disparate impact of information access systems, i.e. recommender systems and search engines.

* Presented at the European Workshop on Algorithmic Fairness (EWAF'23) Winterthur, Switzerland, June 7-9, 2023

Via

Access Paper or Ask Questions

Assessing the Impact of Music Recommendation Diversity on Listeners: A Longitudinal Study

Dec 01, 2022

Lorenzo Porcaro, Emilia Gómez, Carlos Castillo

Figure 1 for Assessing the Impact of Music Recommendation Diversity on Listeners: A Longitudinal Study

Figure 2 for Assessing the Impact of Music Recommendation Diversity on Listeners: A Longitudinal Study

Figure 3 for Assessing the Impact of Music Recommendation Diversity on Listeners: A Longitudinal Study

Figure 4 for Assessing the Impact of Music Recommendation Diversity on Listeners: A Longitudinal Study

Abstract:We present the results of a 12-week longitudinal user study wherein the participants, 110 subjects from Southern Europe, received on a daily basis Electronic Music (EM) diversified recommendations. By analyzing their explicit and implicit feedback, we show that exposure to specific levels of music recommendation diversity may be responsible for long-term impacts on listeners' attitudes. In particular, we highlight the function of diversity in increasing the openness in listening to EM, a music genre not particularly known or liked by the participants previous to their participation in the study. Moreover, we demonstrate that recommendations may help listeners in removing positive and negative attachments towards EM, deconstructing pre-existing implicit associations but also stereotypes associated with this music. In addition, our results show the significant clout that recommendation diversity has in generating curiosity in listeners.

Via

Access Paper or Ask Questions

Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers

Apr 21, 2022

Fedor Vitiugin, Carlos Castillo

Figure 1 for Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers

Figure 2 for Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers

Figure 3 for Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers

Figure 4 for Cross-Lingual Query-Based Summarization of Crisis-Related Social Media: An Abstractive Approach Using Transformers

Abstract:Relevant and timely information collected from social media during crises can be an invaluable resource for emergency management. However, extracting this information remains a challenging task, particularly when dealing with social media postings in multiple languages. This work proposes a cross-lingual method for retrieving and summarizing crisis-relevant information from social media postings. We describe a uniform way of expressing various information needs through structured queries and a way of creating summaries answering those information needs. The method is based on multilingual transformers embeddings. Queries are written in one of the languages supported by the embeddings, and the extracted sentences can be in any of the other languages supported. Abstractive summaries are created by transformers. The evaluation, done by crowdsourcing evaluators and emergency management experts, and carried out on collections extracted from Twitter during five large-scale disasters spanning ten languages, shows the flexibility of our approach. The generated summaries are regarded as more focused, structured, and coherent than existing state-of-the-art methods, and experts compare them favorably against summaries created by existing, state-of-the-art methods.

Via

Access Paper or Ask Questions

Human Response to an AI-Based Decision Support System: A User Study on the Effects of Accuracy and Bias

Mar 24, 2022

David Solans, Andrea Beretta, Manuel Portela, Carlos Castillo, Anna Monreale

Figure 1 for Human Response to an AI-Based Decision Support System: A User Study on the Effects of Accuracy and Bias

Figure 2 for Human Response to an AI-Based Decision Support System: A User Study on the Effects of Accuracy and Bias

Figure 3 for Human Response to an AI-Based Decision Support System: A User Study on the Effects of Accuracy and Bias

Figure 4 for Human Response to an AI-Based Decision Support System: A User Study on the Effects of Accuracy and Bias

Abstract:Artificial Intelligence (AI) is increasingly used to build Decision Support Systems (DSS) across many domains. This paper describes a series of experiments designed to observe human response to different characteristics of a DSS such as accuracy and bias, particularly the extent to which participants rely on the DSS, and the performance they achieve. In our experiments, participants play a simple online game inspired by so-called "wildcat" (i.e., exploratory) drilling for oil. The landscape has two layers: a visible layer describing the costs (terrain), and a hidden layer describing the reward (oil yield). Participants in the control group play the game without receiving any assistance, while in treatment groups they are assisted by a DSS suggesting places to drill. For certain treatments, the DSS does not consider costs, but only rewards, which introduces a bias that is observable by users. Between subjects, we vary the accuracy and bias of the DSS, and observe the participants' total score, time to completion, the extent to which they follow or ignore suggestions. We also measure the acceptability of the DSS in an exit survey. Our results show that participants tend to score better with the DSS, that the score increase is due to users following the DSS advice, and related to the difficulty of the game and the accuracy of the DSS. We observe that this setting elicits mostly rational behavior from participants, who place a moderate amount of trust in the DSS and show neither algorithmic aversion (under-reliance) nor automation bias (over-reliance).However, their stated willingness to accept the DSS in the exit survey seems less sensitive to the accuracy of the DSS than their behavior, suggesting that users are only partially aware of the (lack of) accuracy of the DSS.

Via

Access Paper or Ask Questions

Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways

Feb 01, 2022

Francesco Fabbri, Yanhao Wang, Francesco Bonchi, Carlos Castillo, Michael Mathioudakis

Figure 1 for Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways

Figure 2 for Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways

Figure 3 for Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways

Figure 4 for Rewiring What-to-Watch-Next Recommendations to Reduce Radicalization Pathways

Abstract:Recommender systems typically suggest to users content similar to what they consumed in the past. If a user happens to be exposed to strongly polarized content, she might subsequently receive recommendations which may steer her towards more and more radicalized content, eventually being trapped in what we call a "radicalization pathway". In this paper, we study the problem of mitigating radicalization pathways using a graph-based approach. Specifically, we model the set of recommendations of a "what-to-watch-next" recommender as a d-regular directed graph where nodes correspond to content items, links to recommendations, and paths to possible user sessions. We measure the "segregation" score of a node representing radicalized content as the expected length of a random walk from that node to any node representing non-radicalized content. High segregation scores are associated to larger chances to get users trapped in radicalization pathways. Hence, we define the problem of reducing the prevalence of radicalization pathways by selecting a small number of edges to "rewire", so to minimize the maximum of segregation scores among all radicalized nodes, while maintaining the relevance of the recommendations. We prove that the problem of finding the optimal set of recommendations to rewire is NP-hard and NP-hard to approximate within any factor. Therefore, we turn our attention to heuristics, and propose an efficient yet effective greedy algorithm based on the absorbing random walk theory. Our experiments on real-world datasets in the context of video and news recommendations confirm the effectiveness of our proposal.

* To appear in the Web conference 2022 (WWW '22)

Via

Access Paper or Ask Questions

Diversity in the Music Listening Experience: Insights from Focus Group Interviews

Jan 25, 2022

Lorenzo Porcaro, Emilia Gómez, Carlos Castillo

Abstract:Music listening in today's digital spaces is highly characterized by the availability of huge music catalogues, accessible by people all over the world. In this scenario, recommender systems are designed to guide listeners in finding tracks and artists that best fit their requests, having therefore the power to influence the diversity of the music they listen to. Albeit several works have proposed new techniques for developing diversity-aware recommendations, little is known about how people perceive diversity while interacting with music recommendations. In this study, we interview several listeners about the role that diversity plays in their listening experience, trying to get a better understanding of how they interact with music recommendations. We recruit the listeners among the participants of a previous quantitative study, where they were confronted with the notion of diversity when asked to identify, from a series of electronic music lists, the most diverse ones according to their beliefs. As a follow-up, in this qualitative study we carry out semi-structured interviews to understand how listeners may assess the diversity of a music list and to investigate their experiences with music recommendation diversity. We report here our main findings on 1) what can influence the diversity assessment of tracks and artists' music lists, and 2) which factors can characterize listeners' interaction with music recommendation diversity.

Via

Access Paper or Ask Questions