Abstract: The rise of Large Language Models (LLMs), such as LLaMA and ChatGPT, has opened new opportunities for enhancing recommender systems through improved explainability. This paper provides a systematic literature review focused on leveraging LLMs to generate explanations for recommendations -- a critical aspect for fostering transparency and user trust. We conducted a comprehensive search within the ACM Guide to Computing Literature, covering publications from the launch of ChatGPT (November 2022) to the present (November 2024). Our search yielded 232 articles, but after applying inclusion criteria, only six were identified as directly addressing the use of LLMs in explaining recommendations. This scarcity highlights that, despite the rise of LLMs, their application in explainable recommender systems is still in an early stage. We analyze these selected studies to understand current methodologies, identify challenges, and suggest directions for future research. Our findings underscore the potential of LLMs to improve explanations in recommender systems and encourage the development of more transparent and user-centric recommendation explanation solutions.
Abstract: In the area of recommender systems, the vast majority of research effort is spent on developing increasingly sophisticated recommendation models, while consuming ever more computational resources. Unfortunately, most of these research efforts target a very small set of application domains, mostly e-commerce and media recommendation. Furthermore, many of these models are never evaluated with users, let alone put into practice. The scientific, economic, and societal value of many of these efforts by scholars therefore remains largely unclear. To achieve a stronger positive impact from these efforts, we posit that we as a research community should more often address use cases where recommender systems contribute to societal good (RS4Good). In this opinion piece, we first discuss a number of examples where the use of recommender systems for problems of societal concern has been successfully explored in the literature. We then outline a paradigm shift that is needed to conduct successful RS4Good research, where the key ingredients are interdisciplinary collaborations and longitudinal evaluation approaches with humans in the loop.
Abstract: Fairness in AI-driven decision-making systems has become a critical concern, especially when these systems directly affect human lives. This paper explores the public's comprehension of fairness in healthcare recommendations. We conducted a survey where participants selected from four fairness metrics -- Demographic Parity, Equal Accuracy, Equalized Odds, and Positive Predictive Value -- across different healthcare scenarios to assess their understanding of these concepts. Our findings reveal that fairness is a complex and often misunderstood concept, with a generally low level of public understanding regarding fairness metrics in recommender systems. This study highlights the need for enhanced information and education on algorithmic fairness to support informed decision-making in using these systems. Furthermore, the results suggest that a one-size-fits-all approach to fairness may be insufficient, pointing to the importance of context-sensitive designs in developing equitable AI systems.
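For reference, the four metrics named in the abstract above are commonly formalized as group-parity conditions on a binary predictor Ŷ of an outcome Y across groups defined by a sensitive attribute A. The following is a sketch of the standard textbook definitions; the exact operationalization used in the survey may differ.

```latex
\begin{align*}
% Demographic Parity: equal positive prediction rates across groups
P(\hat{Y}=1 \mid A=a) &= P(\hat{Y}=1 \mid A=b) \\
% Equal Accuracy: equal overall accuracy across groups
P(\hat{Y}=Y \mid A=a) &= P(\hat{Y}=Y \mid A=b) \\
% Equalized Odds: equal true- and false-positive rates across groups (y = 0, 1)
P(\hat{Y}=1 \mid Y=y, A=a) &= P(\hat{Y}=1 \mid Y=y, A=b) \\
% Positive Predictive Value: equal precision across groups
P(Y=1 \mid \hat{Y}=1, A=a) &= P(Y=1 \mid \hat{Y}=1, A=b)
\end{align*}
```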
Abstract: As global warming accelerates, evaluating the environmental impact of research is more critical now than ever before. However, we find that few, if any, recommender systems research papers document their impact on the environment. Consequently, in this paper, we conduct a comprehensive analysis of the environmental impact of recommender systems research by reproducing a characteristic recommender systems experimental pipeline. We focus on estimating the carbon footprint of recommender systems research papers, highlighting how the environmental impact of recommender systems experiments has evolved over time. We thoroughly evaluated all 79 full papers from the ACM RecSys conferences of 2013 and 2023 to analyze representative experimental pipelines for papers utilizing traditional, so-called good old-fashioned AI algorithms and deep learning algorithms, respectively. We reproduced these representative experimental pipelines, measured electricity consumption using a hardware energy meter, and converted the measured energy consumption into CO2 equivalents to estimate the environmental impact. Our results show that a recommender systems research paper utilizing deep learning algorithms emits approximately 42 times more CO2 equivalents than a paper utilizing traditional algorithms. Furthermore, on average, such a paper produces 3,297 kilograms of CO2 equivalents, which is more than one person produces by flying from New York City to Melbourne, or the amount one tree sequesters in 300 years.
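To make the conversion step described above concrete, the following is a minimal Python sketch of how electricity consumption measured with an energy meter can be translated into CO2 equivalents via a grid emission factor. The factor value, function names, and example readings are illustrative assumptions, not the figures or parameters used in the paper.

```python
# Minimal sketch: convert energy-meter readings (kWh) into CO2 equivalents (kg).
# ASSUMPTION: the emission factor below is a hypothetical placeholder; actual
# studies use the factor of the local electricity grid powering the experiments.

GRID_EMISSION_FACTOR_KG_PER_KWH = 0.4  # hypothetical kg CO2e per kWh


def kwh_to_co2e_kg(energy_kwh: float,
                   factor: float = GRID_EMISSION_FACTOR_KG_PER_KWH) -> float:
    """Convert measured electricity consumption (kWh) to kg of CO2 equivalents."""
    return energy_kwh * factor


if __name__ == "__main__":
    # Hypothetical meter readings for one experimental pipeline (kWh per run).
    runs_kwh = [12.5, 13.1, 12.8]
    total_co2e = sum(kwh_to_co2e_kg(kwh) for kwh in runs_kwh)
    print(f"Total emissions: {total_co2e:.2f} kg CO2e for {len(runs_kwh)} runs")
```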
Abstract: The tech industry has been criticised for designing applications that undermine individuals' autonomy. Recommender systems, in particular, have been identified as a suspected culprit that might exercise unwanted control over people's lives. In this article, we try to assess the objectives of recommender system research and offer a nuanced discussion of how these objectives can align with users' goals. This discussion employs a qualitative literature survey connecting the dots between relevant research within the fields of psychology, design ethics, interaction design, and recommender systems. Finally, we focus on the specific use case of YouTube's recommender system and propose design changes that will better align with individuals' autonomy. Based on our analysis, we offer directions for future research that will help secure rights to digital autonomy in the attention economy.
Abstract: Reproducibility is a key requirement for scientific progress. It allows the work of others to be reproduced and, as a consequence, the reported claims and results to be fully trusted. In this work, we argue that, by facilitating the reproducibility of recommender systems experimentation, we indirectly address the issues of accountability and transparency in recommender systems research from the perspectives of practitioners, designers, and engineers aiming to assess the capabilities of published research works. These issues have become increasingly prevalent in recent literature. Reasons for this include societal movements around intelligent systems and artificial intelligence striving towards the fair and objective use of human behavioral data (as in Machine Learning, Information Retrieval, or Human-Computer Interaction). Society has grown to expect explanations and transparency standards regarding the underlying algorithms making automated decisions for and around us. This work surveys existing definitions of these concepts and proposes a coherent terminology for recommender systems research, with the goal of connecting reproducibility to accountability. We achieve this by introducing several guidelines and steps that lead to reproducible and, hence, accountable experimental workflows and research. We additionally analyze several instantiations of recommender system implementations available in the literature and discuss the extent to which they fit within the introduced framework. With this work, we aim to shed light on this important problem and facilitate progress in the field by increasing the accountability of research.