Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Mathias Kraus

Navigating the Rashomon Effect: How Personalization Can Help Adjust Interpretable Machine Learning Models to Individual Users

May 11, 2025

Julian Rosenberger, Philipp Schröppel, Sven Kruschel, Mathias Kraus, Patrick Zschech, Maximilian Förster

Abstract:The Rashomon effect describes the observation that in machine learning (ML) multiple models often achieve similar predictive performance while explaining the underlying relationships in different ways. This observation holds even for intrinsically interpretable models, such as Generalized Additive Models (GAMs), which offer users valuable insights into the model's behavior. Given the existence of multiple GAM configurations with similar predictive performance, a natural question is whether we can personalize these configurations based on users' needs for interpretability. In our study, we developed an approach to personalize models based on contextual bandits. In an online experiment with 108 users in a personalized treatment and a non-personalized control group, we found that personalization led to individualized rather than one-size-fits-all configurations. Despite these individual adjustments, the interpretability remained high across both groups, with users reporting a strong understanding of the models. Our research offers initial insights into the potential for personalizing interpretable ML.

* Accepted as a Completed Research Paper at the Thirty-Third European Conference on Information Systems (ECIS 2025), Amman, Jordan

Via

Access Paper or Ask Questions

Beware of "Explanations" of AI

Apr 09, 2025

David Martens, Galit Shmueli, Theodoros Evgeniou, Kevin Bauer, Christian Janiesch, Stefan Feuerriegel, Sebastian Gabel, Sofie Goethals, Travis Greene, Nadja Klein(+7 more)

Abstract:Understanding the decisions made and actions taken by increasingly complex AI system remains a key challenge. This has led to an expanding field of research in explainable artificial intelligence (XAI), highlighting the potential of explanations to enhance trust, support adoption, and meet regulatory standards. However, the question of what constitutes a "good" explanation is dependent on the goals, stakeholders, and context. At a high level, psychological insights such as the concept of mental model alignment can offer guidance, but success in practice is challenging due to social and technical factors. As a result of this ill-defined nature of the problem, explanations can be of poor quality (e.g. unfaithful, irrelevant, or incoherent), potentially leading to substantial risks. Instead of fostering trust and safety, poorly designed explanations can actually cause harm, including wrong decisions, privacy violations, manipulation, and even reduced AI adoption. Therefore, we caution stakeholders to beware of explanations of AI: while they can be vital, they are not automatically a remedy for transparency or responsible AI adoption, and their misuse or limitations can exacerbate harm. Attention to these caveats can help guide future research to improve the quality and impact of AI explanations.

* This work was inspired by Dagstuhl Seminar 24342

Via

Access Paper or Ask Questions

CareerBERT: Matching Resumes to ESCO Jobs in a Shared Embedding Space for Generic Job Recommendations

Mar 03, 2025

Julian Rosenberger, Lukas Wolfrum, Sven Weinzierl, Mathias Kraus, Patrick Zschech

Abstract:The rapidly evolving labor market, driven by technological advancements and economic shifts, presents significant challenges for traditional job matching and consultation services. In response, we introduce an advanced support tool for career counselors and job seekers based on CareerBERT, a novel approach that leverages the power of unstructured textual data sources, such as resumes, to provide more accurate and comprehensive job recommendations. In contrast to previous approaches that primarily focus on job recommendations based on a fixed set of concrete job advertisements, our approach involves the creation of a corpus that combines data from the European Skills, Competences, and Occupations (ESCO) taxonomy and EURopean Employment Services (EURES) job advertisements, ensuring an up-to-date and well-defined representation of general job titles in the labor market. Our two-step evaluation approach, consisting of an application-grounded evaluation using EURES job advertisements and a human-grounded evaluation using real-world resumes and Human Resources (HR) expert feedback, provides a comprehensive assessment of CareerBERT's performance. Our experimental results demonstrate that CareerBERT outperforms both traditional and state-of-the-art embedding approaches while showing robust effectiveness in human expert evaluations. These results confirm the effectiveness of CareerBERT in supporting career consultants by generating relevant job recommendations based on resumes, ultimately enhancing the efficiency of job consultations and expanding the perspectives of job seekers. This research contributes to the field of NLP and job recommendation systems, offering valuable insights for both researchers and practitioners in the domain of career consulting and job matching.

* Accepted at Expert Systems with Applications. In Press, see https://doi.org/10.1016/j.eswa.2025.127043

Via

Access Paper or Ask Questions

The Impact of Transparency in AI Systems on Users' Data-Sharing Intentions: A Scenario-Based Experiment

Feb 27, 2025

Julian Rosenberger, Sophie Kuhlemann, Verena Tiefenbeck, Mathias Kraus, Patrick Zschech

Abstract:Artificial Intelligence (AI) systems are frequently employed in online services to provide personalized experiences to users based on large collections of data. However, AI systems can be designed in different ways, with black-box AI systems appearing as complex data-processing engines and white-box AI systems appearing as fully transparent data-processors. As such, it is reasonable to assume that these different design choices also affect user perception and thus their willingness to share data. To this end, we conducted a pre-registered, scenario-based online experiment with 240 participants and investigated how transparent and non-transparent data-processing entities influenced data-sharing intentions. Surprisingly, our results revealed no significant difference in willingness to share data across entities, challenging the notion that transparency increases data-sharing willingness. Furthermore, we found that a general attitude of trust towards AI has a significant positive influence, especially in the transparent AI condition, whereas privacy concerns did not significantly affect data-sharing decisions.

* Wirtschaftsinformatik 2024 Proceedings. 38
* This is the author's version of the paper presented at the 19th International Conference on Wirtschaftsinformatik (WI 2024). The official published version is available at https://aisel.aisnet.org/wi2024/38/

Via

Access Paper or Ask Questions

Quantifying Visual Properties of GAM Shape Plots: Impact on Perceived Cognitive Load and Interpretability

Sep 25, 2024

Sven Kruschel, Lasse Bohlen, Julian Rosenberger, Patrick Zschech, Mathias Kraus

Figure 1 for Quantifying Visual Properties of GAM Shape Plots: Impact on Perceived Cognitive Load and Interpretability

Figure 2 for Quantifying Visual Properties of GAM Shape Plots: Impact on Perceived Cognitive Load and Interpretability

Figure 3 for Quantifying Visual Properties of GAM Shape Plots: Impact on Perceived Cognitive Load and Interpretability

Figure 4 for Quantifying Visual Properties of GAM Shape Plots: Impact on Perceived Cognitive Load and Interpretability

Abstract:Generalized Additive Models (GAMs) offer a balance between performance and interpretability in machine learning. The interpretability aspect of GAMs is expressed through shape plots, representing the model's decision-making process. However, the visual properties of these plots, e.g. number of kinks (number of local maxima and minima), can impact their complexity and the cognitive load imposed on the viewer, compromising interpretability. Our study, including 57 participants, investigates the relationship between the visual properties of GAM shape plots and cognitive load they induce. We quantify various visual properties of shape plots and evaluate their alignment with participants' perceived cognitive load, based on 144 plots. Our results indicate that the number of kinks metric is the most effective, explaining 86.4% of the variance in users' ratings. We develop a simple model based on number of kinks that provides a practical tool for predicting cognitive load, enabling the assessment of one aspect of GAM interpretability without direct user involvement.

* to be published in proceedings of the 58th Hawaii International Conference on System Sciences (HICSS)

Via

Access Paper or Ask Questions

Challenging the Performance-Interpretability Trade-off: An Evaluation of Interpretable Machine Learning Models

Sep 22, 2024

Sven Kruschel, Nico Hambauer, Sven Weinzierl, Sandra Zilker, Mathias Kraus, Patrick Zschech

Abstract:Machine learning is permeating every conceivable domain to promote data-driven decision support. The focus is often on advanced black-box models due to their assumed performance advantages, whereas interpretable models are often associated with inferior predictive qualities. More recently, however, a new generation of generalized additive models (GAMs) has been proposed that offer promising properties for capturing complex, non-linear patterns while remaining fully interpretable. To uncover the merits and limitations of these models, this study examines the predictive performance of seven different GAMs in comparison to seven commonly used machine learning models based on a collection of twenty tabular benchmark datasets. To ensure a fair and robust model comparison, an extensive hyperparameter search combined with cross-validation was performed, resulting in 68,500 model runs. In addition, this study qualitatively examines the visual output of the models to assess their level of interpretability. Based on these results, the paper dispels the misconception that only black-box models can achieve high accuracy by demonstrating that there is no strict trade-off between predictive performance and model interpretability for tabular data. Furthermore, the paper discusses the importance of GAMs as powerful interpretable models for the field of information systems and derives implications for future work from a socio-technical perspective.

* Accepted for publication in Business & Information Systems Engineering (2024)

Via

Access Paper or Ask Questions

IGANN Sparse: Bridging Sparsity and Interpretability with Non-linear Insight

Mar 17, 2024

Theodor Stoecker, Nico Hambauer, Patrick Zschech, Mathias Kraus

Figure 1 for IGANN Sparse: Bridging Sparsity and Interpretability with Non-linear Insight

Figure 2 for IGANN Sparse: Bridging Sparsity and Interpretability with Non-linear Insight

Figure 3 for IGANN Sparse: Bridging Sparsity and Interpretability with Non-linear Insight

Figure 4 for IGANN Sparse: Bridging Sparsity and Interpretability with Non-linear Insight

Abstract:Feature selection is a critical component in predictive analytics that significantly affects the prediction accuracy and interpretability of models. Intrinsic methods for feature selection are built directly into model learning, providing a fast and attractive option for large amounts of data. Machine learning algorithms, such as penalized regression models (e.g., lasso) are the most common choice when it comes to in-built feature selection. However, they fail to capture non-linear relationships, which ultimately affects their ability to predict outcomes in intricate datasets. In this paper, we propose IGANN Sparse, a novel machine learning model from the family of generalized additive models, which promotes sparsity through a non-linear feature selection process during training. This ensures interpretability through improved model sparsity without sacrificing predictive performance. Moreover, IGANN Sparse serves as an exploratory tool for information systems researchers to unveil important non-linear relationships in domains that are characterized by complex patterns. Our ongoing research is directed at a thorough evaluation of the IGANN Sparse model, including user studies that allow to assess how well users of the model can benefit from the reduced number of features. This will allow for a deeper understanding of the interactions between linear vs. non-linear modeling, number of selected features, and predictive performance.

* Preprint conditionally accepted for archival and presentation at the 32nd European Conference on Information Systems (ECIS 2024)

Via

Access Paper or Ask Questions

Towards Faithful and Robust LLM Specialists for Evidence-Based Question-Answering

Feb 26, 2024

Tobias Schimanski, Jingwei Ni, Mathias Kraus, Elliott Ash, Markus Leippold

Abstract:Advances towards more faithful and traceable answers of Large Language Models (LLMs) are crucial for various research and practical endeavors. One avenue in reaching this goal is basing the answers on reliable sources. However, this Evidence-Based QA has proven to work insufficiently with LLMs in terms of citing the correct sources (source quality) and truthfully representing the information within sources (answer attributability). In this work, we systematically investigate how to robustly fine-tune LLMs for better source quality and answer attributability. Specifically, we introduce a data generation pipeline with automated data quality filters, which can synthesize diversified high-quality training and testing data at scale. We further introduce four test sets to benchmark the robustness of fine-tuned specialist models. Extensive evaluation shows that fine-tuning on synthetic data improves performance on both in- and out-of-distribution. Furthermore, we show that data quality, which can be drastically improved by proposed quality filters, matters more than quantity in improving Evidence-Based QA.

Via

Access Paper or Ask Questions

A Globally Convergent Algorithm for Neural Network Parameter Optimization Based on Difference-of-Convex Functions

Jan 15, 2024

Daniel Tschernutter, Mathias Kraus, Stefan Feuerriegel

Abstract:We propose an algorithm for optimizing the parameters of single hidden layer neural networks. Specifically, we derive a blockwise difference-of-convex (DC) functions representation of the objective function. Based on the latter, we propose a block coordinate descent (BCD) approach that we combine with a tailored difference-of-convex functions algorithm (DCA). We prove global convergence of the proposed algorithm. Furthermore, we mathematically analyze the convergence rate of parameters and the convergence rate in value (i.e., the training loss). We give conditions under which our algorithm converges linearly or even faster depending on the local shape of the loss function. We confirm our theoretical derivations numerically and compare our algorithm against state-of-the-art gradient-based solvers in terms of both training loss and test loss.

* accepted by TMLR

Via

Access Paper or Ask Questions

ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets

Oct 12, 2023

Tobias Schimanski, Julia Bingler, Camilla Hyslop, Mathias Kraus, Markus Leippold

Figure 1 for ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets

Figure 2 for ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets

Figure 3 for ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets

Figure 4 for ClimateBERT-NetZero: Detecting and Assessing Net Zero and Reduction Targets

Abstract:Public and private actors struggle to assess the vast amounts of information about sustainability commitments made by various institutions. To address this problem, we create a novel tool for automatically detecting corporate, national, and regional net zero and reduction targets in three steps. First, we introduce an expert-annotated data set with 3.5K text samples. Second, we train and release ClimateBERT-NetZero, a natural language classifier to detect whether a text contains a net zero or reduction target. Third, we showcase its analysis potential with two use cases: We first demonstrate how ClimateBERT-NetZero can be combined with conventional question-answering (Q&A) models to analyze the ambitions displayed in net zero and reduction targets. Furthermore, we employ the ClimateBERT-NetZero model on quarterly earning call transcripts and outline how communication patterns evolve over time. Our experiments demonstrate promising pathways for extracting and analyzing net zero and emission reduction targets at scale.

Via

Access Paper or Ask Questions