Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Jochen De Weerdt

Language Fusion for Parameter-Efficient Cross-lingual Transfer

Jan 12, 2025

Philipp Borchert, Ivan Vulić, Marie-Francine Moens, Jochen De Weerdt

Abstract:Limited availability of multilingual text corpora for training language models often leads to poor performance on downstream tasks due to undertrained representation spaces for languages other than English. This 'under-representation' has motivated recent cross-lingual transfer methods to leverage the English representation space by e.g. mixing English and 'non-English' tokens at the input level or extending model parameters to accommodate new languages. However, these approaches often come at the cost of increased computational complexity. We propose Fusion forLanguage Representations (FLARE) in adapters, a novel method that enhances representation quality and downstream performance for languages other than English while maintaining parameter efficiency. FLARE integrates source and target language representations within low-rank (LoRA) adapters using lightweight linear transformations, maintaining parameter efficiency while improving transfer performance. A series of experiments across representative cross-lingual natural language understanding tasks, including natural language inference, question-answering and sentiment analysis, demonstrate FLARE's effectiveness. FLARE achieves performance improvements of 4.9% for Llama 3.1 and 2.2% for Gemma~2 compared to standard LoRA fine-tuning on question-answering tasks, as measured by the exact match metric.

* 20 pages

Via

Access Paper or Ask Questions

Generating Realistic Adversarial Examples for Business Processes using Variational Autoencoders

Nov 21, 2024

Alexander Stevens, Jari Peeperkorn, Johannes De Smedt, Jochen De Weerdt

Figure 1 for Generating Realistic Adversarial Examples for Business Processes using Variational Autoencoders

Figure 2 for Generating Realistic Adversarial Examples for Business Processes using Variational Autoencoders

Figure 3 for Generating Realistic Adversarial Examples for Business Processes using Variational Autoencoders

Figure 4 for Generating Realistic Adversarial Examples for Business Processes using Variational Autoencoders

Abstract:In predictive process monitoring, predictive models are vulnerable to adversarial attacks, where input perturbations can lead to incorrect predictions. Unlike in computer vision, where these perturbations are designed to be imperceptible to the human eye, the generation of adversarial examples in predictive process monitoring poses unique challenges. Minor changes to the activity sequences can create improbable or even impossible scenarios to occur due to underlying constraints such as regulatory rules or process constraints. To address this, we focus on generating realistic adversarial examples tailored to the business process context, in contrast to the imperceptible, pixel-level changes commonly seen in computer vision adversarial attacks. This paper introduces two novel latent space attacks, which generate adversaries by adding noise to the latent space representation of the input data, rather than directly modifying the input attributes. These latent space methods are domain-agnostic and do not rely on process-specific knowledge, as we restrict the generation of adversarial examples to the learned class-specific data distributions by directly perturbing the latent space representation of the business process executions. We evaluate these two latent space methods with six other adversarial attacking methods on eleven real-life event logs and four predictive models. The first three attacking methods directly permute the activities of the historically observed business process executions. The fourth method constrains the adversarial examples to lie within the same data distribution as the original instances, by projecting the adversarial examples to the original data distribution.

Via

Access Paper or Ask Questions

Native Design Bias: Studying the Impact of English Nativeness on Language Model Performance

Jun 25, 2024

Manon Reusens, Philipp Borchert, Jochen De Weerdt, Bart Baesens

Abstract:Large Language Models (LLMs) excel at providing information acquired during pretraining on large-scale corpora and following instructions through user prompts. This study investigates whether the quality of LLM responses varies depending on the demographic profile of users. Considering English as the global lingua franca, along with the diversity of its dialects among speakers of different native languages, we explore whether non-native English speakers receive lower-quality or even factually incorrect responses from LLMs more frequently. Our results show that performance discrepancies occur when LLMs are prompted by native versus non-native English speakers and persist when comparing native speakers from Western countries with others. Additionally, we find a strong anchoring effect when the model recognizes or is made aware of the user's nativeness, which further degrades the response quality when interacting with non-native speakers. Our analysis is based on a newly collected dataset with over 12,000 unique annotations from 124 annotators, including information on their native language and English proficiency.

Via

Access Paper or Ask Questions

Efficient Information Extraction in Few-Shot Relation Classification through Contrastive Representation Learning

Mar 25, 2024

Philipp Borchert, Jochen De Weerdt, Marie-Francine Moens

Abstract:Differentiating relationships between entity pairs with limited labeled instances poses a significant challenge in few-shot relation classification. Representations of textual data extract rich information spanning the domain, entities, and relations. In this paper, we introduce a novel approach to enhance information extraction combining multiple sentence representations and contrastive learning. While representations in relation classification are commonly extracted using entity marker tokens, we argue that substantial information within the internal model representations remains untapped. To address this, we propose aligning multiple sentence representations, such as the [CLS] token, the [MASK] token used in prompting, and entity marker tokens. Our method employs contrastive learning to extract complementary discriminative information from these individual representations. This is particularly relevant in low-resource settings where information is scarce. Leveraging multiple sentence representations is especially effective in distilling discriminative information for relation classification when additional information, like relation descriptions, are not available. We validate the adaptability of our approach, maintaining robust performance in scenarios that include relation descriptions, and showcasing its flexibility to adapt to different resource constraints.

* NAACL 2024

Via

Access Paper or Ask Questions

CORE: A Few-Shot Company Relation Classification Dataset for Robust Domain Adaptation

Oct 18, 2023

Philipp Borchert, Jochen De Weerdt, Kristof Coussement, Arno De Caigny, Marie-Francine Moens

Abstract:We introduce CORE, a dataset for few-shot relation classification (RC) focused on company relations and business entities. CORE includes 4,708 instances of 12 relation types with corresponding textual evidence extracted from company Wikipedia pages. Company names and business entities pose a challenge for few-shot RC models due to the rich and diverse information associated with them. For example, a company name may represent the legal entity, products, people, or business divisions depending on the context. Therefore, deriving the relation type between entities is highly dependent on textual context. To evaluate the performance of state-of-the-art RC models on the CORE dataset, we conduct experiments in the few-shot domain adaptation setting. Our results reveal substantial performance gaps, confirming that models trained on different domains struggle to adapt to CORE. Interestingly, we find that models trained on CORE showcase improved out-of-domain performance, which highlights the importance of high-quality data for robust domain adaptation. Specifically, the information richness embedded in business entities allows models to focus on contextual nuances, reducing their reliance on superficial clues such as relation-specific verbs. In addition to the dataset, we provide relevant code snippets to facilitate reproducibility and encourage further research in the field.

* Accepted to EMNLP 2023 main conference

Via

Access Paper or Ask Questions

Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques

Oct 16, 2023

Manon Reusens, Philipp Borchert, Margot Mieskes, Jochen De Weerdt, Bart Baesens

Figure 1 for Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques

Figure 2 for Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques

Figure 3 for Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques

Figure 4 for Investigating Bias in Multilingual Language Models: Cross-Lingual Transfer of Debiasing Techniques

Abstract:This paper investigates the transferability of debiasing techniques across different languages within multilingual models. We examine the applicability of these techniques in English, French, German, and Dutch. Using multilingual BERT (mBERT), we demonstrate that cross-lingual transfer of debiasing techniques is not only feasible but also yields promising results. Surprisingly, our findings reveal no performance disadvantages when applying these techniques to non-English languages. Using translations of the CrowS-Pairs dataset, our analysis identifies SentenceDebias as the best technique across different languages, reducing bias in mBERT by an average of 13%. We also find that debiasing techniques with additional pretraining exhibit enhanced cross-lingual effectiveness for the languages included in the analyses, particularly in lower-resource languages. These novel insights contribute to a deeper understanding of bias mitigation in multilingual language models and provide practical guidance for debiasing techniques in different language contexts.

* Accepted to EMNLP 2023 main conference

Via

Access Paper or Ask Questions

Timing Process Interventions with Causal Inference and Reinforcement Learning

Jun 07, 2023

Hans Weytjens, Wouter Verbeke, Jochen De Weerdt

Abstract:The shift from the understanding and prediction of processes to their optimization offers great benefits to businesses and other organizations. Precisely timed process interventions are the cornerstones of effective optimization. Prescriptive process monitoring (PresPM) is the sub-field of process mining that concentrates on process optimization. The emerging PresPM literature identifies state-of-the-art methods, causal inference (CI) and reinforcement learning (RL), without presenting a quantitative comparison. Most experiments are carried out using historical data, causing problems with the accuracy of the methods' evaluations and preempting online RL. Our contribution consists of experiments on timed process interventions with synthetic data that renders genuine online RL and the comparison to CI possible, and allows for an accurate evaluation of the results. Our experiments reveal that RL's policies outperform those from CI and are more robust at the same time. Indeed, the RL policies approach perfect policies. Unlike CI, the unaltered online RL approach can be applied to other, more generic PresPM problems such as next best activity recommendations. Nonetheless, CI has its merits in settings where online learning is not an option.

Via

Access Paper or Ask Questions

Can recurrent neural networks learn process model structure?

Dec 13, 2022

Jari Peeperkorn, Seppe vanden Broucke, Jochen De Weerdt

Abstract:Various methods using machine and deep learning have been proposed to tackle different tasks in predictive process monitoring, forecasting for an ongoing case e.g. the most likely next event or suffix, its remaining time, or an outcome-related variable. Recurrent neural networks (RNNs), and more specifically long short-term memory nets (LSTMs), stand out in terms of popularity. In this work, we investigate the capabilities of such an LSTM to actually learn the underlying process model structure of an event log. We introduce an evaluation framework that combines variant-based resampling and custom metrics for fitness, precision and generalization. We evaluate 4 hypotheses concerning the learning capabilities of LSTMs, the effect of overfitting countermeasures, the level of incompleteness in the training set and the level of parallelism in the underlying process model. We confirm that LSTMs can struggle to learn process model structure, even with simplistic process data and in a very lenient setup. Taking the correct anti-overfitting measures can alleviate the problem. However, these measures did not present themselves to be optimal when selecting hyperparameters purely on predicting accuracy. We also found that decreasing the amount of information seen by the LSTM during training, causes a sharp drop in generalization and precision scores. In our experiments, we could not identify a relationship between the extent of parallelism in the model and the generalization capability, but they do indicate that the process' complexity might have impact.

* Journal of Intelligent Information Systems 2022

Via

Access Paper or Ask Questions

Predicting student performance using sequence classification with time-based windows

Aug 16, 2022

Galina Deeva, Johannes De Smedt, Cecilia Saint-Pierre, Richard Weber, Jochen De Weerdt

Figure 1 for Predicting student performance using sequence classification with time-based windows

Figure 2 for Predicting student performance using sequence classification with time-based windows

Figure 3 for Predicting student performance using sequence classification with time-based windows

Figure 4 for Predicting student performance using sequence classification with time-based windows

Abstract:A growing number of universities worldwide use various forms of online and blended learning as part of their academic curricula. Furthermore, the recent changes caused by the COVID-19 pandemic have led to a drastic increase in importance and ubiquity of online education. Among the major advantages of e-learning is not only improving students' learning experience and widening their educational prospects, but also an opportunity to gain insights into students' learning processes with learning analytics. This study contributes to the topic of improving and understanding e-learning processes in the following ways. First, we demonstrate that accurate predictive models can be built based on sequential patterns derived from students' behavioral data, which are able to identify underperforming students early in the course. Second, we investigate the specificity-generalizability trade-off in building such predictive models by investigating whether predictive models should be built for every course individually based on course-specific sequential patterns, or across several courses based on more general behavioral patterns. Finally, we present a methodology for capturing temporal aspects in behavioral data and analyze its influence on the predictive performance of the models. The results of our improved sequence classification technique are capable to predict student performance with high levels of accuracy, reaching 90 percent for course-specific models.

* Expert Systems with Applications, 118182 (2022)

Via

Access Paper or Ask Questions

Enhancing Stochastic Petri Net-based Remaining Time Prediction using k-Nearest Neighbors

Jun 27, 2022

Jarne Vandenabeele, Gilles Vermaut, Jari Peeperkorn, Jochen De Weerdt

Figure 1 for Enhancing Stochastic Petri Net-based Remaining Time Prediction using k-Nearest Neighbors

Figure 2 for Enhancing Stochastic Petri Net-based Remaining Time Prediction using k-Nearest Neighbors

Figure 3 for Enhancing Stochastic Petri Net-based Remaining Time Prediction using k-Nearest Neighbors

Figure 4 for Enhancing Stochastic Petri Net-based Remaining Time Prediction using k-Nearest Neighbors

Abstract:Reliable remaining time prediction of ongoing business processes is a highly relevant topic. One example is order delivery, a key competitive factor in e.g. retailing as it is a main driver of customer satisfaction. For realising timely delivery, an accurate prediction of the remaining time of the delivery process is crucial. Within the field of process mining, a wide variety of remaining time prediction techniques have already been proposed. In this work, we extend remaining time prediction based on stochastic Petri nets with generally distributed transitions with k-nearest neighbors. The k-nearest neighbors algorithm is performed on simple vectors storing the time passed to complete previous activities. By only taking a subset of instances, a more representative and stable stochastic Petri Net is obtained, leading to more accurate time predictions. We discuss the technique and its basic implementation in Python and use different real world data sets to evaluate the predictive power of our extension. These experiments show clear advantages in combining both techniques with regard to predictive power.

* ALGORITHMS & THEORIES FOR THE ANALYSIS OF EVENT DATA 2022

Via

Access Paper or Ask Questions