Abstract:In recent years, trace generation has emerged as a significant challenge within the Process Mining community. Deep Learning (DL) models have demonstrated accuracy in reproducing the features of the selected processes. However, current DL generative models are limited in their ability to adapt the learned distributions to generate data samples based on specific conditions or attributes. This limitation is particularly significant because the ability to control the type of generated data can be beneficial in various contexts, enabling a focus on specific behaviours, exploration of infrequent patterns, or simulation of alternative 'what-if' scenarios. In this work, we address this challenge by introducing a conditional model for process data generation based on a conditional variational autoencoder (CVAE). Conditional models offer control over the generation process by tuning input conditional variables, enabling more targeted and controlled data generation. Unlike other domains, CVAE for process mining faces specific challenges due to the multiperspective nature of the data and the need to adhere to control-flow rules while ensuring data variability. Specifically, we focus on generating process executions conditioned on control flow and temporal features of the trace, allowing us to produce traces for specific, identified sub-processes. The generated traces are then evaluated using common metrics for generative model assessment, along with additional metrics to evaluate the quality of the conditional generation
Abstract:Process Outcome Prediction entails predicting a discrete property of an unfinished process instance from its partial trace. High-capacity outcome predictors discovered with ensemble and deep learning methods have been shown to achieve top accuracy performances, but they suffer from a lack of transparency. Aligning with recent efforts to learn inherently interpretable outcome predictors, we propose to train a sparse Mixture-of-Experts where both the ``gate'' and ``expert'' sub-nets are Logistic Regressors. This ensemble-like model is trained end-to-end while automatically selecting a subset of input features in each sub-net, as an alternative to the common approach of performing a global feature selection step prior to model training. Test results on benchmark logs confirmed the validity and efficacy of this approach.