Abstract: Process simulation is an analysis tool in process mining that allows users to measure the impact of changes, prevent losses, and update the process without risks or costs. In the literature, several process simulation techniques are available, and they are usually either built upon process models discovered from a given event log or learned via deep learning. Each group of approaches has its own strengths and limitations. The former is usually restricted to the control-flow perspective but is more interpretable, whereas the latter is not interpretable by nature but generalizes better on large event logs. Despite the strong performance achieved by deep learning approaches, they are still not suitable for application in real scenarios, where they would generate value for users. This issue is mainly due to the fact that their stochasticity is hard to control. To address this problem, we propose the CoSMo framework for implementing process simulation models fully based on deep learning. This framework enables simulating event logs that satisfy a constraint by conditioning the learning phase of a deep neural network. Through experiments, the simulation is validated from both control-flow and data-flow perspectives, demonstrating the proposed framework's capability of simulating cases while satisfying imposed conditions.
Abstract: Encoding methods are employed across several process mining tasks, including predictive process monitoring, anomalous case detection, and trace clustering. These methods are usually performed as preprocessing steps and are responsible for transforming complex information into a numerical feature space. Most papers choose existing encoding methods arbitrarily or employ a strategy based on domain-specific expert knowledge. Moreover, existing methods are typically used with their default hyperparameters, without evaluating other options. This practice can lead to several drawbacks, such as suboptimal performance and unfair comparisons with the state of the art. Therefore, this work aims to provide a comprehensive survey on event log encoding by comparing 27 methods of different natures in terms of expressivity, scalability, correlation, and domain agnosticism. To the best of our knowledge, this is the most comprehensive study so far focusing on trace encoding in process mining. It contributes to raising awareness about the role of trace encoding in process mining pipelines and sheds light on issues, concerns, and future research directions regarding the use of encoding methods to bridge the gap between machine learning models and process mining.