Abstract:Process Discovery is concerned with the automatic generation of a process model that describes a business process from execution data of that business process. Real life event logs can contain chaotic activities. These activities are independent of the state of the process and can, therefore, happen at rather arbitrary points in time. We show that the presence of such chaotic activities in an event log heavily impacts the quality of the process models that can be discovered with process discovery techniques. The current modus operandi for filtering activities from event logs is to simply filter out infrequent activities. We show that frequency-based filtering of activities does not solve the problems that are caused by chaotic activities. Moreover, we propose a novel technique to filter out chaotic activities from event logs. We evaluate this technique on a collection of seventeen real-life event logs that originate from both the business process management domain and the smart home environment domain. As demonstrated, the developed activity filtering methods enable the discovery of process models that are more behaviorally specific compared to process models that are discovered using standard frequency-based filtering.
Abstract:Process mining is a research field focused on the analysis of event data with the aim of extracting insights related to dynamic behavior. Applying process mining techniques on data from smart home environments has the potential to provide valuable insights in (un)healthy habits and to contribute to ambient assisted living solutions. Finding the right event labels to enable the application of process mining techniques is however far from trivial, as simply using the triggering sensor as the label for sensor events results in uninformative models that allow for too much behavior (overgeneralizing). Refinements of sensor level event labels suggested by domain experts have been shown to enable discovery of more precise and insightful process models. However, there exists no automated approach to generate refinements of event labels in the context of process mining. In this paper we propose a framework for the automated generation of label refinements based on the time attribute of events, allowing us to distinguish behaviourally different instances of the same event type based on their time attribute. We show on a case study with real life smart home event data that using automatically generated refined labels in process discovery, we can find more specific, and therefore more insightful, process models. We observe that one label refinement could have an effect on the usefulness of other label refinements when used together. Therefore, we explore four strategies to generate useful combinations of multiple label refinements and evaluate those on three real life smart home event logs.
Abstract:Artifact-centric process models aim to describe complex processes as a collection of interacting artifacts. Recent development in process mining allow for the discovery of such models. However, the focus is often on the representation of the individual artifacts rather than their interactions. Based on event data we can automatically discover composite state machines representing artifact-centric processes. Moreover, we provide ways of visualizing and quantifying interactions among different artifacts. For example, we are able to highlight strongly correlated behaviours in different artifacts. The approach has been fully implemented as a ProM plug-in; the CSM Miner provides an interactive artifact-centric process discovery tool focussing on interactions. The approach has been evaluated using real life data sets, including the personal loan and overdraft process of a Dutch financial institution.
Abstract:Process mining techniques focus on extracting insight in processes from event logs. Process mining has the potential to provide valuable insights in (un)healthy habits and to contribute to ambient assisted living solutions when applied on data from smart home environments. However, events recorded in smart home environments are on the level of sensor triggers, at which process discovery algorithms produce overgeneralizing process models that allow for too much behavior and that are difficult to interpret for human experts. We show that abstracting the events to a higher-level interpretation can enable discovery of more precise and more comprehensible models. We present a framework for the extraction of features that can be used for abstraction with supervised learning methods that is based on the XES IEEE standard for event logs. This framework can automatically abstract sensor-level events to their interpretation at the human activity level, after training it on training data for which both the sensor and human activity events are known. We demonstrate our abstraction framework on three real-life smart home event logs and show that the process models that can be discovered after abstraction are more precise indeed.
Abstract:In process mining, precision measures are used to quantify how much a process model overapproximates the behavior seen in an event log. Although several measures have been proposed throughout the years, no research has been done to validate whether these measures achieve the intended aim of quantifying over-approximation in a consistent way for all models and logs. This paper fills this gap by postulating a number of axioms for quantifying precision consistently for any log and any model. Further, we show through counter-examples that none of the existing measures consistently quantifies precision.
Abstract:In this paper we describe a method to discover frequent behavioral patterns in event logs. We express these patterns as \emph{local process models}. Local process model mining can be positioned in-between process discovery and episode / sequential pattern mining. The technique presented in this paper is able to learn behavioral patterns involving sequential composition, concurrency, choice and loop, like in process mining. However, we do not look at start-to-end models, which distinguishes our approach from process discovery and creates a link to episode / sequential pattern mining. We propose an incremental procedure for building local process models capturing frequent patterns based on so-called process trees. We propose five quality dimensions and corresponding metrics for local process models, given an event log. We show monotonicity properties for some quality dimensions, enabling a speedup of local process model discovery through pruning. We demonstrate through a real life case study that mining local patterns allows us to get insights in processes where regular start-to-end process discovery techniques are only able to learn unstructured, flower-like, models.
Abstract:Local Process Models (LPM) describe structured fragments of process behavior occurring in the context of less structured business processes. Traditional LPM discovery aims to generate a collection of process models that describe highly frequent behavior, but these models do not always provide useful answers for questions posed by process analysts aiming at business process improvement. We propose a framework for goal-driven LPM discovery, based on utility functions and constraints. We describe four scopes on which these utility functions and constrains can be defined, and show that utility functions and constraints on different scopes can be combined to form composite utility functions/constraints. Finally, we demonstrate the applicability of our approach by presenting several actionable business insights discovered with LPM discovery on two real life data sets.
Abstract:Local Process Model (LPM) discovery is focused on the mining of a set of process models where each model describes the behavior represented in the event log only partially, i.e. subsets of possible events are taken into account to create so-called local process models. Often such smaller models provide valuable insights into the behavior of the process, especially when no adequate and comprehensible single overall process model exists that is able to describe the traces of the process from start to end. The practical application of LPM discovery is however hindered by computational issues in the case of logs with many activities (problems may already occur when there are more than 17 unique activities). In this paper, we explore three heuristics to discover subsets of activities that lead to useful log projections with the goal of speeding up LPM discovery considerably while still finding high-quality LPMs. We found that a Markov clustering approach to create projection sets results in the largest improvement of execution time, with discovered LPMs still being better than with the use of randomly generated activity sets of the same size. Another heuristic, based on log entropy, yields a more moderate speedup, but enables the discovery of higher quality LPMs. The third heuristic, based on the relative information gain, shows unstable performance: for some data sets the speedup and LPM quality are higher than with the log entropy based method, while for other data sets there is no speedup at all.
Abstract:Process mining is a research field focused on the analysis of event data with the aim of extracting insights in processes. Applying process mining techniques on data from smart home environments has the potential to provide valuable insights in (un)healthy habits and to contribute to ambient assisted living solutions. Finding the right event labels to enable application of process mining techniques is however far from trivial, as simply using the triggering sensor as the label for sensor events results in uninformative models that allow for too much behavior (overgeneralizing). Refinements of sensor level event labels suggested by domain experts have shown to enable discovery of more precise and insightful process models. However, there exist no automated approach to generate refinements of event labels in the context of process mining. In this paper we propose a framework for automated generation of label refinements based on the time attribute of events. We show on a case study with real life smart home event data that behaviorally more specific, and therefore more insightful, process models can be found by using automatically generated refined labels in process discovery.
Abstract:Process mining techniques focus on extracting insight in processes from event logs. In many cases, events recorded in the event log are too fine-grained, causing process discovery algorithms to discover incomprehensible process models or process models that are not representative of the event log. We show that when process discovery algorithms are only able to discover an unrepresentative process model from a low-level event log, structure in the process can in some cases still be discovered by first abstracting the event log to a higher level of granularity. This gives rise to the challenge to bridge the gap between an original low-level event log and a desired high-level perspective on this log, such that a more structured or more comprehensible process model can be discovered. We show that supervised learning can be leveraged for the event abstraction task when annotations with high-level interpretations of the low-level events are available for a subset of the sequences (i.e., traces). We present a method to generate feature vector representations of events based on XES extensions, and describe an approach to abstract events in an event log with Condition Random Fields using these event features. Furthermore, we propose a sequence-focused metric to evaluate supervised event abstraction results that fits closely to the tasks of process discovery and conformance checking. We conclude this paper by demonstrating the usefulness of supervised event abstraction for obtaining more structured and/or more comprehensible process models using both real life event data and synthetic event data.