Bart Goethals

Weighted Tensor Decompositions for Context-aware Collaborative Filtering

Mar 11, 2025

Joey De Pauw, Bart Goethals

Abstract:Over recent years it has become well accepted that user interest is not static or immutable. There are a variety of contextual factors, such as time of day, the weather or the user's mood, that influence the current interests of the user. Modelling approaches need to take these factors into account if they want to succeed at finding the most relevant content to recommend given the situation. A popular method for context-aware recommendation is to encode context attributes as extra dimensions of the classic user-item interaction matrix, effectively turning it into a tensor, followed by applying the appropriate tensor decomposition methods to learn missing values. However, unlike with matrix factorization, where all decompositions are essentially a product of matrices, there exist many more options for decomposing tensors by combining vector, matrix and tensor products. We study the most successful decomposition methods that use weighted square loss and categorize them based on their tensor structure and regularization strategy. Additionally, we further extend the pool of methods by filling in the missing combinations. In this paper we provide an overview of the properties of the different decomposition methods, such as their complexity, scalability, and modelling capacity. These benefits are then contrasted with the performances achieved in offline experiments to gain more insight into which method to choose depending on a specific situation and constraints.

* Workshop on Context-Aware Recommender Systems, September 18, 2023, Singapore
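
As a rough illustration of the idea in the abstract, the sketch below fits a CP (canonical polyadic) decomposition of a user x item x context tensor under a weighted square loss with L2 regularization. It is only one of the many decompositions the abstract alludes to, not the authors' implementation. The weighting scheme (weight 3 for observed interactions, 1 otherwise), the plain gradient-descent optimizer, and all names and hyperparameters are assumptions made for illustration.

```python
import numpy as np

def cp_predict(U, V, C):
    """CP model: x_hat[u, i, c] = sum_k U[u, k] * V[i, k] * C[c, k]."""
    return np.einsum("uk,ik,ck->uic", U, V, C)

def weighted_loss(X, W, U, V, C, lam):
    """Weighted square loss plus L2 regularization on all factor matrices."""
    err = X - cp_predict(U, V, C)
    reg = lam * sum(np.sum(F ** 2) for F in (U, V, C))
    return float(np.sum(W * err ** 2) + reg)

def gradient_step(X, W, U, V, C, lam, lr):
    """One full-batch gradient-descent step on the loss above."""
    R = W * (cp_predict(U, V, C) - X)  # weighted residuals
    gU = 2 * np.einsum("uic,ik,ck->uk", R, V, C) + 2 * lam * U
    gV = 2 * np.einsum("uic,uk,ck->ik", R, U, C) + 2 * lam * V
    gC = 2 * np.einsum("uic,uk,ik->ck", R, U, V) + 2 * lam * C
    return U - lr * gU, V - lr * gV, C - lr * gC

# Toy data (hypothetical): binary interactions, higher weight on observed cells.
rng = np.random.default_rng(0)
n_users, n_items, n_contexts, rank = 4, 5, 3, 2
X = (rng.random((n_users, n_items, n_contexts)) < 0.2).astype(float)
W = 1.0 + 2.0 * X
U, V, C = (0.3 * rng.standard_normal((n, rank))
           for n in (n_users, n_items, n_contexts))

loss_before = weighted_loss(X, W, U, V, C, lam=0.1)
for _ in range(200):
    U, V, C = gradient_step(X, W, U, V, C, lam=0.1, lr=0.005)
loss_after = weighted_loss(X, W, U, V, C, lam=0.1)
```

After training, `cp_predict(U, V, C)[u, :, c]` gives scores for all items for user `u` in context `c`; methods of this kind are usually trained with alternating least squares rather than gradient descent, which is used here only to keep the sketch short.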

Efficient pattern-based anomaly detection in a network of multivariate devices

May 07, 2023

Len Feremans, Boris Cule, Bart Goethals

Abstract:Many organisations manage service quality and monitor a large set of devices and servers, where each entity is associated with telemetry or physical sensor data series. Recently, various methods have been proposed to detect behavioural anomalies; however, existing approaches focus on multivariate time series and ignore communication between entities. Moreover, we aim to support end-users not only in locating the entities and sensors causing an anomaly at a certain period, but also in explaining this decision. We propose a scalable two-step approach to detect anomalies. First, we recover relations between entities in the network, since relations are often dynamic in nature and caused by an unknown underlying process. Next, we report anomalies based on an embedding of sequential patterns. Pattern mining is efficient and supports interpretation, i.e. patterns represent frequently occurring behaviour in time series. We extend pattern mining to filter sequential patterns based on frequency, temporal constraints and minimum description length. We collect and release two public datasets for international broadcasting and X from an Internet company. BAD achieves an overall F1-Score of 0.78 on 9 benchmark datasets, significantly outperforming the best baseline by 3%. Additionally, BAD is also an order of magnitude faster than state-of-the-art anomaly detection methods.

Modelling Users with Item Metadata for Explainable and Interactive Recommendation

Jul 08, 2022

Joey De Pauw, Koen Ruymbeek, Bart Goethals

Abstract:Recommender systems are used in many different applications and contexts; however, their main goal can always be summarised as "connecting relevant content to interested users". Personalized recommendation algorithms achieve this goal by first building a profile of the user, either implicitly or explicitly, and then matching items with this profile to find relevant content. The more interpretable the profile and this "matching function" are, the easier it is to provide users with accurate and intuitive explanations, and also to let them interact with the system. Indeed, for a user to see what the system has already learned about her interests is of key importance for her to provide feedback to the system and to guide it towards better understanding her preferences. To this end, we propose a linear collaborative filtering recommendation model that builds user profiles within the domain of item metadata, which is arguably the most interpretable domain for end users. Our method is hence inherently transparent and explainable. Moreover, since recommendations are computed as a linear function of item metadata and the interpretable user profile, our method seamlessly supports interactive recommendation. In other words, users can directly tweak the weights of the learned profile for more fine-grained browsing and discovery of content based on their current interests. We demonstrate the interactive aspect of this model in an online application for discovering cultural events in Belgium. Additionally, the performance of the model is evaluated with offline experiments, both static and with simulated feedback, and compared to several state-of-the-art and state-of-practice baselines.

* - Correct author affiliation - Place appendix after references - Update link to source code

Proximity Forest: An effective and scalable distance-based classifier for time series

Aug 31, 2018

Benjamin Lucas, Ahmed Shifaz, Charlotte Pelletier, Lachlan O'Neill, Nayyar Zaidi, Bart Goethals, Francois Petitjean, Geoffrey I. Webb

Abstract:Research into the classification of time series has made enormous progress in the last decade. The UCR time series archive has played a significant role in challenging and guiding the development of new learners for time series classification. The largest dataset in the UCR archive holds only 10 thousand time series, which may explain why the primary research focus has been on creating algorithms that have high accuracy on relatively small datasets. This paper introduces Proximity Forest, an algorithm that learns accurate models from datasets with millions of time series, and classifies a time series in milliseconds. The models are ensembles of highly randomized Proximity Trees. Whereas conventional decision trees branch on attribute values (and usually perform poorly on time series), Proximity Trees branch on the proximity of time series to one exemplar time series or another, allowing us to leverage the decades of work on developing relevant measures for time series. Proximity Forest gains both efficiency and accuracy by stochastic selection of both exemplars and similarity measures. Our work is motivated by recent time series applications that provide orders of magnitude more time series than the UCR benchmarks. Our experiments demonstrate that Proximity Forest is highly competitive on the UCR archive: it ranks among the most accurate classifiers while being significantly faster. We demonstrate on a 1M time series Earth observation dataset that Proximity Forest retains this accuracy on datasets that are many orders of magnitude greater than those in the UCR repository, while learning its models at least 100,000 times faster than the current state-of-the-art models Elastic Ensemble and COTE.

* 30 pages, 12 figures
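
The branching rule described in the abstract can be sketched as follows: a node picks one exemplar per class and one similarity measure at random, and a series follows the branch of its nearest exemplar. This is a simplified illustration, not the authors' code. The pool of only two measures (Euclidean and an unconstrained DTW), the function names, and the omission of any comparison between several candidate splits are assumptions.

```python
import random
import numpy as np

def euclidean(a, b):
    return float(np.sqrt(np.sum((a - b) ** 2)))

def dtw(a, b):
    """Unconstrained dynamic time warping with squared pointwise cost."""
    n, m = len(a), len(b)
    D = np.full((n + 1, m + 1), np.inf)
    D[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost = (a[i - 1] - b[j - 1]) ** 2
            D[i, j] = cost + min(D[i - 1, j], D[i, j - 1], D[i - 1, j - 1])
    return float(np.sqrt(D[n, m]))

MEASURES = [euclidean, dtw]  # hypothetical, much smaller than a realistic pool

def random_split(series, labels, rng):
    """Pick a random measure and one random exemplar per class."""
    measure = rng.choice(MEASURES)
    exemplars = {}
    for cls in sorted(set(labels)):
        candidates = [s for s, y in zip(series, labels) if y == cls]
        exemplars[cls] = rng.choice(candidates)
    return measure, exemplars

def route(x, measure, exemplars):
    """Return the branch (class of the nearest exemplar) that x follows."""
    return min(exemplars, key=lambda cls: measure(x, exemplars[cls]))

# Toy data (hypothetical): two well-separated classes.
train = [np.array(v, dtype=float) for v in
         ([0, 0, 0, 0], [0, 1, 0, 0], [10, 10, 10, 10], [10, 9, 10, 10])]
train_labels = [0, 0, 1, 1]
measure, exemplars = random_split(train, train_labels, random.Random(0))
branch_low = route(np.array([0.0, 0.0, 1.0, 0.0]), measure, exemplars)
branch_high = route(np.array([9.0, 10.0, 10.0, 10.0]), measure, exemplars)
```

A full tree would apply `random_split` recursively to the series routed to each branch until the nodes are pure, and a forest would combine many such trees by majority vote.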

Understanding Concept Drift

Apr 02, 2017

Geoffrey I. Webb, Loong Kuan Lee, François Petitjean, Bart Goethals

Abstract:Concept drift is a major issue that greatly affects the accuracy and reliability of many real-world applications of machine learning. We argue that to tackle concept drift it is important to develop the capacity to describe and analyze it. We propose tools for this purpose, arguing for the importance of quantitative descriptions of drift in marginal distributions. We present quantitative drift analysis techniques along with methods for communicating their results. We demonstrate their effectiveness by application to three real-world learning tasks.
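
One simple way to quantify drift in a marginal distribution is to compare the empirical distribution of a single attribute in two time windows. The sketch below uses the total variation distance for that comparison. The abstract does not name a specific measure, so the choice of distance, the function names, and the toy windows are assumptions for illustration only.

```python
from collections import Counter

def marginal(values):
    """Empirical marginal distribution of one categorical attribute."""
    counts = Counter(values)
    n = len(values)
    return {v: c / n for v, c in counts.items()}

def total_variation(p, q):
    """Total variation distance: 0 means identical, 1 means disjoint support."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)

# Toy windows (hypothetical): attribute values before and after a point in time.
window_before = ["a", "a", "b", "b"]
window_after = ["a", "b", "b", "b"]
drift = total_variation(marginal(window_before), marginal(window_after))
```

Computing this value per attribute and per pair of consecutive windows yields a table of drift magnitudes that can be plotted or ranked to see which attributes change most.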

Interactive Constrained Association Rule Mining

Feb 05, 2003

Bart Goethals, Jan Van den Bussche

Abstract:We investigate ways to support interactive mining sessions, in the setting of association rule mining. In such sessions, users specify conditions (queries) on the associations to be generated. Our approach is a combination of the integration of querying conditions inside the mining phase, and the incremental querying of already generated associations. We present several concrete algorithms and compare their performance.

* A preliminary report on this work was presented at the Second International Conference on Knowledge Discovery and Data Mining (DaWaK 2000)

A Tight Upper Bound on the Number of Candidate Patterns

Nov 30, 2002

Floris Geerts, Bart Goethals, Jan Van den Bussche

Abstract:In the context of mining for frequent patterns using the standard levelwise algorithm, the following question arises: given the current level and the current set of frequent patterns, what is the maximal number of candidate patterns that can be generated on the next level? We answer this question by providing a tight upper bound, derived from a combinatorial result from the sixties by Kruskal and Katona. Our result is useful to reduce the number of database scans.
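
The abstract does not state the bound itself. The sketch below shows the basic form that follows from the Kruskal-Katona theorem as it is commonly stated: write the number of frequent k-patterns in its k-cascade representation, C(m_k, k) + C(m_{k-1}, k-1) + ..., and raise each lower index by one to bound the number of (k+1)-candidates. The function names are hypothetical, and any refinements made in the paper are not reproduced here.

```python
from math import comb

def cascade(m, k):
    """Greedy k-cascade representation of m as a list of (n, j) binomial terms."""
    terms = []
    while m > 0 and k > 0:
        n = k
        while comb(n + 1, k) <= m:  # largest n with C(n, k) <= m
            n += 1
        terms.append((n, k))
        m -= comb(n, k)
        k -= 1
    return terms

def max_candidates(num_frequent, k):
    """Upper bound on the number of (k+1)-candidates given the number of frequent k-patterns."""
    return sum(comb(n, j + 1) for n, j in cascade(num_frequent, k))

# 6 frequent pairs can be all pairs over 4 items, giving C(4, 3) = 4 candidate triples.
bound_six_pairs = max_candidates(6, 2)
# 5 frequent pairs = C(3, 2) + C(2, 1), giving C(3, 3) + C(2, 2) = 2 candidate triples.
bound_five_pairs = max_candidates(5, 2)
```

In a levelwise miner, such a bound can be evaluated before generating the next level, for example to decide whether the remaining candidates are few enough to be counted in a single pass.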

Relational Association Rules: getting WARMeR

Jun 15, 2002

Bart Goethals, Jan Van den Bussche

Abstract:In recent years, the problem of association rule mining in transactional data has been well studied. We propose to extend the discovery of classical association rules to the discovery of association rules of conjunctive queries in arbitrary relational data, inspired by the WARMR algorithm, developed by Dehaspe and Toivonen, that discovers association rules over a limited set of conjunctive queries. Conjunctive query evaluation in relational databases is well understood, but still poses some great challenges when approached from a discovery viewpoint in which patterns are generated and evaluated with respect to some well defined search space and pruning operators.

Mining All Non-Derivable Frequent Itemsets

Jun 03, 2002

Toon Calders, Bart Goethals

Abstract:Recent studies on frequent itemset mining algorithms resulted in significant performance improvements. However, if the minimal support threshold is set too low, or the data is highly correlated, the number of frequent itemsets itself can be prohibitively large. To overcome this problem, recently several proposals have been made to construct a concise representation of the frequent itemsets, instead of mining all frequent itemsets. The main goal of this paper is to identify redundancies in the set of all frequent itemsets and to exploit these redundancies in order to reduce the result of a mining operation. We present deduction rules to derive tight bounds on the support of candidate itemsets. We show how the deduction rules allow for constructing a minimal representation for all frequent itemsets. We also present connections between our proposal and recent proposals for concise representations and we give the results of experiments on real-life datasets that show the effectiveness of the deduction rules. In fact, the experiments even show that in many cases, first mining the concise representation, and then creating the frequent itemsets from this representation outperforms existing frequent set mining algorithms.

* 3 figures
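
The deduction rules are not spelled out in the abstract. The sketch below assumes the usual inclusion-exclusion form: for every subset X of an itemset I, the signed sum of the supports of all J with X ⊆ J ⊂ I is an upper bound on the support of I when |I \ X| is odd and a lower bound when it is even. The function names and the toy supports are hypothetical.

```python
from itertools import combinations

def subsets(items):
    """Yield all subsets of items as frozensets."""
    items = sorted(items)
    for r in range(len(items) + 1):
        for combo in combinations(items, r):
            yield frozenset(combo)

def support_bounds(itemset, support):
    """Tightest (lower, upper) bounds on support(itemset) from its proper subsets."""
    I = frozenset(itemset)
    lower, upper = 0, float("inf")
    for X in subsets(I):
        total = 0
        for J in subsets(I):
            if X <= J and J != I:  # X ⊆ J ⊂ I
                total += (-1) ** (len(I - J) + 1) * support[J]
        if len(I - X) % 2 == 1:
            upper = min(upper, total)
        else:
            lower = max(lower, total)
    return lower, upper

# Toy supports (hypothetical): 10 transactions, a in 6 of them, b in 7.
support = {frozenset(): 10, frozenset("a"): 6, frozenset("b"): 7}
bounds_ab = support_bounds("ab", support)
```

When the two bounds coincide, the support of the itemset is fully determined by its subsets, so the itemset is derivable and need not be stored in the concise representation.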

A Data Mining Framework for Optimal Product Selection in Retail Supermarket Data: The Generalized PROFSET Model

Dec 11, 2001

Tom Brijs, Bart Goethals, Gilbert Swinnen, Koen Vanhoof, Geert Wets

Abstract:In recent years, data mining researchers have developed efficient association rule algorithms for retail market basket analysis. Still, retailers often complain about how to adopt association rules to optimize concrete retail marketing-mix decisions. It is in this context that, in a previous paper, the authors have introduced a product selection model called PROFSET. This model selects the most interesting products from a product assortment based on their cross-selling potential given some retailer-defined constraints. However, this model suffered from an important deficiency: it could not deal effectively with supermarket data, and no provisions were taken to include retail category management principles. Therefore, in this paper, the authors present an important generalization of the existing model in order to make it suitable for supermarket data as well, and to enable retailers to add category restrictions to the model. Experiments on real-world data obtained from a Belgian supermarket chain produce very promising results and demonstrate the effectiveness of the generalized PROFSET model.
