INRIA Lille - Nord Europe, LIFL
Abstract:Biclustering is an unsupervised machine-learning approach aiming to cluster rows and columns simultaneously in a data matrix. Several biclustering algorithms have been proposed for handling numeric datasets. However, real-world data mining problems often involve heterogeneous datasets with mixed attributes. To address this challenge, we introduce a biclustering approach called HBIC, capable of discovering meaningful biclusters in complex heterogeneous data, including numeric, binary, and categorical data. The approach comprises two stages: bicluster generation and bicluster model selection. In the initial stage, several candidate biclusters are generated iteratively by adding and removing rows and columns based on the frequency of values in the original matrix. In the second stage, we introduce two approaches for selecting the most suitable biclusters by considering their size and homogeneity. Through a series of experiments, we investigated the suitability of our approach on a synthetic benchmark and in a biomedical application involving clinical data of systemic sclerosis patients. The evaluation comparing our method to existing approaches demonstrates its ability to discover high-quality biclusters from heterogeneous data. Our biclustering approach is a starting point for heterogeneous bicluster discovery, leading to a better understanding of complex underlying data structures.
Abstract:Biclustering is an unsupervised machine learning technique that simultaneously clusters rows and columns in a data matrix. Biclustering has emerged as an important approach and plays an essential role in various applications such as bioinformatics, text mining, and pattern recognition. However, finding significant biclusters is an NP-hard problem that can be formulated as an optimization problem. Therefore, different metaheuristics have been applied to biclustering problems because of their exploratory capability of solving complex optimization problems in reasonable computation time. Although various surveys on biclustering have been published, a comprehensive survey of metaheuristics for the biclustering problem is lacking. This chapter presents a survey of metaheuristic approaches to the biclustering problem. The review focuses on the underlying optimization methods and their main search components: representation, objective function, and variation operators. A specific discussion on single versus multi-objective approaches is presented. Finally, some emerging research directions are presented.
Abstract:The no-wait flowshop scheduling problem is a variant of the classical permutation flowshop problem, with the additional constraint that jobs have to be processed by the successive machines without waiting time. To efficiently address this NP-hard combinatorial optimization problem, we conduct an analysis of the structure of good-quality solutions. This analysis shows that the no-wait constraint gives them a common structure: they share identical sub-sequences of jobs, which we call super-jobs. After a discussion on how to identify these super-jobs, we propose IG-SJ, an algorithm that exploits super-jobs within the state-of-the-art algorithm for the classical permutation flowshop, the well-known Iterated Greedy (IG) algorithm. An iterative variant of IG-SJ is also proposed. Experiments are conducted on Taillard's instances. The experimental results show that exploiting super-jobs is successful, since IG-SJ is able to find 64 new best solutions.
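The no-wait constraint described above can be illustrated with a minimal sketch. It assumes the makespan objective, which the abstract does not state, and the function names `no_wait_delay` and `no_wait_makespan` are hypothetical. Under no-wait, the minimal gap between the start times of two consecutive jobs depends only on those two jobs, which is one reason adjacent pairs and sub-sequences are a natural unit of analysis.

```python
from itertools import accumulate

def no_wait_delay(p_i, p_j):
    """Minimal gap between the start of job i and the start of the next job j.

    p_i and p_j are the processing times of each job on machines 1..m.
    """
    # Completion time of job i on each machine, measured from its own start.
    done_i = list(accumulate(p_i))
    # Time at which job j reaches each machine, measured from its own start.
    start_j = [0] + list(accumulate(p_j))[:-1]
    # Job j must not reach any machine before job i has left it.
    return max(c - s for c, s in zip(done_i, start_j))

def no_wait_makespan(p, sequence):
    """Makespan of a job sequence, where p[j][k] is the time of job j on machine k."""
    delays = sum(no_wait_delay(p[a], p[b]) for a, b in zip(sequence, sequence[1:]))
    # The last job runs through all machines without interruption.
    return delays + sum(p[sequence[-1]])

# Toy instance (made up for illustration): 2 jobs, 2 machines.
times = [[3, 2], [1, 4]]
print(no_wait_makespan(times, [0, 1]))
print(no_wait_makespan(times, [1, 0]))
```

Because the pairwise delays can be precomputed once, evaluating a sequence reduces to summing the delays of its adjacent pairs.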
Abstract:In multiobjective combinatorial optimization, there exist two main classes of metaheuristics, based either on multiple aggregations or on a dominance relation. As in the single-objective case, the structure of the search space can explain the difficulty encountered by multiobjective metaheuristics and guide the design of such methods. In this work, we analyze the properties of multiobjective combinatorial search spaces. In particular, we focus on the features related to the efficient set, and we pay particular attention to the correlation between objectives. Few benchmarks take such objective correlation into account. Here, we define a general method to design multiobjective problems with correlation. As an example, we extend the well-known multiobjective NK-landscapes. By measuring different properties of the search space, we show the importance of considering the objective correlation in the design of metaheuristics.
Abstract:Efficiently solving complex problems using metaheuristics, and in particular local searches, requires incorporating knowledge about the problem to solve. In this paper, the permutation flowshop problem is studied. It is well known that in such problems, several solutions may have the same fitness value. As this neutrality property is an important one, it should be taken into account during the design of optimization methods. Accordingly, in the context of the permutation flowshop, an in-depth landscape analysis focused on the neutrality property is conducted, and proposals are given on how to use this neutrality to guide the search efficiently.
Abstract:Recently, the property of connectedness has been claimed to strongly motivate the design of local search techniques for multiobjective combinatorial optimization (MOCO). Indeed, when connectedness holds, a basic Pareto local search, initialized with at least one non-dominated solution, can identify the efficient set exhaustively. However, this quickly becomes infeasible in practice, as the number of efficient solutions typically grows exponentially with the instance size. As a consequence, we generally have to deal with a limited-size approximation, where a good sample set has to be found. In this paper, we propose the biobjective multiple and long path problems to show experimentally that, on the former, even if the efficient set is connected, a local search may be outperformed by a simple evolutionary algorithm in the sampling of the efficient set. Conversely, on the latter, a local search algorithm may successfully approximate a disconnected efficient set. We then argue that connectedness is not the only property to study for the design of local search heuristics for MOCO. This work opens new discussions on a proper definition of the multiobjective fitness landscape.
Abstract:VEGAS (Varying Evolvability-Guided Adaptive Search) is a new methodology proposed to deal with the neutrality property of some optimization problems. Its main feature is to consider the whole neutral network rather than an arbitrary solution. Moreover, VEGAS is designed to escape from plateaus based on the evolvability of solutions and a multi-armed bandit. Experiments are conducted on NK-landscapes with neutrality. Results show the importance of considering the whole neutral network and of guiding the search cleverly. The impact of the level of neutrality and of the exploration-exploitation trade-off is analyzed in depth.
Abstract:In this paper, we conduct a fitness landscape analysis for multiobjective combinatorial optimization, based on the local optima of multiobjective NK-landscapes with objective correlation. In single-objective optimization, it has become clear that local optima have a strong impact on the performance of metaheuristics. Here, we propose an extension to the multiobjective case, based on Pareto dominance. We study the joint influence of the problem dimension, the degree of non-linearity, the number of objectives, and the correlation degree between objective functions on the number of Pareto local optima.
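The notion of a Pareto local optimum used above can be made concrete with a small sketch. It assumes maximization, bit-string solutions stored as tuples, and a one-bit-flip neighborhood; the toy objective functions are hypothetical placeholders and are not NK-landscapes. A solution is treated here as a Pareto local optimum when none of its neighbors dominates it.

```python
def dominates(a, b):
    """True if objective vector a Pareto-dominates b (maximization)."""
    no_worse = all(x >= y for x, y in zip(a, b))
    strictly_better = any(x > y for x, y in zip(a, b))
    return no_worse and strictly_better

def one_flip_neighbors(x):
    """Yield every bit string at Hamming distance 1 from the tuple x."""
    for i in range(len(x)):
        yield x[:i] + (1 - x[i],) + x[i + 1:]

def is_pareto_local_optimum(x, evaluate):
    """True if no one-flip neighbor of x dominates x under `evaluate`."""
    fx = evaluate(x)
    return not any(dominates(evaluate(y), fx) for y in one_flip_neighbors(x))

# Hypothetical toy objectives, for illustration only.
def conflicting(x):
    # Ones versus zeros: every flip improves one objective and degrades the other.
    return (sum(x), len(x) - sum(x))

def correlated(x):
    # Two identical objectives: the problem behaves like a single-objective one.
    return (sum(x), sum(x))

print(is_pareto_local_optimum((0, 1, 0), conflicting))
print(is_pareto_local_optimum((0, 1, 1), correlated))
```

In the two toy cases, fully conflicting objectives make every solution a Pareto local optimum, while identical objectives leave only the all-ones string, which hints at why the correlation between objectives affects the count.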
Abstract:Fitness landscape analysis aims to understand the geometry of a given optimization problem in order to design more efficient search algorithms. However, very little is known about the landscape of multiobjective problems. In this work, following a recent proposal by Zitzler et al. (2010), we consider multiobjective optimization as a set problem. Then, we give a general definition of set-based multiobjective fitness landscapes. An experimental set-based fitness landscape analysis is conducted on the multiobjective NK-landscapes with objective correlation. The aim is to adapt and to enhance the comprehensive design of set-based multiobjective search approaches, motivated by an a priori analysis of the corresponding set problem properties.
Abstract:This paper presents a new methodology that exploits specific characteristics of the fitness landscape. In particular, we are interested in the property of neutrality, that is, the fact that the same fitness value is assigned to numerous solutions from the search space. Many combinatorial optimization problems share this property, which generally hinders local search algorithms. A neutrality-based iterated local search, which allows neutral walks to move on the plateaus, is proposed and tested on a permutation flowshop scheduling problem with the aim of minimizing the makespan. Our experiments show that the proposed approach is able to find improving solutions compared with a classical iterated local search. Moreover, the trade-off between the exploitation of neutrality and the exploration of new parts of the search space is analyzed in depth.
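The idea of allowing neutral walks inside an iterated local search can be sketched as follows. This is an illustrative outline under stated assumptions, not the algorithm from the paper: the function `neutral_ils`, its parameters (`max_neutral_steps`, `restarts`), and the toy bit-string problem are all hypothetical, and fitness is minimized. When no improving neighbor exists, the search moves to a neighbor of equal fitness, up to a step budget, before giving up and perturbing.

```python
import random

def neutral_ils(fitness, neighbors, perturb, start,
                max_neutral_steps=50, restarts=20, rng=None):
    """Iterated local search (minimization) that walks on plateaus before perturbing."""
    rng = rng or random.Random(0)
    best = current = start
    for _ in range(restarts):
        neutral_steps = 0
        while True:
            f_cur = fitness(current)
            candidates = list(neighbors(current))
            improving = [y for y in candidates if fitness(y) < f_cur]
            if improving:
                current = rng.choice(improving)
                neutral_steps = 0  # a new plateau gets a fresh budget
                continue
            neutral = [y for y in candidates if fitness(y) == f_cur]
            if not neutral or neutral_steps >= max_neutral_steps:
                break  # strict local optimum, or neutral budget exhausted
            current = rng.choice(neutral)  # neutral move along the plateau
            neutral_steps += 1
        if fitness(current) < fitness(best):
            best = current
        current = perturb(current, rng)
    return best

# Hypothetical toy problem with plateaus: pairs of ones share a fitness level.
def toy_fitness(x):
    return -(sum(x) // 2)

def flip_neighbors(x):
    for i in range(len(x)):
        yield x[:i] + (1 - x[i],) + x[i + 1:]

def flip_random_bit(x, rng):
    i = rng.randrange(len(x))
    return x[:i] + (1 - x[i],) + x[i + 1:]

result = neutral_ils(toy_fitness, flip_neighbors, flip_random_bit, (0,) * 6)
print(result, toy_fitness(result))
```

On this toy problem a strict-improvement local search started from the all-zeros string would stop immediately, since every neighbor has equal fitness; the neutral moves are what let the search cross each plateau.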