Aichetou Bouchareb

SAMM

Co-clustering based exploratory analysis of mixed-type data tables

Dec 22, 2022

Aichetou Bouchareb, Marc Boullé, Fabrice Clérot, Fabrice Rossi

Abstract: Co-clustering is a class of unsupervised data analysis techniques that extract the existing underlying dependency structure between the instances and variables of a data table as homogeneous blocks. Most of those techniques are limited to variables of the same type. In this paper, we propose a mixed data co-clustering method based on a two-step methodology. In the first step, all the variables are binarized according to a number of bins chosen by the analyst, by equal frequency discretization in the numerical case, or keeping the most frequent values in the categorical case. The second step applies a co-clustering to the instances and the binary variables, leading to groups of instances and groups of variable parts. We apply this methodology on several data sets and compare with the results of a Multiple Correspondence Analysis applied to the same data.

* Advances in Knowledge Discovery and Management, 834, Springer International Publishing, pp.23-41, 2019, Studies in Computational Intelligence
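
The first step described in the abstract above (binarizing every variable into parts) can be illustrated with a minimal sketch. This is not the authors' implementation: the function names, the `OTHER` label, and the choice to keep the `n_bins - 1` most frequent categorical values while grouping the rest into one part are assumptions made for illustration. The second step, the co-clustering itself, is not shown.

```python
from bisect import bisect_right
from collections import Counter


def equal_frequency_parts(values, n_bins):
    """Map each numeric value to a bin index using equal-frequency cut points."""
    if not values:
        return []
    ordered = sorted(values)
    n = len(ordered)
    # Cut points sit at empirical quantiles; duplicates are merged,
    # so heavily tied data may yield fewer than n_bins bins.
    cuts = sorted({ordered[k * n // n_bins] for k in range(1, n_bins)})
    return [bisect_right(cuts, v) for v in values]


def most_frequent_parts(values, n_bins, other="OTHER"):
    """Keep the most frequent categories; group the rest into one part.

    Assumption: n_bins - 1 values are kept and the remainder becomes `other`.
    """
    kept = {v for v, _ in Counter(values).most_common(n_bins - 1)}
    return [v if v in kept else other for v in values]


def binarize(table, numeric_columns, n_bins):
    """Turn a column-oriented table into one 0/1 indicator per variable part."""
    indicators = {}
    for name, values in table.items():
        if name in numeric_columns:
            parts = equal_frequency_parts(values, n_bins)
        else:
            parts = most_frequent_parts(values, n_bins)
        for part in sorted(set(parts), key=str):
            indicators[(name, part)] = [int(p == part) for p in parts]
    return indicators


# Hypothetical toy table with one numerical and one categorical variable.
table = {
    "age": [23, 35, 41, 52, 64, 70],
    "color": ["red", "red", "blue", "green", "red", "blue"],
}
binary = binarize(table, numeric_columns={"age"}, n_bins=2)
```

Each key of `binary` is a (variable, part) pair, which matches the "variable parts" that the second step groups together with the instances.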

Model Based Co-clustering of Mixed Numerical and Binary Data

Dec 22, 2022

Aichetou Bouchareb, Marc Boullé, Fabrice Clérot, Fabrice Rossi

Abstract: Co-clustering is a data mining technique used to extract the underlying block structure between the rows and columns of a data matrix. Many approaches have been studied and have shown their capacity to extract such structures in continuous, binary or contingency tables. However, very little work has been done to perform co-clustering on mixed type data. In this article, we extend the latent block models based co-clustering to the case of mixed data (continuous and binary variables). We then evaluate the effectiveness of the proposed approach on simulated data and we discuss its advantages and potential limits.

* Advances in Knowledge Discovery and Management, 834, Springer International Publishing, pp.3-22, 2019, Studies in Computational Intelligence

Un modèle Bayésien de co-clustering de données mixtes (A Bayesian co-clustering model for mixed data)

Feb 06, 2019

Aichetou Bouchareb, Marc Boullé, Fabrice Rossi, Fabrice Clérot

Abstract: We propose a MAP Bayesian approach to perform and evaluate a co-clustering of mixed-type data tables. The proposed model infers an optimal segmentation of all variables then performs a co-clustering by minimizing a Bayesian model selection cost function. One advantage of this approach is that it is user parameter-free. Another main advantage is the proposed criterion which gives an exact measure of the model quality, measured by probability of fitting it to the data. Continuous optimization of this criterion ensures finding better and better models while avoiding data over-fitting. The experiments conducted on real data show the interest of this co-clustering approach in exploratory data analysis of large data sets.

* Extraction et gestion des connaissances 2018, Jan 2018, Paris, France. Revue des Nouvelles Technologies de l'Information, RNTI-E-34, pp.275-280, 2018, Actes de la 18ème Conférence Internationale Francophone sur l'Extraction et gestion des connaissances (EGC'2018)
* in French
