Pietro Coretto

Nonparametric consistency for maximum likelihood estimation and clustering based on mixtures of elliptically-symmetric distributions

Nov 10, 2023

Pietro Coretto, Christian Hennig

Abstract: The maximum likelihood estimator for mixtures of elliptically-symmetric distributions is shown to be consistent for its population version, where the underlying distribution $P$ is nonparametric and does not necessarily belong to the class of mixtures on which the estimator is based. When $P$ is a mixture of sufficiently well separated but nonparametric distributions, it is shown that the components of the population version of the estimator correspond to the well separated components of $P$. This provides some theoretical justification for using such estimators for cluster analysis when $P$ has well separated subpopulations, even if these subpopulations differ from what the mixture model assumes.
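
As a reading aid, one common way to formalize the "population version" of a mixture maximum likelihood estimator is sketched below. This is an assumption about the setup, not the paper's exact definition: the symbols $\theta$, $\Theta$, $K$, $\pi_k$, $\mu_k$, $\Sigma_k$, $f$ and $P_n$ are introduced here for illustration, with $f$ standing for an elliptically-symmetric density and $P_n$ for the empirical distribution of the sample.

$$
\theta^{*}(P) \in \arg\max_{\theta \in \Theta} \int \log\Big( \sum_{k=1}^{K} \pi_k \, f(x; \mu_k, \Sigma_k) \Big) \, dP(x),
\qquad
\hat{\theta}_n \in \arg\max_{\theta \in \Theta} \int \log\Big( \sum_{k=1}^{K} \pi_k \, f(x; \mu_k, \Sigma_k) \Big) \, dP_n(x).
$$

Under this reading, consistency means that $\hat{\theta}_n$ approaches $\theta^{*}(P)$ as the sample size grows, even when $P$ is not itself a member of the mixture family.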


Selecting the number of clusters, clustering models, and algorithms. A unifying approach based on the quadratic discriminant score

Nov 03, 2021

Luca Coraggio, Pietro Coretto


Abstract: Cluster analysis requires many decisions: the clustering method and the implied reference model, the number of clusters and, often, several hyper-parameters and algorithm tunings. In practice, one produces several partitions, and a final one is chosen based on validation or selection criteria. There is an abundance of validation methods that, implicitly or explicitly, assume a certain clustering notion. Moreover, they are often restricted to operate on partitions obtained from a specific method. In this paper, we focus on groups that can be well separated by quadratic or linear boundaries. The reference cluster concept is defined through the quadratic discriminant score function and parameters describing clusters' size, center and scatter. We develop two cluster-quality criteria called quadratic scores. We show that these criteria are consistent with groups generated from a general class of elliptically-symmetric distributions. The search for groups of this type is common in applications. The connection with likelihood theory for mixture models and model-based clustering is investigated. Based on bootstrap resampling of the quadratic scores, we propose a selection rule that allows choosing among many clustering solutions. The proposed method has the distinctive advantage that it can compare partitions that cannot be compared with other state-of-the-art methods. Extensive numerical experiments and the analysis of real data show that, even if some competing methods turn out to be superior in some setups, the proposed methodology achieves a better overall performance.

* Supplemental materials are included at the end of the paper
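
To make the idea of scoring a partition through cluster size, center and scatter concrete, here is a minimal Python sketch. It uses the textbook Gaussian quadratic discriminant score and an illustrative aggregate, `hard_score`, that averages each point's largest per-cluster score. Both function names and the aggregation rule are assumptions made for this example. They are not taken from the paper, whose two criteria and bootstrap selection rule may be defined differently.

```python
import numpy as np


def quadratic_score(X, prop, mean, cov):
    """Textbook quadratic discriminant score of each row of X for one cluster.

    prop, mean and cov play the role of the cluster's size, center and scatter.
    """
    diff = X - mean
    # Squared Mahalanobis distance, solving a linear system instead of inverting cov.
    maha = np.einsum("ij,ij->i", diff, np.linalg.solve(cov, diff.T).T)
    _, logdet = np.linalg.slogdet(cov)
    return np.log(prop) - 0.5 * logdet - 0.5 * maha


def hard_score(X, labels):
    """Illustrative aggregate (an assumption): mean of each point's best cluster score.

    Cluster parameters are estimated from the partition itself, so any
    partition of X can be scored regardless of the method that produced it.
    """
    scores = np.column_stack([
        quadratic_score(
            X,
            np.mean(labels == k),              # cluster proportion
            X[labels == k].mean(axis=0),       # cluster center
            np.cov(X[labels == k], rowvar=False),  # cluster scatter
        )
        for k in np.unique(labels)
    ])
    return scores.max(axis=1).mean()
```

On two well separated groups, a partition that matches the groups should receive a higher value than a random relabeling of the same points, which is the kind of comparison a selection rule would build on. Each cluster needs at least a few more points than dimensions for the sample covariance to be invertible.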
