Abstract: Archetypal analysis (AA) was originally proposed in 1994 by Adele Cutler and Leo Breiman as a computational procedure for extracting distinct aspects, called archetypes, from observations, with each observational record approximated as a mixture (i.e., convex combination) of these archetypes. AA thereby provides straightforward, interpretable, and explainable representations for feature extraction and dimensionality reduction, facilitating the understanding of the structure of high-dimensional data, with wide applications throughout the sciences. However, AA also faces challenges, particularly because the associated optimization problem is non-convex. This survey provides researchers and data mining practitioners with an overview of the methodologies and opportunities that AA has to offer, surveying its many applications across disparate fields of science, as well as best practices for modeling data with AA and its limitations. The survey concludes by outlining important future research directions concerning AA.
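To make the "mixture of archetypes" idea concrete, the following is a minimal sketch (not the survey's reference implementation) of the decomposition step: given a data matrix and a fixed set of archetypes, each observation's weights are recovered by a simplex-constrained least-squares fit, so every row is reconstructed as a convex combination of the archetypes. The archetypes here are synthetic placeholders; a full AA algorithm would also update the archetypes themselves.

```python
# Minimal sketch: recover the convex-combination (mixture) weights of each
# observation with respect to a fixed set of archetypes Z. The archetypes and
# data below are synthetic; full AA alternately re-estimates Z as well.
import numpy as np
from scipy.optimize import minimize

def mixture_weights(x, Z):
    """Solve min_a ||x - a @ Z||^2  s.t.  a >= 0 and sum(a) = 1 (simplex constraint)."""
    k = Z.shape[0]
    a0 = np.full(k, 1.0 / k)  # start at the centre of the simplex
    res = minimize(
        lambda a: np.sum((x - a @ Z) ** 2),
        a0,
        bounds=[(0.0, 1.0)] * k,
        constraints=[{"type": "eq", "fun": lambda a: np.sum(a) - 1.0}],
        method="SLSQP",
    )
    return res.x

rng = np.random.default_rng(0)
Z = rng.random((3, 5))                               # 3 hypothetical archetypes in R^5
X = rng.dirichlet(np.ones(3), size=10) @ Z           # observations built as exact mixtures
A = np.vstack([mixture_weights(x, Z) for x in X])    # recovered mixture weights, one row per observation
print(np.round(A, 2))                                # rows are nonnegative and sum to one
```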
Abstract: A new exploratory technique called biarchetype analysis is defined. We extend archetype analysis to find the archetypes of both observations and features simultaneously. The idea of this new unsupervised machine learning tool is to represent observations and features by instances of pure types (biarchetypes) that can be easily interpreted because they are mixtures of observations and features. Furthermore, the observations and features are themselves expressed as mixtures of the biarchetypes, which also helps to understand the structure of the data. We propose an algorithm to solve biarchetype analysis. We show that biarchetype analysis offers advantages over biclustering, especially in terms of interpretability: biarchetypes are extreme instances rather than the centroids returned by biclustering, which favors human understanding. Biarchetype analysis is applied to several machine learning problems to illustrate its usefulness.
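The sketch below only illustrates the kind of factorization biarchetype analysis seeks, assuming a structure of the form X ≈ α(βXθ)γ, where the biarchetypes βXθ are mixtures of observations and features and the simplex-constrained matrices α and γ express observations and features as mixtures of biarchetypes. The mixing matrices here are random placeholders used to show shapes and interpretation; the paper's algorithm optimizes them (and its exact constraints may differ).

```python
# Structural sketch of a biarchetypal factorization (assumed form, not the
# paper's implementation): X (n x m) is approximated by alpha @ Z @ gamma,
# with biarchetypes Z = beta @ X @ theta (p x q).
import numpy as np

rng = np.random.default_rng(0)
n, m = 50, 20        # observations x features
p, q = 4, 3          # number of biarchetype rows / columns

X = rng.random((n, m))

def simplex_rows(shape):
    """Random nonnegative matrix whose rows sum to one (rows lie on the simplex)."""
    M = rng.random(shape)
    return M / M.sum(axis=1, keepdims=True)

alpha = simplex_rows((n, p))      # each observation as a mixture of biarchetypes
beta  = simplex_rows((p, n))      # each biarchetype row as a mixture of observations
theta = simplex_rows((q, m)).T    # (m x q), columns on the simplex: mixtures of features
gamma = simplex_rows((q, m))      # each feature reconstructed from the biarchetype columns

Z = beta @ X @ theta              # p x q biarchetypes: mixtures of observations and features
X_hat = alpha @ Z @ gamma         # n x m reconstruction of the data
print(np.linalg.norm(X - X_hat) / np.linalg.norm(X))  # error the actual algorithm would minimize
```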
Abstract: The aim of ordinal classification is to predict the ordered labels of the output from a set of observed inputs. Interval-valued data refers to data in the form of intervals. For the first time, interval-valued data and interval-valued functional data are considered as inputs in an ordinal classification problem. Six ordinal classifiers for interval data and interval-valued functional data are proposed. Three of them are parametric: one is based on ordinal binary decompositions and the other two on ordered logistic regression. The other three methods rely on distances between interval data and kernels on interval data: one uses the weighted $k$-nearest-neighbor technique for ordinal classification, another combines kernel principal component analysis with an ordinal classifier, and the sixth method, which performs best, uses a kernel-induced ordinal random forest. They are compared with na\"ive approaches in an extensive experimental study on synthetic data sets and original real data sets about human global development and weather. The results show that taking the ordering and the interval-valued information into account improves accuracy. The source code and data sets are available at https://github.com/aleixalcacer/OCFIVD.
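As a hedged illustration of the distance-based family of methods, the sketch below implements a weighted $k$-nearest-neighbor ordinal classifier for interval-valued inputs. The centre/half-range representation, the Euclidean distance on it, and the function names are assumptions chosen for brevity; they are not necessarily the distances or the implementation used in the OCFIVD repository.

```python
# Sketch: weighted k-NN ordinal classification of interval-valued data.
# Intervals [lo, hi] are represented by centres and half-ranges (an assumed,
# common encoding), and the prediction is the weighted median of the ordered
# neighbour labels, which respects the label ordering.
import numpy as np

def interval_to_cr(lo, hi):
    """lo, hi: (n_samples, n_features) interval endpoints -> centre/half-range features."""
    return np.hstack([(lo + hi) / 2.0, (hi - lo) / 2.0])

def knn_ordinal_predict(train_lo, train_hi, y_train, test_lo, test_hi, k=5):
    Xtr = interval_to_cr(train_lo, train_hi)
    Xte = interval_to_cr(test_lo, test_hi)
    preds = []
    for x in Xte:
        d = np.linalg.norm(Xtr - x, axis=1)
        nn = np.argsort(d)[:k]                      # indices of the k nearest intervals
        w = 1.0 / (d[nn] + 1e-12)                   # inverse-distance weights
        order = np.argsort(y_train[nn])             # sort neighbours by ordinal label
        cum = np.cumsum(w[order]) / w.sum()
        preds.append(y_train[nn][order][np.searchsorted(cum, 0.5)])  # weighted median label
    return np.array(preds)

# Toy usage: 4 ordered classes driven by the interval centres.
rng = np.random.default_rng(0)
lo = rng.random((120, 3)); hi = lo + rng.random((120, 3))
y = np.digitize(((lo + hi) / 2.0).mean(axis=1), [0.6, 0.9, 1.2])
print(knn_ordinal_predict(lo[:100], hi[:100], y[:100], lo[100:], hi[100:]))
```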