Abstract: Social networking sites (SNS) are one of the most important means of communication. In particular, microblogging sites are used as sources for analysis because of their peculiarities (immediacy, short texts, etc.). A large body of research uses SNS in novel ways, but machine learning (ML) has focused mainly on classification performance rather than on interpretability or other quality metrics. As a result, state-of-the-art models are black boxes that should not be used to solve problems with potential social impact. When a problem requires transparency, interpretable pipelines must be built. Arguably, the classifier is the most decisive component of the pipeline, but it is not the only one that matters: even when the classifier is interpretable, the resulting models may be too complex to be comprehensible, making it impossible for humans to follow the actual decisions. The purpose of this paper is to present a feature selection mechanism (the first step in the pipeline) that improves comprehensibility by using fewer but more meaningful features while achieving good performance in microblogging contexts where interpretability is mandatory. We also present a ranking method that evaluates features in terms of statistical relevance and bias. We conducted exhaustive tests on five different datasets to evaluate classification performance, generalisation capacity and the actual interpretability of the model. Our results show that our proposal performs better than the alternatives evaluated and is, by far, the most stable in terms of accuracy, generalisation and comprehensibility.
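To make the idea of ranking features by statistical relevance concrete, the following is a minimal sketch that scores bag-of-words terms with a Pearson chi-squared test against a binary label and keeps the top k. It is a generic, swapped-in technique, not the ranking method proposed in the paper: it does not model the bias criterion, and the names `chi2_2x2`, `rank_terms` and the toy tweets are hypothetical.

```python
def chi2_2x2(a, b, c, d):
    """Pearson chi-squared statistic for the 2x2 table [[a, b], [c, d]]."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0  # degenerate table: no evidence of association
    return n * (a * d - b * c) ** 2 / denom


def rank_terms(docs, labels, k):
    """Rank terms by chi-squared association with a binary label; keep the top k.

    docs: list of token lists; labels: list of 0/1 values (hypothetical setup).
    """
    vocab = sorted({t for doc in docs for t in doc})
    scores = {}
    for term in vocab:
        a = b = c = d = 0  # a: present & positive, b: present & negative, etc.
        for doc, y in zip(docs, labels):
            present = term in doc
            if present and y:
                a += 1
            elif present:
                b += 1
            elif y:
                c += 1
            else:
                d += 1
        scores[term] = chi2_2x2(a, b, c, d)
    # Highest score first; ties broken alphabetically for a deterministic order.
    ranked = sorted(vocab, key=lambda t: (-scores[t], t))
    return [(t, scores[t]) for t in ranked[:k]]


# Hypothetical toy corpus of tokenised short texts.
docs = [["great", "game"], ["great", "win"], ["bad", "loss"], ["bad", "game"]]
labels = [1, 1, 0, 0]
print(rank_terms(docs, labels, 2))  # [('bad', 4.0), ('great', 4.0)]
```

Terms that occur equally often in both classes, such as `game` here, score zero and are the first to be discarded. In practice a minimum-frequency filter is usually applied as well, since the chi-squared approximation is unreliable when expected counts are very small.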
Abstract: The aim of this paper is to present a method for identifying the structure of a rule in a fuzzy model. For this purpose, an assumption-based truth maintenance system (ATMS) is used (Zurita 1994), and an algorithm for identifying the structure is proposed (Castro 1995). This algorithm finds the minimal structure of the rule, that is, the structure with the fewest variables that must appear in the rule, and obtains the identification parameters simultaneously. The proposed method is applied to a classification example, in which the Iris Plant Database is learnt for all three kinds of plants.
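The notion of a minimal rule structure can be illustrated with a brute-force search: try variable subsets in increasing size and return the first one whose rule covers every example of the target class and no example of the other classes. This sketch is not the ATMS-based algorithm of the paper. It replaces fuzzy sets with crisp intervals (bounding boxes), and `minimal_rule`, its helpers and the toy data are hypothetical. The interval bounds play the role of the rule parameters and are obtained in the same pass as the structure.

```python
from itertools import combinations


def interval_rule(X, y, target, variables):
    """Crisp bounding-box rule for class `target` over the chosen variables."""
    rows = [x for x, lab in zip(X, y) if lab == target]
    return {v: (min(r[v] for r in rows), max(r[v] for r in rows)) for v in variables}


def fires(rule, x):
    """True if every condition lo <= x[v] <= hi in the rule holds for x."""
    return all(lo <= x[v] <= hi for v, (lo, hi) in rule.items())


def minimal_rule(X, y, target):
    """Smallest variable subset whose interval rule separates `target` from the rest."""
    n_vars = len(X[0])
    for size in range(1, n_vars + 1):
        for variables in combinations(range(n_vars), size):
            rule = interval_rule(X, y, target, variables)
            # The box covers all target rows by construction; reject it if it
            # also covers any row of another class.
            if not any(fires(rule, x) for x, lab in zip(X, y) if lab != target):
                return rule
    return None  # no interval rule separates the class


# Hypothetical toy data: three variables, the third is constant and uninformative.
X = [(1, 1, 5), (2, 2, 5), (1, 8, 5), (8, 1, 5), (8, 8, 5)]
y = ["A", "A", "B", "B", "B"]
print(minimal_rule(X, y, "A"))  # {0: (1, 2), 1: (1, 2)}
```

Neither variable 0 nor variable 1 separates class A on its own, so the search returns a two-variable rule and leaves out the uninformative third variable. Exhaustive search grows exponentially with the number of variables, which is one reason a dedicated mechanism such as an ATMS is attractive.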