Abstract: Dynamic Ensemble Selection (DES) is a Multiple Classifier Systems (MCS) approach that aims to select an ensemble for each query sample during the selection phase. Even though several DES approaches have been proposed, no single DES technique is the best choice across different problems. Thus, we hypothesize that selecting the best DES approach per query instance can lead to better accuracy. To evaluate this idea, we introduce the Post-Selection Dynamic Ensemble Selection (PS-DES) approach, a post-selection scheme that evaluates ensembles selected by several DES techniques using different metrics. Experimental results show that, using accuracy as the metric to select the ensembles, PS-DES performs better than individual DES techniques. PS-DES source code is available in a GitHub repository.
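As an illustration of the post-selection idea, the sketch below assumes a pool of bagged decision trees and three DS techniques from the DESlib package; for each query, the prediction kept is the one from the technique with the best accuracy over the query's nearest validation neighbours. The function name ps_des_predict, the chosen techniques, and the neighbourhood size are illustrative assumptions, not the authors' implementation.

    # Hedged sketch of per-query post-selection over several DS techniques.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import NearestNeighbors
    from sklearn.tree import DecisionTreeClassifier
    from deslib.des import KNORAU, KNORAE, METADES

    X, y = make_classification(n_samples=1500, random_state=42)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=42)
    X_dsel, X_te, y_dsel, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=42)

    pool = BaggingClassifier(DecisionTreeClassifier(max_depth=5),
                             n_estimators=50, random_state=42).fit(X_tr, y_tr)
    techniques = [t(pool).fit(X_dsel, y_dsel) for t in (KNORAU, KNORAE, METADES)]
    nn = NearestNeighbors(n_neighbors=7).fit(X_dsel)

    def ps_des_predict(x):
        """Pick, per query, the DS technique with the best local accuracy."""
        idx = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
        local_acc = [t.score(X_dsel[idx], y_dsel[idx]) for t in techniques]
        best = techniques[int(np.argmax(local_acc))]
        return best.predict(x.reshape(1, -1))[0]

    y_pred = np.array([ps_des_predict(x) for x in X_te])
    print("PS-DES-style accuracy:", (y_pred == y_te).mean())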
Abstract: Dataset scaling, also known as normalization, is an essential preprocessing step in a machine learning pipeline. It aims to adjust the attribute scales so that they all vary within the same range. This transformation is known to improve the performance of classification models, but there are several scaling techniques to choose from, and this choice is generally not made carefully. In this paper, we carry out a broad experiment comparing the impact of 5 scaling techniques on the performance of 20 classification algorithms, spanning monolithic and ensemble models, applying them to 82 publicly available datasets with varying imbalance ratios. Results show that the choice of scaling technique matters for classification performance, and that the performance difference between the best and the worst scaling technique is relevant and statistically significant in most cases. They also indicate that choosing an inadequate technique can be more detrimental to classification performance than not scaling the data at all. We also show how the performance variation of an ensemble model across different scaling techniques tends to be dictated by that of its base model. Finally, we discuss the relationship between a model's sensitivity to the choice of scaling technique and its performance, and provide insights into its applicability in different model deployment scenarios. Full results and source code for the experiments in this paper are available in a GitHub repository.\footnote{https://github.com/amorimlb/scaling\_matters}
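A minimal, hedged example of this kind of comparison, using four of scikit-learn's scalers and a kNN classifier on a single public dataset; the scalers and model here are illustrative choices rather than the paper's full protocol of 5 techniques, 20 models and 82 datasets.

    # Illustrative comparison of scaling techniques versus no scaling.
    from sklearn.datasets import load_wine
    from sklearn.model_selection import cross_val_score
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import (MaxAbsScaler, MinMaxScaler,
                                       RobustScaler, StandardScaler)

    X, y = load_wine(return_X_y=True)
    scalers = {"none": None, "standard": StandardScaler(), "min-max": MinMaxScaler(),
               "max-abs": MaxAbsScaler(), "robust": RobustScaler()}

    for name, scaler in scalers.items():
        steps = [scaler, KNeighborsClassifier()] if scaler else [KNeighborsClassifier()]
        acc = cross_val_score(make_pipeline(*steps), X, y, cv=10).mean()
        print(f"{name:>8}: mean accuracy = {acc:.3f}")

Placing the scaler inside the pipeline guarantees it is fit only on the training folds of each cross-validation split.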
Abstract: Class imbalance is a characteristic known for making learning more challenging for classification models, as they may end up biased towards the majority class. A promising approach among the ensemble-based methods in the context of imbalanced learning is Dynamic Selection (DS). DS techniques single out a subset of the classifiers in the ensemble to label each given unknown sample according to their estimated competence in the area surrounding the query. Because only a small region is taken into account in the selection scheme, the global class disproportion may have less impact on the system's performance. However, the presence of local class overlap may severely hinder the performance of DS techniques over imbalanced distributions, as it not only exacerbates the effects of the under-representation but also introduces ambiguous and possibly unreliable samples into the competence estimation process. Thus, in this work, we propose a DS technique which attempts to minimize the effects of the local class overlap during the classifier selection procedure. The proposed method iteratively removes from the target region the instance perceived as the hardest to classify until a classifier is deemed competent to label the query sample. The known samples are characterized using instance hardness measures that quantify the local class overlap. Experimental results show that the proposed technique can significantly outperform the baseline as well as several other DS techniques, suggesting its suitability for dealing with class under-representation and overlap. Furthermore, the proposed technique still yielded competitive results when using an under-sampled, less overlapped version of the labelled sets, especially over the problems with a high proportion of minority class samples in overlap areas. Code available at https://github.com/marianaasouza/lords.
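The sketch below illustrates the iterative hardest-instance removal idea, assuming the k-Disagreeing Neighbors (kDN) measure as the hardness estimate and a simplified competence rule (a classifier is deemed competent if it labels the whole remaining region correctly); both are simplifying assumptions, not the exact criteria of the proposed technique.

    # Hedged sketch: remove the hardest instance from the region of competence
    # until some classifier in the pool is competent. Inputs are assumed to be
    # numpy arrays with non-negative integer-coded labels and fitted classifiers.
    import numpy as np
    from sklearn.neighbors import NearestNeighbors

    def kdn_hardness(X_dsel, y_dsel, k=7):
        """k-Disagreeing Neighbors: fraction of neighbours with a different label."""
        nn = NearestNeighbors(n_neighbors=k + 1).fit(X_dsel)
        idx = nn.kneighbors(X_dsel, return_distance=False)[:, 1:]  # drop the sample itself
        return (y_dsel[idx] != y_dsel[:, None]).mean(axis=1)

    def predict_with_overlap_removal(query, pool, X_dsel, y_dsel, hardness, k=7):
        nn = NearestNeighbors(n_neighbors=k).fit(X_dsel)
        roc = list(nn.kneighbors(query.reshape(1, -1), return_distance=False)[0])
        while roc:
            for clf in pool:  # competent = correct on the whole remaining region
                if np.all(clf.predict(X_dsel[roc]) == y_dsel[roc]):
                    return clf.predict(query.reshape(1, -1))[0]
            roc.remove(max(roc, key=lambda i: hardness[i]))  # drop hardest instance
        votes = [clf.predict(query.reshape(1, -1))[0] for clf in pool]  # fallback
        return np.bincount(votes).argmax()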
Abstract: Hate speech is a major issue in social networks due to the high volume of data generated daily. Recent works demonstrate the usefulness of machine learning (ML) in dealing with the nuances required to distinguish hateful posts from mere sarcasm or offensive language. Many ML solutions for hate speech detection have been proposed by changing either how features are extracted from the text or the classification algorithm employed. However, most works consider only one type of feature extraction and classification algorithm. This work argues that a combination of multiple feature extraction techniques and different classification models is needed. We propose a framework to analyze the relationship between multiple feature extraction and classification techniques to understand how they complement each other. The framework is used to select a subset of complementary techniques to compose a robust multiple classifier system (MCS) for hate speech detection. The experimental study, considering four hate speech classification datasets, demonstrates that the proposed framework is a promising methodology for analyzing and designing high-performing MCS for this task. The MCS obtained using the proposed framework significantly outperforms the combination of all models and the homogeneous and heterogeneous selection heuristics, demonstrating the importance of having a proper selection scheme. Source code, figures, and dataset splits can be found in the GitHub repository: https://github.com/Menelau/Hate-Speech-MCS.
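As a small illustration of combining different feature extraction techniques with different classifiers in one MCS, the sketch below pairs word- and character-level TF-IDF with two linear models and merges them by majority voting; the toy corpus and the specific extractor/classifier pairs are assumptions, not the subset selected by the proposed framework.

    # Toy heterogeneous MCS: each member pairs one feature extractor with one
    # classifier; the tiny corpus and member choices are illustrative only.
    import numpy as np
    from sklearn.feature_extraction.text import TfidfVectorizer
    from sklearn.linear_model import LogisticRegression
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    texts = ["I hate this group of people", "what a lovely morning",
             "you people are all terrible", "great match yesterday"]
    labels = [1, 0, 1, 0]  # 1 = hateful/offensive, 0 = neutral (toy labels)

    members = [
        ("word_lr", make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                                  LogisticRegression(max_iter=1000))),
        ("char_lr", make_pipeline(TfidfVectorizer(analyzer="char_wb", ngram_range=(2, 4)),
                                  LogisticRegression(max_iter=1000))),
        ("word_svm", make_pipeline(TfidfVectorizer(), LinearSVC())),
    ]
    fitted = [(name, pipe.fit(texts, labels)) for name, pipe in members]

    def mcs_predict(docs):
        """Hard majority vote over the heterogeneous members."""
        votes = np.array([pipe.predict(docs) for _, pipe in fitted])
        return np.apply_along_axis(lambda v: np.bincount(v).argmax(), 0, votes)

    print(mcs_predict(["such a terrible crowd of people"]))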
Abstract: Label noise detection has been widely studied in Machine Learning because of its importance in improving training data quality. Satisfactory noise detection has been achieved by adopting ensembles of classifiers. In this approach, an instance is flagged as mislabeled if a high proportion of members in the pool misclassifies it. Previous authors have empirically evaluated this approach; nevertheless, they mostly assumed that label noise is generated completely at random in a dataset. This is a strong assumption, since other types of label noise are feasible in practice and can influence noise detection results. This work investigates the performance of ensemble noise detection under two different noise models: the Noisy at Random (NAR) model, in which the probability of label noise depends on the instance class, and the Noisy Completely at Random (NCAR) model, in which the probability of label noise is entirely independent of the instance class. In this setting, we investigate the effect of class distribution on noise detection performance, since it changes the total noise level observed in a dataset under the NAR assumption. Furthermore, an evaluation of the ensemble vote threshold is conducted to contrast with the most common approaches in the literature. In many of the experiments performed, choosing one noise generation model over the other leads to different results when considering aspects such as class imbalance and the noise level ratio among different classes.
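The sketch below illustrates the ensemble vote-threshold detection scheme under a class-dependent (NAR-style) noise injection; the flip probabilities, the three base learners, and the majority-vote threshold are illustrative assumptions rather than the paper's exact setup.

    # Hedged sketch of an ensemble (vote-threshold) label noise filter.
    import numpy as np
    from sklearn.datasets import make_classification
    from sklearn.model_selection import cross_val_predict
    from sklearn.naive_bayes import GaussianNB
    from sklearn.neighbors import KNeighborsClassifier
    from sklearn.tree import DecisionTreeClassifier

    rng = np.random.default_rng(0)
    X, y = make_classification(n_samples=1000, weights=[0.8, 0.2], random_state=0)

    # NAR-style injection: the flip probability depends on the instance class.
    flip_prob = np.where(y == 1, 0.30, 0.05)
    noisy = rng.random(len(y)) < flip_prob
    y_noisy = np.where(noisy, 1 - y, y)

    pool = [DecisionTreeClassifier(random_state=0), KNeighborsClassifier(), GaussianNB()]
    miss = np.stack([cross_val_predict(clf, X, y_noisy, cv=10) != y_noisy for clf in pool])
    flagged = miss.mean(axis=0) >= 0.5  # vote threshold: majority of the pool disagrees

    print("filter precision:", (noisy & flagged).sum() / max(flagged.sum(), 1))
    print("filter recall   :", (noisy & flagged).sum() / max(noisy.sum(), 1))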
Abstract: Dynamic selection techniques aim at selecting, for each test sample in particular, the local experts in its surrounding area to perform its classification. While generating the classifiers on a local scope may make it easier to single out the locally competent ones, as in the online local pool (OLP) technique, using the same base-classifier model over uneven distributions may restrict the local level of competence, since each region may have a data distribution that favors one model over the others. Thus, we propose in this work a problem-independent dynamic base-classifier model recommendation for the OLP technique, which uses information regarding the behavior of a portfolio of models over the samples of different problems to recommend one (or several) of them in a per-instance manner. Our proposed framework builds a multi-label meta-classifier responsible for recommending a set of relevant model types based on the local data complexity of the region surrounding each test sample. The OLP technique then produces a local pool with the model that yields the highest probability score from the meta-classifier. Experimental results show that different data distributions favored different model types on a local scope. Moreover, based on the performance of an ideal model type selector, it was observed that there is a clear advantage in choosing a relevant model type for each test instance. Overall, the proposed model type recommender system yielded a performance statistically similar to the original OLP with a fixed base-classifier model. Given the novelty of the approach and the gap in performance between the proposed framework and the ideal selector, we regard this as a promising research direction. Code available at github.com/marianaasouza/dynamic-model-recommender.
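The sketch below covers only the per-instance model-type recommendation step (the OLP local-pool generation is omitted); the meta-feature matrix and relevance labels are random placeholders standing in for the local complexity measures and model-type performance information described above, and the portfolio of model types is an assumed example.

    # Hedged sketch of a multi-label, per-instance model-type recommender.
    import numpy as np
    from sklearn.ensemble import RandomForestClassifier
    from sklearn.multioutput import MultiOutputClassifier

    MODEL_TYPES = ["decision_tree", "knn", "svm", "naive_bayes"]  # assumed portfolio

    # X_meta: one row of local complexity measures per known sample;
    # Y_meta: one binary column per model type, 1 if that type performed well
    # in the sample's region. Random placeholders here, for illustration only.
    X_meta = np.random.rand(500, 6)
    Y_meta = (np.random.rand(500, len(MODEL_TYPES)) > 0.5).astype(int)

    recommender = MultiOutputClassifier(RandomForestClassifier(random_state=0)).fit(X_meta, Y_meta)

    def recommend(meta_features):
        """Return the model type with the highest estimated relevance probability."""
        probas = [p[0, 1] for p in recommender.predict_proba(meta_features.reshape(1, -1))]
        return MODEL_TYPES[int(np.argmax(probas))]

    print(recommend(np.random.rand(6)))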
Abstract: Dynamic regressor selection (DRS) systems work by selecting the most competent regressors from an ensemble to estimate the target value of a given test pattern. This competence is usually quantified using the performance of the regressors in local regions of the feature space around the test pattern. However, correctly choosing the best measure to calculate the level of competence is not straightforward. The dynamic classifier selection literature presents a wide variety of competence measures, which cannot be used or adapted for DRS. In this paper, we review eight measures used with regression problems and adapt them to test the performance of the DRS algorithms found in the literature. Such measures are extracted from a local region of the feature space around the test pattern, called the region of competence, and are therefore referred to as competence measures. To better compare the competence measures, we perform a set of comprehensive experiments on 15 regression datasets. Three DRS systems were compared against individual regressors and static systems that use the Mean and the Median to combine the outputs of the regressors in the ensemble. The DRS systems were assessed varying the competence measures. Our results show that DRS systems outperform individual regressors and static systems, but the choice of the competence measure is problem-dependent.
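To make the selection scheme concrete, the sketch below implements dynamic regressor selection with a single competence measure, the mean squared error over the region of competence, and compares it against the static mean combination; the neighbourhood size and the bagged-tree ensemble are illustrative choices, not the paper's protocol.

    # Hedged sketch of DRS with one competence measure (local MSE).
    import numpy as np
    from sklearn.datasets import make_regression
    from sklearn.ensemble import BaggingRegressor
    from sklearn.model_selection import train_test_split
    from sklearn.neighbors import NearestNeighbors
    from sklearn.tree import DecisionTreeRegressor

    X, y = make_regression(n_samples=1200, noise=10, random_state=1)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=1)
    X_val, X_te, y_val, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=1)

    ensemble = BaggingRegressor(DecisionTreeRegressor(), n_estimators=20,
                                random_state=1).fit(X_tr, y_tr)
    nn = NearestNeighbors(n_neighbors=7).fit(X_val)

    def drs_predict(x):
        """Select the regressor with the lowest error in the region of competence."""
        roc = nn.kneighbors(x.reshape(1, -1), return_distance=False)[0]
        errors = [np.mean((reg.predict(X_val[roc]) - y_val[roc]) ** 2)
                  for reg in ensemble.estimators_]
        best = ensemble.estimators_[int(np.argmin(errors))]
        return best.predict(x.reshape(1, -1))[0]

    y_drs = np.array([drs_predict(x) for x in X_te])
    y_mean = ensemble.predict(X_te)  # static mean combination
    print("DRS MSE :", np.mean((y_drs - y_te) ** 2))
    print("Mean MSE:", np.mean((y_mean - y_te) ** 2))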
Abstract: Class imbalance refers to classification problems in which many more instances are available for certain classes than for others. Such imbalanced datasets require special attention because traditional classifiers generally favor the majority class, which has a large number of instances. Ensembles of classifiers have been reported to yield promising results. However, the majority of ensemble methods applied to imbalanced learning are static ones. Moreover, they only deal with binary imbalanced problems. Hence, this paper presents an empirical analysis of dynamic selection techniques and data preprocessing methods for dealing with multi-class imbalanced problems. We considered five variations of preprocessing methods and fourteen dynamic selection schemes. Our experiments, conducted on 26 multi-class imbalanced problems, show that dynamic ensembles improve the AUC and the G-mean when compared to static ensembles. Moreover, data preprocessing plays an important role in such cases.
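The sketch below combines one data preprocessing method (SMOTE, via imbalanced-learn) with one dynamic selection scheme (KNORA-U, via DESlib) on a small multi-class dataset; this pairing is a single illustrative instance of the 5 preprocessing variants and 14 DS schemes compared in the paper, and accuracy is reported instead of AUC/G-mean for brevity.

    # Hedged sketch: preprocessing + dynamic selection on a multi-class problem.
    from deslib.des import KNORAU
    from imblearn.over_sampling import SMOTE
    from sklearn.datasets import load_wine
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = load_wine(return_X_y=True)  # small multi-class example dataset
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, stratify=y, random_state=7)
    X_dsel, X_te, y_dsel, y_te = train_test_split(X_rest, y_rest, test_size=0.5,
                                                  stratify=y_rest, random_state=7)

    X_bal, y_bal = SMOTE(random_state=7).fit_resample(X_tr, y_tr)  # preprocessing step
    pool = BaggingClassifier(DecisionTreeClassifier(), n_estimators=50,
                             random_state=7).fit(X_bal, y_bal)

    ds = KNORAU(pool).fit(X_dsel, y_dsel)  # dynamic ensemble selection
    print("static pool accuracy :", pool.score(X_te, y_te))
    print("dynamic (KNORA-U) acc:", ds.score(X_te, y_te))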
Abstract: In dynamic selection (DS) techniques, only the classifiers most competent for the classification of a specific test sample are selected to predict the sample's class label. The most important step in DS techniques is estimating the competence of the base classifiers for the classification of each specific test sample. The classifiers' competence is usually estimated using the neighborhood of the test sample defined over the validation samples, called the region of competence. Thus, the performance of DS techniques is sensitive to the distribution of the validation set. In this paper, we evaluate six prototype selection techniques that work by editing the validation data in order to remove noise and redundant instances. Experiments conducted using several state-of-the-art DS techniques over 30 classification problems demonstrate that, by using prototype selection techniques, we can improve the classification accuracy of DS techniques and also significantly reduce the computational cost involved.
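The sketch below edits the validation (DSEL) set with one prototype selection method, Edited Nearest Neighbours from imbalanced-learn, before fitting a DS technique from DESlib (OLA); ENN and OLA stand in here for the six editing techniques and the various DS techniques evaluated in the paper.

    # Hedged sketch: prototype selection on the DSEL data before dynamic selection.
    from deslib.dcs import OLA
    from imblearn.under_sampling import EditedNearestNeighbours
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.model_selection import train_test_split
    from sklearn.tree import DecisionTreeClassifier

    X, y = make_classification(n_samples=2000, flip_y=0.05, random_state=3)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=3)
    X_dsel, X_te, y_dsel, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=3)

    pool = BaggingClassifier(DecisionTreeClassifier(max_depth=5), n_estimators=50,
                             random_state=3).fit(X_tr, y_tr)

    # Prototype selection: remove noisy/borderline instances from the DSEL data.
    X_edit, y_edit = EditedNearestNeighbours(sampling_strategy="all").fit_resample(X_dsel, y_dsel)

    print("OLA, raw DSEL   :", OLA(pool).fit(X_dsel, y_dsel).score(X_te, y_te))
    print("OLA, edited DSEL:", OLA(pool).fit(X_edit, y_edit).score(X_te, y_te))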
Abstract: In Dynamic Ensemble Selection (DES) techniques, only the most competent classifiers are selected to classify a given query sample. Hence, the key issue in DES is how to estimate the competence of each classifier in a pool in order to select the most competent ones. To deal with this issue, we proposed a novel dynamic ensemble selection framework using meta-learning, called META-DES. The framework is divided into three steps. In the first step, the pool of classifiers is generated from the training data. In the second step, the meta-features are computed using the training data and used to train a meta-classifier that is able to predict whether or not a base classifier from the pool is competent enough to classify an input instance. In this paper, we propose improvements to the training and generalization phases of the META-DES framework. In the training phase, we evaluate four different algorithms for the training of the meta-classifier. For the generalization phase, three combination approaches are evaluated: dynamic selection, where only the classifiers that attain a certain competence level are selected; dynamic weighting, where the meta-classifier estimates the competence of each classifier in the pool, and the outputs of all classifiers in the pool are weighted based on their level of competence; and a hybrid approach, in which an ensemble with the most competent classifiers is first selected, after which the weights of the selected classifiers are estimated to be used in a weighted majority voting scheme. Experiments are carried out on 30 classification datasets. Experimental results demonstrate that the changes proposed in this paper significantly improve the recognition accuracy of the system on several datasets.
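Since META-DES is implemented in the DESlib library, the usage sketch below shows the phases end to end: pool generation, meta-training on the DSEL data, and generalization; assuming the library's 'hybrid' mode corresponds to the third combination approach described above. The pool of calibrated Perceptrons and the dataset are illustrative choices.

    # Hedged usage sketch of META-DES via DESlib.
    from deslib.des import METADES
    from sklearn.calibration import CalibratedClassifierCV
    from sklearn.datasets import make_classification
    from sklearn.ensemble import BaggingClassifier
    from sklearn.linear_model import Perceptron
    from sklearn.model_selection import train_test_split

    X, y = make_classification(n_samples=2000, random_state=5)
    X_tr, X_rest, y_tr, y_rest = train_test_split(X, y, test_size=0.5, random_state=5)
    X_dsel, X_te, y_dsel, y_te = train_test_split(X_rest, y_rest, test_size=0.5, random_state=5)

    # Step 1: generate the pool (bagged, calibrated Perceptrons as weak learners).
    pool = BaggingClassifier(CalibratedClassifierCV(Perceptron(max_iter=100)),
                             n_estimators=100, random_state=5).fit(X_tr, y_tr)

    # Step 2: compute meta-features on the DSEL data and train the meta-classifier;
    # mode may be 'selection', 'weighting' or 'hybrid'.
    meta_des = METADES(pool, mode='hybrid').fit(X_dsel, y_dsel)

    # Step 3: generalization on unseen queries.
    print("META-DES accuracy:", meta_des.score(X_te, y_te))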