Ronnie C. O. Alves

Explanation-by-Example Based on Item Response Theory

Oct 04, 2022

Lucas F. F. Cardoso, José de S. Ribeiro, Vitor C. A. Santos, Raíssa L. Silva, Marcelle P. Mota, Ricardo B. C. Prudêncio, Ronnie C. O. Alves

Abstract: Intelligent systems that use Machine Learning classification algorithms are increasingly common in everyday society. However, many systems use black-box models whose characteristics do not allow them to explain their own predictions. This leads researchers in the field, and society at large, to the following question: How can I trust the prediction of a model I cannot understand? In this sense, XAI (Explainable Artificial Intelligence) emerges as a field of AI that aims to create techniques capable of explaining a classifier's decisions to the end user. As a result, several techniques have emerged, such as Explanation-by-Example, for which a few initiatives have been consolidated by the community currently working with XAI. This research explores Item Response Theory (IRT) as a tool for explaining models and for measuring the level of reliability of the Explanation-by-Example approach. To this end, four datasets with different levels of complexity were used, and the Random Forest model was used as a hypothesis test. In the test set, 83.8% of the errors come from instances for which IRT points out the model as unreliable.

* 15 pages, 5 figures, 3 tables, submitted to the BRACIS'22 conference
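
The abstract does not state the exact criterion IRT uses to mark the model as unreliable on an instance. As a minimal sketch, assuming the common three-parameter logistic (3PL) model, an instance is treated as an item with discrimination a, difficulty b and guessing c, and the classifier has ability theta. The flagging rule and its 0.5 threshold below are hypothetical illustrations, not the authors' method, and the function names are made up for this example.

```python
import math


def p_correct(theta: float, a: float, b: float, c: float = 0.0) -> float:
    """3PL item characteristic curve: probability that a classifier with
    ability theta classifies an instance (item) with parameters a, b, c correctly."""
    return c + (1.0 - c) / (1.0 + math.exp(-a * (theta - b)))


def is_unreliable(theta: float, a: float, b: float, c: float = 0.0,
                  threshold: float = 0.5) -> bool:
    """Hypothetical rule: flag the prediction on this instance as unreliable
    when the expected probability of a correct answer falls below threshold."""
    return p_correct(theta, a, b, c) < threshold


# A moderately able model (theta = 0.5) facing a hard, discriminating instance
print(is_unreliable(theta=0.5, a=2.0, b=1.5))
```

With c = 0, the probability is exactly 0.5 when theta equals b, so this particular threshold amounts to asking whether the instance is harder than the model is able.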

Data vs classifiers, who wins?

Jul 21, 2021

Lucas F. F. Cardoso, Vitor C. A. Santos, Regiane S. Kawasaki Francês, Ricardo B. C. Prudêncio, Ronnie C. O. Alves

Abstract: The classification experiments covered by machine learning (ML) are composed of two important parts: the data and the algorithm. As both are a fundamental part of the problem, both must be considered when evaluating a model's performance against a benchmark. The best classifiers need robust benchmarks to be properly evaluated. For this, gold-standard benchmarks such as OpenML-CC18 are used. However, data complexity is commonly not considered along with the model during a performance evaluation. Recent studies employ Item Response Theory (IRT) as a new approach to evaluating datasets and algorithms, capable of evaluating both simultaneously. This work presents a new evaluation methodology based on IRT and Glicko-2, together with the decodIRT tool developed to guide the estimation of IRT in ML. It explores IRT as a tool to evaluate the OpenML-CC18 benchmark for its algorithmic evaluation capability and checks whether there is a subset of datasets more efficient than the original benchmark. Several classifiers, from classic to ensemble methods, are also evaluated using the IRT models. The Glicko-2 rating system was applied together with IRT to summarize the innate ability and performance of the classifiers. It was noted that not all OpenML-CC18 datasets are really useful for evaluating algorithms, as only 10% were rated as really difficult. Furthermore, the existence of a more efficient subset containing only 50% of the original size was verified. Random Forest was singled out as the algorithm with the best innate ability.

* 15 pages, 6 figures and 9 tables
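
Glicko-2 is Mark Glickman's published rating system. The abstract does not say how classifiers are paired into matches, so the sketch below simply assumes hypothetical pairwise matches within one rating period, scored 1 (win), 0.5 (draw) or 0 (loss). The function name glicko2_update and the default system constant tau=0.5 are assumptions for illustration, not decodIRT's API.

```python
import math

SCALE = 173.7178  # Glicko-2 conversion constant between rating scales


def _g(phi: float) -> float:
    return 1.0 / math.sqrt(1.0 + 3.0 * phi * phi / math.pi ** 2)


def _expected(mu: float, mu_j: float, phi_j: float) -> float:
    return 1.0 / (1.0 + math.exp(-_g(phi_j) * (mu - mu_j)))


def glicko2_update(rating, rd, sigma, results, tau=0.5, eps=1e-6):
    """One Glicko-2 rating-period update.
    results: list of (opponent_rating, opponent_rd, score) tuples."""
    mu = (rating - 1500.0) / SCALE
    phi = rd / SCALE
    if not results:
        # No matches played: only the rating deviation grows.
        return rating, SCALE * math.sqrt(phi * phi + sigma * sigma), sigma

    v_inv = 0.0
    delta_sum = 0.0
    for r_j, rd_j, score in results:
        mu_j = (r_j - 1500.0) / SCALE
        phi_j = rd_j / SCALE
        g = _g(phi_j)
        e = _expected(mu, mu_j, phi_j)
        v_inv += g * g * e * (1.0 - e)
        delta_sum += g * (score - e)
    v = 1.0 / v_inv            # estimated variance of the rating
    delta = v * delta_sum      # estimated rating improvement

    # New volatility via the Illinois (regula falsi) iteration.
    a = math.log(sigma * sigma)

    def f(x):
        ex = math.exp(x)
        num = ex * (delta * delta - phi * phi - v - ex)
        den = 2.0 * (phi * phi + v + ex) ** 2
        return num / den - (x - a) / (tau * tau)

    A = a
    if delta * delta > phi * phi + v:
        B = math.log(delta * delta - phi * phi - v)
    else:
        k = 1
        while f(a - k * tau) < 0:
            k += 1
        B = a - k * tau
    fA, fB = f(A), f(B)
    while abs(B - A) > eps:
        C = A + (A - B) * fA / (fB - fA)
        fC = f(C)
        if fC * fB <= 0:
            A, fA = B, fB
        else:
            fA /= 2.0
        B, fB = C, fC
    new_sigma = math.exp(A / 2.0)

    # Update deviation and rating, then convert back to the original scale.
    phi_star = math.sqrt(phi * phi + new_sigma * new_sigma)
    new_phi = 1.0 / math.sqrt(1.0 / phi_star ** 2 + 1.0 / v)
    new_mu = mu + new_phi ** 2 * delta_sum
    return SCALE * new_mu + 1500.0, SCALE * new_phi, new_sigma


print(glicko2_update(1500, 200, 0.06, [(1400, 30, 1), (1550, 100, 0), (1700, 300, 0)]))
```

Running it on the worked example from Glickman's Glicko-2 description (a 1500/200/0.06 player who beats a 1400/30 opponent and loses to 1550/100 and 1700/300) gives a rating of about 1464.06, an RD of about 151.52 and a volatility of about 0.05999.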

NASirt: AutoML based learning with instance-level complexity information

Aug 26, 2020

Habib Asseiss Neto, Ronnie C. O. Alves, Sergio V. A. Campos

Abstract: Designing adequate and precise neural architectures is a challenging task, often done by highly specialized personnel. AutoML is a machine learning field that aims to generate well-performing models in an automated way. Spectral data, such as those obtained from biological analysis, generally contain a lot of important information, and these data are particularly well suited to Convolutional Neural Networks (CNN) due to their image-like shape. In this work we present NASirt, an AutoML methodology based on Neural Architecture Search (NAS) that finds high-accuracy CNN architectures for spectral datasets. The proposed methodology relies on Item Response Theory (IRT) to obtain instance-level characteristics, such as discrimination and difficulty, and is able to define a ranking of top-performing submodels. Several experiments are performed to demonstrate the methodology's performance on different spectral datasets. Accuracy results are compared to other benchmark methods, such as a high-performing, manually crafted CNN and the Auto-Keras AutoML tool. The results show that our method performs better than the benchmarks in most cases, achieving an average accuracy as high as 96.96%.

* to be published
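
The abstract does not detail how NASirt turns IRT estimates into a ranking of submodels. One plausible reading, sketched here under stated assumptions, is to treat each candidate submodel as a respondent, assume the two-parameter logistic (2PL) item parameters (a, b) of each test instance are already estimated, estimate each submodel's ability theta by grid-search maximum likelihood, and rank by theta. The names rank_submodels and estimate_ability and the grid bounds are hypothetical.

```python
import math


def _log_sigmoid(z: float) -> float:
    # Numerically stable log(1 / (1 + exp(-z)))
    if z >= 0:
        return -math.log1p(math.exp(-z))
    return z - math.log1p(math.exp(z))


def log_likelihood(theta, responses, items):
    """2PL log-likelihood of a 0/1 response vector for ability theta.
    items is a list of (a, b) pairs: discrimination and difficulty."""
    ll = 0.0
    for y, (a, b) in zip(responses, items):
        z = a * (theta - b)
        ll += _log_sigmoid(z) if y else _log_sigmoid(-z)
    return ll


def estimate_ability(responses, items, lo=-4.0, hi=4.0, steps=801):
    """Grid-search maximum-likelihood estimate of theta on [lo, hi]."""
    grid = [lo + (hi - lo) * i / (steps - 1) for i in range(steps)]
    return max(grid, key=lambda t: log_likelihood(t, responses, items))


def rank_submodels(response_matrix, items):
    """Rank hypothetical submodels by estimated ability, best first.
    response_matrix maps a submodel name to its 0/1 correctness vector."""
    abilities = {name: estimate_ability(resp, items)
                 for name, resp in response_matrix.items()}
    order = sorted(abilities, key=abilities.get, reverse=True)
    return order, abilities


items = [(1.0, -1.0), (1.0, 0.0), (1.0, 1.0)]
print(rank_submodels({"A": [1, 1, 1], "B": [1, 1, 0], "C": [1, 0, 0]}, items))
```

An all-correct response vector has no finite maximum-likelihood estimate, so its theta lands on the grid boundary; dedicated IRT software usually handles this case with priors or other corrections.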

Decoding machine learning benchmarks

Aug 19, 2020

Lucas F. F. Cardoso, Vitor C. A. Santos, Regiane S. K. Francês, Ricardo B. C. Prudêncio, Ronnie C. O. Alves

Abstract: Despite the availability of benchmark machine learning (ML) repositories (e.g., UCI, OpenML), there is not yet a standard evaluation strategy capable of pointing out which set of datasets is best suited to serve as a gold standard for testing different ML algorithms. In recent studies, Item Response Theory (IRT) has emerged as a new approach to elucidate what a good ML benchmark should be. This work applied IRT to explore the well-known OpenML-CC18 benchmark to identify how suitable it is for the evaluation of classifiers. Several classifiers, ranging from classical to ensemble ones, were evaluated using IRT models, which can simultaneously estimate dataset difficulty and classifiers' ability. The Glicko-2 rating system was applied on top of IRT to summarize the innate ability and aptitude of classifiers. It was observed that not all datasets from OpenML-CC18 are really useful for evaluating classifiers. Most datasets evaluated in this work (84%) contain mostly easy instances (i.e., only around 10% of their instances are difficult). Also, in half of this benchmark, 80% of the instances are highly discriminating, which can be of great use for pairwise algorithm comparison but not for pushing classifiers' abilities. This paper presents this new evaluation methodology based on IRT as well as the tool decodIRT, developed to guide IRT estimation over ML benchmarks.

* Paper published at the BRACIS 2020 conference, 15 pages, 4 figures
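
The abstract summarizes each dataset by its share of difficult and highly discriminating instances. A minimal sketch of how such shares could be computed from already-estimated 2PL item parameters follows, together with a small demonstration of why a highly discriminating instance separates two classifiers of different ability more sharply. The cutoffs difficult_b and discriminating_a are hypothetical, not the values used in the paper, and ItemParams, summarize_dataset and pairwise_gap are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class ItemParams:
    discrimination: float  # a: slope of the item characteristic curve
    difficulty: float      # b: ability at which 2PL P(correct) = 0.5


def summarize_dataset(items, difficult_b=1.0, discriminating_a=1.0):
    """Percent of instances above hypothetical difficulty/discrimination cutoffs."""
    n = len(items)
    hard = sum(1 for it in items if it.difficulty > difficult_b)
    sharp = sum(1 for it in items if it.discrimination > discriminating_a)
    return {"pct_difficult": 100.0 * hard / n,
            "pct_discriminating": 100.0 * sharp / n}


def pairwise_gap(theta_low, theta_high, item):
    """Difference in 2PL P(correct) between a stronger and a weaker classifier."""
    def p(theta):
        return 1.0 / (1.0 + math.exp(-item.discrimination * (theta - item.difficulty)))
    return p(theta_high) - p(theta_low)


items = [ItemParams(2.0, 1.5), ItemParams(0.5, -1.0),
         ItemParams(1.5, 0.0), ItemParams(0.2, 2.0)]
print(summarize_dataset(items))
# A steep (a = 3) instance separates the two classifiers far more than a flat one (a = 0.5).
print(pairwise_gap(-0.5, 0.5, ItemParams(3.0, 0.0)),
      pairwise_gap(-0.5, 0.5, ItemParams(0.5, 0.0)))
```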
