Biniam Gebru

Mitigating shortage of labeled data using clustering-based active learning with diversity exploration

Jul 06, 2022

Xuyang Yan, Shabnam Nazmi, Biniam Gebru, Mohd Anwar, Abdollah Homaifar, Mrinmoy Sarkar, Kishor Datta Gupta

Abstract: In this paper, we propose a new clustering-based active learning framework, Active Learning using a Clustering-based Sampling (ALCS), to address the shortage of labeled data. ALCS employs a density-based clustering approach to explore the cluster structure of the data without requiring exhaustive parameter tuning. A bi-cluster boundary-based sample query procedure is introduced to improve classification performance on highly overlapped classes. Additionally, we develop a diversity exploration strategy to reduce redundancy among the queried samples. Our experimental results support the efficacy of the ALCS approach.

* Accepted by the ICML 2022 Workshop on Adaptive Experimental Design and Active Learning in the Real World
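
The abstract names two ideas that a small example can make concrete: querying samples near the boundary between two clusters, and skipping candidates that are redundant with samples already queried. The following is a minimal sketch of those two ideas only, not the authors' ALCS procedure. It assumes cluster labels are already available from some clustering step, and the ratio-based `boundary_score`, the `min_gap` threshold, and all function names are hypothetical choices made for illustration.

```python
import math


def dist(a, b):
    """Euclidean distance between two points given as sequences of floats."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def centroids(points, labels):
    """Mean point of every cluster, keyed by cluster label."""
    sums, counts = {}, {}
    for p, c in zip(points, labels):
        if c not in sums:
            sums[c] = [0.0] * len(p)
            counts[c] = 0
        sums[c] = [s + x for s, x in zip(sums[c], p)]
        counts[c] += 1
    return {c: [s / counts[c] for s in sums[c]] for c in sums}


def boundary_score(p, cents):
    """Ratio of the nearest to the second-nearest centroid distance.

    Assumption for this sketch: a ratio close to 1 means the point lies
    about equally far from its two closest clusters, i.e. near a boundary.
    """
    d = sorted(dist(p, c) for c in cents.values())
    return d[0] / d[1] if d[1] > 0 else 0.0


def select_queries(points, labels, budget, min_gap):
    """Greedily pick up to `budget` boundary points that are at least
    `min_gap` apart, so near-duplicate candidates are not queried twice."""
    cents = centroids(points, labels)
    if len(cents) < 2:
        raise ValueError("need at least two clusters to define a boundary")
    ranked = sorted(
        range(len(points)),
        key=lambda i: boundary_score(points[i], cents),
        reverse=True,
    )
    chosen = []
    for i in ranked:
        if len(chosen) == budget:
            break
        if all(dist(points[i], points[j]) >= min_gap for j in chosen):
            chosen.append(i)
    return chosen


# Toy data: two clusters on a line, with two nearly identical points at x = 4.
points = [(0, 0), (1, 0), (4, 0), (4, 0.1), (6, 0), (9, 0), (10, 0), (9, 1)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
queries = select_queries(points, labels, budget=3, min_gap=0.5)
```

In this toy run the two nearly identical points at x = 4 cannot both be selected, which is the redundancy effect the diversity step is meant to avoid. A real pipeline would replace the hand-written labels with the output of a density-based clustering step.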

A Supervised Feature Selection Method For Mixed-Type Data using Density-based Feature Clustering

Nov 10, 2021

Xuyang Yan, Mrinmoy Sarkar, Biniam Gebru, Shabnam Nazmi, Abdollah Homaifar

Abstract: Feature selection methods are widely used to address the high computational overhead and the curse of dimensionality in classifying high-dimensional data. Most conventional feature selection methods focus on handling homogeneous features, while real-world datasets usually have a mixture of continuous and discrete features. Some recent mixed-type feature selection studies select only features with high relevance to class labels and ignore the redundancy among features. Determining an appropriate feature subset is also a challenge. In this paper, a supervised feature selection method using density-based feature clustering (SFSDFC) is proposed to obtain an appropriate final feature subset for mixed-type data. SFSDFC decomposes the feature space into a set of disjoint feature clusters using a novel density-based clustering method. Then, an effective feature selection strategy is employed to obtain a subset of important features with minimal redundancy from those feature clusters. Extensive experiments and comparison studies with five state-of-the-art methods are conducted on thirteen real-world benchmark datasets, and the results support the efficacy of the SFSDFC method.

* 6 pages, 3 figures, 4 tables, accepted by the IEEE SMC 2021
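
The two-stage idea in the abstract (group similar features, then keep the most relevant feature from each group) can be illustrated with a short sketch. This is not the SFSDFC algorithm: it swaps in a simple greedy grouping by absolute Pearson correlation in place of the paper's density-based clustering, it handles numeric features only rather than mixed-type data, and the `threshold` parameter and all function names are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences
    (returns 0.0 when either sequence is constant)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy) if vx > 0 and vy > 0 else 0.0


def group_features(columns, threshold):
    """Greedy stand-in for feature clustering: a feature joins the first
    group whose first member it correlates with at or above `threshold`,
    otherwise it starts a new group. Groups are disjoint by construction."""
    groups = []
    for name in columns:
        for group in groups:
            if abs(pearson(columns[name], columns[group[0]])) >= threshold:
                group.append(name)
                break
        else:
            groups.append([name])
    return groups


def select_features(columns, target, threshold=0.9):
    """Keep one feature per group: the one most correlated with the target."""
    groups = group_features(columns, threshold)
    return [
        max(group, key=lambda f: abs(pearson(columns[f], target)))
        for group in groups
    ]


# Toy data: "a" and "a_noisy" are redundant with each other, "b" is not.
columns = {
    "a": [1, 2, 3, 4, 5, 6],
    "a_noisy": [1.1, 1.9, 3.2, 3.9, 5.1, 6.0],
    "b": [1, 0, 1, 0, 1, 0],
}
target = [0, 0, 0, 1, 1, 1]
groups = group_features(columns, threshold=0.9)
selected = select_features(columns, target, threshold=0.9)
```

Here the two redundant features land in one group and only one of them survives selection, while the unrelated feature forms its own group. For genuinely mixed-type data, the correlation measure would need to be replaced by one that is defined for both continuous and discrete features.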
