Shen-Shyang Ho

Optimal N-ary ECOC Matrices for Ensemble Classification

Oct 05, 2021

Hieu D. Nguyen, Lucas J. Lavalva, Shen-Shyang Ho, Mohammed Sarosh Khan, Nicholas Kaegi

Abstract: A new recursive construction of $N$-ary error-correcting output code (ECOC) matrices for ensemble classification methods is presented, generalizing the classic doubling construction for binary Hadamard matrices. Given any prime integer $N$, this deterministic construction generates base-$N$ symmetric square matrices $M$ of prime-power dimension having optimal minimum Hamming distance between any two of its rows and columns. Experimental results for six datasets demonstrate that using these deterministic coding matrices for $N$-ary ECOC classification yields comparable and in many cases higher accuracy compared to using randomly generated coding matrices. This is particularly true when $N$ is adaptively chosen so that the dimension of $M$ closely matches the number of classes in a dataset, which reduces the loss in minimum Hamming distance when $M$ is truncated to fit the dataset. This is verified through a distance formula for $M$ which shows that these adaptive matrices have significantly higher minimum Hamming distance in comparison to randomly generated ones.

* 20 pages, 75 figures
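
For orientation, the classic binary doubling construction that the abstract says is being generalized can be sketched as below. This is the standard Sylvester construction for Hadamard matrices, not the paper's $N$-ary construction, which is not given in the abstract. The function names are illustrative assumptions.

```python
def sylvester_hadamard(k):
    """Return the 2^k x 2^k Sylvester Hadamard matrix with entries +1/-1.

    Doubling step: H_{2n} = [[H_n, H_n], [H_n, -H_n]], starting from H_1 = [[1]].
    """
    h = [[1]]
    for _ in range(k):
        top = [row + row for row in h]
        bottom = [row + [-x for x in row] for row in h]
        h = top + bottom
    return h


def min_row_hamming_distance(matrix):
    """Smallest number of differing positions over all pairs of distinct rows."""
    best = None
    for i in range(len(matrix)):
        for j in range(i + 1, len(matrix)):
            d = sum(a != b for a, b in zip(matrix[i], matrix[j]))
            if best is None or d < best:
                best = d
    return best


H8 = sylvester_hadamard(3)
# Distinct rows of a Hadamard matrix of order n are orthogonal,
# so they differ in exactly n/2 positions (here 8/2 = 4).
print(min_row_hamming_distance(H8))
```

The resulting matrix is symmetric and every pair of distinct rows differs in exactly half of the positions, which is the binary analogue of the row and column distance property described in the abstract.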


Ensemble Learning using Error Correcting Output Codes: New Classification Error Bounds

Sep 18, 2021

Hieu D. Nguyen, Mohammed Sarosh Khan, Nicholas Kaegi, Shen-Shyang Ho, Jonathan Moore, Logan Borys, Lucas Lavalva

Abstract: New bounds on classification error rates for the error-correcting output code (ECOC) approach in machine learning are presented. These bounds decay exponentially with respect to codeword length and theoretically validate the effectiveness of the ECOC approach. Bounds are derived for two different models: the first under the assumption that all base classifiers are independent and the second under the assumption that all base classifiers are mutually correlated up to first order. Moreover, we perform ECOC classification on six datasets and compare their error rates with our bounds to experimentally validate our work and show the effect of correlation on classification accuracy.

* 14 pages, 11 figures


N-ary Error Correcting Coding Scheme

Mar 18, 2016

Joey Tianyi Zhou, Ivor W. Tsang, Shen-Shyang Ho, Klaus-Robert Muller

Abstract: The coding matrix design plays a fundamental role in the prediction performance of the error correcting output codes (ECOC)-based multi-class task. In many-class classification problems, e.g., fine-grained categorization, it is difficult to distinguish subtle between-class differences under existing coding schemes due to a limited choice of coding values. In this paper, we investigate whether one can relax existing binary and ternary code design to $N$-ary code design to achieve better classification performance. In particular, we present a novel $N$-ary coding scheme that decomposes the original multi-class problem into simpler multi-class subproblems, which is similar to applying a divide-and-conquer method. The two main advantages of such a coding scheme are as follows: (i) the ability to construct more discriminative codes and (ii) the flexibility for the user to select the best $N$ for ECOC-based classification. We show empirically that the optimal $N$ (based on classification performance) lies in $[3, 10]$ with some trade-off in computational cost. Moreover, we provide theoretical insights on the dependency of the generalization error bound of an $N$-ary ECOC on the average base classifier generalization error and the minimum distance between any two codes constructed. Extensive experimental results on benchmark multi-class datasets show that the proposed coding scheme achieves superior prediction performance over the state-of-the-art coding methods.

* Under submission to IEEE Transactions on Information Theory
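
To make the idea of an $N$-ary coding matrix concrete, here is a minimal sketch of how decoding could work, assuming nearest-codeword (Hamming distance) decoding, which is a common ECOC choice and not necessarily the decoder used in the paper. The coding matrix below is a hand-picked hypothetical example with $N = 3$, four classes, and five columns. Each column relabels the classes into $N$ meta-classes, and one base classifier would be trained per column on that simpler subproblem.

```python
def hamming(u, v):
    """Number of positions at which two equal-length sequences differ."""
    return sum(a != b for a, b in zip(u, v))


def decode(coding_matrix, predictions):
    """Return the class whose codeword (row) is closest to the predictions."""
    return min(
        range(len(coding_matrix)),
        key=lambda c: hamming(coding_matrix[c], predictions),
    )


# Hypothetical 3-ary coding matrix: rows are classes, columns are subproblems.
coding_matrix = [
    [0, 1, 2, 0, 1],  # class 0
    [1, 2, 0, 1, 0],  # class 1
    [2, 0, 1, 1, 2],  # class 2
    [0, 2, 1, 2, 0],  # class 3
]

# Outputs of the five base classifiers for one test point: the codeword of
# class 2 with the fourth symbol predicted incorrectly (1 became 0).
predictions = [2, 0, 1, 0, 2]
print(decode(coding_matrix, predictions))
```

In this toy matrix the minimum distance between any two rows is 3, so a single base-classifier mistake still decodes to the intended class. This is the sense in which a larger minimum distance between codewords makes the ensemble more tolerant of base classifier errors.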


On the Detection of Concept Changes in Time-Varying Data Stream by Testing Exchangeability

Jul 04, 2012

Shen-Shyang Ho, Harry Wechsler

Abstract: A martingale framework for concept change detection based on testing data exchangeability was recently proposed (Ho, 2005). In this paper, we describe the proposed change-detection test based on Doob's Maximal Inequality and show that it is an approximation of the sequential probability ratio test (SPRT). The relationship between the threshold value used in the proposed test and its size and power is deduced from the approximation. The mean delay time before a change is detected is estimated using the average sample number of an SPRT. The performance of the test using various threshold values is examined on five different data stream scenarios simulated using two synthetic data sets. Finally, experimental results show that the test is effective in detecting changes in time-varying data streams simulated using three benchmark data sets.

* Appears in Proceedings of the Twenty-First Conference on Uncertainty in Artificial Intelligence (UAI2005)
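
As a rough illustration of a martingale test with a threshold, the sketch below uses a power martingale, $M_n = \prod_{i=1}^{n} \epsilon\, p_i^{\epsilon - 1}$, which is one common choice in the exchangeability-testing literature. Whether this matches the exact test in the paper is an assumption, as are the function name, the default $\epsilon = 0.92$, and the threshold of 20. The p-values are taken as given inputs here. Under exchangeability they would be uniform on $(0, 1]$, and Doob's Maximal Inequality then bounds the false alarm probability by $1/\lambda$ for threshold $\lambda$.

```python
def first_alarm(p_values, threshold=20.0, epsilon=0.92):
    """Return the index at which the power martingale first reaches the
    threshold, or None if it never does.

    Each p-value must lie in (0, 1]; p = 0 is not handled in this sketch.
    """
    m = 1.0
    for i, p in enumerate(p_values):
        # Small p-values (surprising points) make the factor exceed 1,
        # unremarkable p-values shrink the martingale slightly.
        m *= epsilon * p ** (epsilon - 1.0)
        if m >= threshold:
            return i
    return None


# Hypothetical stream: 20 unremarkable points, then 20 surprising ones.
stream = [0.5] * 20 + [0.01] * 20
alarm = first_alarm(stream)
print(alarm)
```

Raising the threshold lowers the false alarm bound $1/\lambda$ but lengthens the delay before a change is flagged, which is the trade-off the abstract refers to when it relates the threshold to the size, power, and mean delay of the test.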
