Abstract: Kernel-based regularized risk minimizers, also called support vector machines (SVMs), are known to possess many desirable properties but suffer from super-linear computational requirements when dealing with large data sets. This problem can be tackled by using localized SVMs instead, which offer the additional advantage of being able to apply different hyperparameters to different regions of the input space. In this paper, localized SVMs are analyzed with regard to their consistency. It is proven that they inherit $L_p$-consistency as well as risk consistency from global SVMs under very weak conditions, even if the regions underlying the localized SVMs are allowed to change as the size of the training data set increases.
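For orientation, the objects named in this abstract can be sketched as follows. The notation is an assumption made here for illustration and is not taken from the abstract: $L$ denotes a loss function, $H$ the reproducing kernel Hilbert space of the kernel $k$, $\lambda > 0$ the regularization parameter, and $\mathrm{P}$ the underlying probability measure. A global SVM is then commonly defined as

$$
f_{L,\mathrm{P},\lambda,k} = \arg\min_{f \in H} \; \mathbb{E}_{(X,Y)\sim\mathrm{P}}\big[L(X,Y,f(X))\big] + \lambda \,\|f\|_{H}^{2} .
$$

In one common formulation of localized learning (again an assumption, the paper's construction may differ in detail), the input space is covered by regions $\mathcal{X}_1,\dots,\mathcal{X}_B$, a separate SVM $f_b$ with its own hyperparameters $(\lambda_b, k_b)$ is trained on the data falling into $\mathcal{X}_b$, and the local predictors are combined through weight functions $w_b \ge 0$ that vanish outside $\mathcal{X}_b$ and sum to one:

$$
f^{\mathrm{loc}}(x) = \sum_{b=1}^{B} w_b(x)\, f_b(x), \qquad \sum_{b=1}^{B} w_b(x) = 1 .
$$

Since each $f_b$ is trained only on the data in its own region, the super-linear cost is incurred per region rather than on the full data set.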
Abstract: As a predictor's quality is often assessed by means of its risk, it is natural to regard risk consistency as a desirable property of learning methods, and many such methods have indeed been shown to be risk consistent. The first aim of this paper is to establish the close connection between risk consistency and $L_p$-consistency for a considerably wider class of loss functions than has been considered before. The attempt to transfer this connection to shifted loss functions surprisingly reveals that this shift does not reduce the assumptions needed on the underlying probability measure to the same extent as it does for many other results. The results are applied to regularized kernel methods such as support vector machines.
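The two notions of consistency and the shifted loss mentioned in this abstract can be written down as follows. This is a sketch in standard notation that is assumed here, not quoted from the paper: $\mathcal{R}_{L,\mathrm{P}}(f) = \mathbb{E}_{(X,Y)\sim\mathrm{P}}[L(X,Y,f(X))]$ is the risk of a predictor $f$, $\mathcal{R}_{L,\mathrm{P}}^{*}$ is its infimum over all measurable $f$, $f^{*}$ is a minimizer of the risk (assumed to exist and to be $\mathrm{P}_X$-almost surely unique), and $f_n$ is the predictor learned from $n$ observations.

$$
\text{risk consistency:} \quad \mathcal{R}_{L,\mathrm{P}}(f_n) \longrightarrow \mathcal{R}_{L,\mathrm{P}}^{*} \quad \text{in probability as } n \to \infty ,
$$

$$
L_p\text{-consistency:} \quad \|f_n - f^{*}\|_{L_p(\mathrm{P}_X)} = \Big( \int |f_n - f^{*}|^{p} \, d\mathrm{P}_X \Big)^{1/p} \longrightarrow 0 \quad \text{in probability as } n \to \infty .
$$

The shifted version of a loss function $L$ is usually defined as

$$
L^{\star}(x,y,t) = L(x,y,t) - L(x,y,0) ,
$$

a construction that, in many results on SVMs, allows moment conditions on $\mathrm{P}$ to be dropped.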
Abstract: Regularized kernel-based methods such as support vector machines (SVMs) typically depend on the underlying probability measure $\mathrm{P}$ (or, in applications, on an empirical measure $\mathrm{D}_n$) as well as on the regularization parameter $\lambda$ and the kernel $k$. Whereas classical statistical robustness only considers the effect of small perturbations in $\mathrm{P}$, the present paper investigates the influence of simultaneous slight variations in the whole triple $(\mathrm{P},\lambda,k)$, or $(\mathrm{D}_n,\lambda_n,k)$ in the empirical case, on the resulting predictor. Existing results from the literature are considerably generalized and improved. In order to also make them applicable to big data, where regular SVMs suffer from their super-linear computational requirements, we show how our results can be transferred to the context of localized learning. Here, the effect of slight variations in the applied regionalization, which might for example stem from changes in $\mathrm{P}$ or $\mathrm{D}_n$, is considered as well.