The support vector machine (SVM) has been one of the most successful machine learning techniques for binary classification. Its key idea is to maximize the margin between the data and the separating hyperplane subject to correct classification of the training samples. The commonly used hinge loss and its variants are sensitive to label noise and, owing to their unboundedness, unstable under resampling. This paper focuses on the kernel SVM with the $\ell_0$-norm hinge loss (referred to as $\ell_0$-KSVM), a composite of the hinge loss and the $\ell_0$-norm that overcomes the difficulties mentioned above. In view of the nonconvexity and nonsmoothness of the $\ell_0$-norm hinge loss, we first characterize its limiting subdifferential and then establish the equivalence among the proximal stationary points, the Karush-Kuhn-Tucker points, and the locally optimal solutions of $\ell_0$-KSVM. Second, we develop an ADMM algorithm for $\ell_0$-KSVM and show that any limit point of the sequence it generates is a locally optimal solution. Finally, experiments on synthetic and real datasets show that $\ell_0$-KSVM achieves accuracy comparable to that of the standard KSVM while generally requiring fewer support vectors.
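
For concreteness, the loss under discussion can be sketched as follows (the notation here is assumed for illustration; the paper's exact formulation may differ). Writing $f$ for the kernel decision function and $(x_i, y_i)_{i=1}^{n}$ for the training samples with $y_i \in \{-1, +1\}$, composing the $\ell_0$-norm with the samplewise hinge losses counts the margin violations,
\[
  \left\| \Bigl( \max\{0,\; 1 - y_i f(x_i)\} \Bigr)_{i=1}^{n} \right\|_0
  \;=\; \#\left\{\, i : y_i f(x_i) < 1 \,\right\},
\]
which, unlike the hinge loss itself, is bounded above by $n$ and insensitive to the magnitude of any individual violation, consistent with the robustness to label noise and resampling claimed above.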