While highly expressive parametric models, including deep neural networks, are well suited to modeling complicated concepts, training such highly non-linear models carries a high risk of overfitting. To address this issue, this study considers a $(k,q)$th order variation regularization ($(k,q)$-VR), defined as the $q$th-powered integral of the absolute $k$th order derivative of the parametric model to be trained; penalizing the $(k,q)$-VR is expected to yield a smoother function and thereby to mitigate overfitting. In particular, the $(k,q)$-VR encompasses the conventional (general-order) total variation with $q=1$. Although the $(k,q)$-VR term applied to general parametric models is computationally intractable because of the integration, this study provides a stochastic optimization algorithm that efficiently trains general models with the $(k,q)$-VR without performing explicit numerical integration. The proposed approach applies even to deep neural networks of arbitrary architecture, since it can be implemented with only a simple stochastic gradient descent algorithm and automatic differentiation. Our numerical experiments demonstrate that neural networks trained with the $(k,q)$-VR term are more ``resilient'' than those trained with conventional parameter regularization. The proposed algorithm can also be extended to the training of physics-informed neural networks (PINNs).
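To make the penalty concrete, one way to write the $(k,q)$-VR of a univariate parametric model $f_\theta$ over a domain $\Omega$ (the symbols $f_\theta$, $\Omega$, and $R_{k,q}$ are illustrative notation, not necessarily the paper's) is
\[
  R_{k,q}(\theta) = \int_{\Omega} \bigl| f_\theta^{(k)}(x) \bigr|^{q} \, dx ,
\]
which reduces to the (general-order) total variation when $q=1$.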
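As a rough illustration of the stochastic training scheme described above, the following sketch in JAX (the tiny network, the unit-interval domain, and all hyperparameters are assumptions for illustration, not the paper's specification) replaces the intractable integral with a Monte Carlo average over freshly sampled points, so that the penalty and its gradient are obtained by automatic differentiation alone:
\begin{verbatim}
import jax
import jax.numpy as jnp

def model(params, x):
    # Tiny scalar-to-scalar MLP; any differentiable parametric model could be used.
    h = jnp.tanh(params["W1"] * x + params["b1"])   # hidden layer, shape (hidden,)
    return jnp.dot(params["W2"], h) + params["b2"]  # scalar output

def kth_derivative(f, k):
    # k-fold automatic differentiation of a scalar-to-scalar function.
    for _ in range(k):
        f = jax.grad(f)
    return f

def kq_vr_penalty(params, key, k=2, q=1.0, n_samples=64):
    # Monte Carlo estimate of the integral of |f_theta^(k)(x)|^q over [0, 1]
    # (the unit interval is an assumed domain), replacing explicit quadrature.
    xs = jax.random.uniform(key, (n_samples,))
    d_k = kth_derivative(lambda x: model(params, x), k)
    return jnp.mean(jnp.abs(jax.vmap(d_k)(xs)) ** q)

def loss(params, x_batch, y_batch, key, lam=1e-3):
    # Data-fit term plus the sampled (k,q)-VR penalty.
    preds = jax.vmap(lambda x: model(params, x))(x_batch)
    return jnp.mean((preds - y_batch) ** 2) + lam * kq_vr_penalty(params, key)

@jax.jit
def sgd_step(params, x_batch, y_batch, key, lr=1e-2):
    # One plain SGD step; a fresh key gives fresh integration samples each iteration.
    grads = jax.grad(loss)(params, x_batch, y_batch, key)
    return jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)

# Illustrative initialization and a single update.
key = jax.random.PRNGKey(0)
k1, k2, k3 = jax.random.split(key, 3)
hidden = 32
params = {
    "W1": 0.1 * jax.random.normal(k1, (hidden,)), "b1": jnp.zeros(hidden),
    "W2": 0.1 * jax.random.normal(k2, (hidden,)), "b2": jnp.array(0.0),
}
x_batch = jnp.linspace(0.0, 1.0, 16)
y_batch = jnp.sin(2.0 * jnp.pi * x_batch)
params = sgd_step(params, x_batch, y_batch, k3)
\end{verbatim}
In this sketch, redrawing the sample points at every step makes the penalty a Monte Carlo estimate of the integral, so a plain stochastic gradient descent loop with automatic differentiation suffices and no explicit quadrature rule is needed, consistent with the claim in the abstract.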