Dmitry Babichev

SIERRA, Inria, PSL

Efficient Primal-Dual Algorithms for Large-Scale Multiclass Classification

Feb 11, 2019

Dmitry Babichev, Dmitrii Ostrovskii, Francis Bach

Abstract: We develop efficient algorithms to train $\ell_1$-regularized linear classifiers with large dimensionality $d$ of the feature space, number of classes $k$, and sample size $n$. Our focus is on a special class of losses that includes, in particular, the multiclass hinge and logistic losses. Our approach combines several ideas: (i) passing to the equivalent saddle-point problem with a quasi-bilinear objective; (ii) applying stochastic mirror descent with a proper choice of geometry which guarantees a favorable accuracy bound; (iii) devising non-uniform sampling schemes to approximate the matrix products. In particular, for the multiclass hinge loss we propose a sublinear algorithm with iterations performed in $O(d+n+k)$ arithmetic operations.
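Item (iii) of the abstract mentions non-uniform sampling to approximate matrix products. The abstract does not specify the sampling scheme, so the following is only a generic sketch of the idea, not the authors' method: a matrix-vector product $Ax$ is replaced by a one-sample unbiased estimate that reads a single column of $A$. The choice of probabilities proportional to $|x_j|$ and all function names are assumptions made for illustration.

```python
import random


def sampling_probs(x):
    """Sampling probabilities proportional to |x_j| (an illustrative assumption)."""
    total = sum(abs(v) for v in x)
    if total == 0:
        raise ValueError("x must have at least one nonzero entry")
    return [abs(v) / total for v in x]


def sampled_matvec(A, x, rng):
    """One-sample unbiased estimate of A @ x that touches a single column of A."""
    p = sampling_probs(x)
    # Indices with zero probability are never drawn, so p[j] > 0 below.
    j = rng.choices(range(len(x)), weights=p, k=1)[0]
    scale = x[j] / p[j]  # importance weight that makes the estimate unbiased
    return [row[j] * scale for row in A]


def exact_matvec(A, x):
    """Dense reference product, used only to check the estimator."""
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def expected_estimate(A, x):
    """Exact expectation of sampled_matvec, summing over every possible draw."""
    p = sampling_probs(x)
    out = [0.0] * len(A)
    for j, pj in enumerate(p):
        if pj == 0:
            continue
        for i, row in enumerate(A):
            out[i] += pj * (row[j] * x[j] / pj)
    return out
```

Each estimate costs one column read instead of a full pass over $A$, which is the kind of saving that makes per-iteration cost sublinear in the matrix size; the price is variance, which depends on how the sampling probabilities are chosen.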

Constant Step Size Stochastic Gradient Descent for Probabilistic Modeling

Apr 16, 2018

Dmitry Babichev, Francis Bach

Abstract: Stochastic gradient methods enable learning probabilistic models from large amounts of data. While large step-sizes (learning rates) have been shown to be best for least-squares (e.g., Gaussian noise) once combined with parameter averaging, they do not lead to convergent algorithms in general. In this paper, we consider generalized linear models, that is, conditional models based on exponential families. We propose averaging moment parameters instead of natural parameters for constant-step-size stochastic gradient descent. For finite-dimensional models, we show that this can sometimes (and surprisingly) lead to better predictions than the best linear model. For infinite-dimensional models, we show that it always converges to optimal predictions, while averaging natural parameters never does. We illustrate our findings with simulations on synthetic data and classical benchmarks with many observations.

* 13 pages, 8 figures
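The distinction between averaging natural parameters and averaging moment parameters can be made concrete for logistic regression, where the natural parameter is the linear score $\theta^\top x$ and the moment parameter is the predicted probability $\sigma(\theta^\top x)$. The sketch below is a minimal illustration under that assumption, not the paper's experimental code; the function names and the toy setup are hypothetical.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def constant_step_sgd(data, step, n_iters, rng):
    """Run SGD on the logistic loss with a fixed step size; return every iterate.

    data is a list of (features, label) pairs with labels in {0, 1}.
    """
    theta = [0.0] * len(data[0][0])
    iterates = []
    for _ in range(n_iters):
        x, y = rng.choice(data)
        residual = sigmoid(dot(theta, x)) - y  # gradient of the loss w.r.t. the score
        theta = [t - step * residual * xi for t, xi in zip(theta, x)]
        iterates.append(theta)
    return iterates


def predict_avg_natural(iterates, x):
    """Average the parameters first, then map the averaged score to a probability."""
    n = len(iterates)
    theta_bar = [sum(col) / n for col in zip(*iterates)]
    return sigmoid(dot(theta_bar, x))


def predict_avg_moment(iterates, x):
    """Map each iterate to a probability first, then average the probabilities."""
    return sum(sigmoid(dot(theta, x)) for theta in iterates) / len(iterates)
```

Because the sigmoid is nonlinear, the two predictors generally differ whenever the iterates keep fluctuating, which is exactly the regime of a constant step size. Note that averaging moment parameters requires the test input `x` at averaging time (or stored iterates), a cost that averaging natural parameters does not have.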
