Title: Towards Understanding Label Smoothing

Jun 20, 2020

Yi Xu, Yuanhong Xu, Qi Qian, Hao Li, Rong Jin

Abstract: Label smoothing regularization (LSR) has had great success in training deep neural networks with stochastic algorithms such as stochastic gradient descent and its variants. However, theoretical understanding of its power from the viewpoint of optimization remains limited. This study opens the door to a deeper understanding of LSR by initiating such an analysis. We analyze the convergence behavior of stochastic gradient descent with LSR for solving non-convex problems and show that an appropriate LSR can help speed up convergence by reducing the variance of labels. More interestingly, we propose a simple and efficient strategy, the Two-Stage LAbel smoothing algorithm (TSLA), which uses LSR in the early training epochs and drops it in the later training epochs. The improved convergence result for TSLA shows that it benefits from LSR in the first stage and essentially converges faster in the second stage. To the best of our knowledge, this is the first work to explain the power of LSR by establishing the convergence complexity of stochastic methods with LSR in non-convex optimization. We empirically demonstrate the effectiveness of the proposed method in comparison with baselines when training ResNet models on public data sets.
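The two-stage idea in the abstract can be illustrated with a minimal sketch. It assumes the common formulation of label smoothing, in which the one-hot label is mixed with a uniform distribution over classes using a strength `theta`; the abstract itself does not state the formula. The names `smooth_label`, `tsla_theta`, `cross_entropy`, and `switch_epoch` are hypothetical and chosen only for illustration; this is not the authors' implementation.

```python
import math


def smooth_label(class_index, num_classes, theta):
    """Return (1 - theta) * one_hot + theta * uniform as a list of floats.

    Assumption: the smoothing distribution is uniform over all classes.
    """
    uniform = 1.0 / num_classes
    return [
        (1.0 - theta) * (1.0 if k == class_index else 0.0) + theta * uniform
        for k in range(num_classes)
    ]


def tsla_theta(epoch, switch_epoch, theta):
    """Two-stage schedule: smooth with `theta` before `switch_epoch`, then stop.

    `switch_epoch` is a hypothetical hyperparameter marking the end of stage one.
    """
    return theta if epoch < switch_epoch else 0.0


def cross_entropy(target, probs):
    """Cross-entropy between a target distribution and predicted probabilities."""
    return -sum(t * math.log(p) for t, p in zip(target, probs) if t > 0.0)


# Illustrative loop: the training target changes once stage one ends.
predicted = [0.1, 0.7, 0.1, 0.1]  # made-up model output for one example
for epoch in range(5):
    theta = tsla_theta(epoch, switch_epoch=3, theta=0.1)
    target = smooth_label(class_index=1, num_classes=4, theta=theta)
    loss = cross_entropy(target, predicted)
    print(epoch, theta, target, round(loss, 4))
```

In this sketch the first three epochs train against the smoothed target and the remaining epochs against the plain one-hot label, which mirrors the "use LSR early, drop it later" description. How to choose the switch point and the smoothing strength is left to the paper.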

View paper on OpenReview
