Title: Revisit Micro-batch Clipping: Adaptive Data Pruning via Gradient Manipulation

Aug 29, 2024

Lun Wang

Abstract: Micro-batch clipping, a gradient clipping method, has recently shown potential for improving automatic speech recognition (ASR) model performance. However, the mechanism behind this improvement remains unclear, particularly the observation that only certain micro-batch sizes are beneficial. In this paper, we make the first attempt to explain this phenomenon. Inspired by recent data pruning research, we assume that specific training samples may impede model convergence during certain training phases. Under this assumption, our convergence analysis shows that micro-batch clipping can improve the convergence rate asymptotically, at the cost of an additional constant bias that does not diminish with more training iterations. The bias depends on a few factors and can be minimized at a specific micro-batch size, which explains the sweet-spot micro-batch size observed previously. We also verify the effectiveness of micro-batch clipping beyond speech models, on vision and language models, and show promising performance gains in these domains. An exploration of potential limitations shows that micro-batch clipping is less effective when training data originates from multiple distinct domains.
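For readers unfamiliar with the technique, the following is a minimal sketch of one common formulation of micro-batch clipping, not the paper's implementation. It assumes per-example gradients are already available as plain lists of floats, that each micro-batch's gradients are averaged before clipping, and that the clipped micro-batch gradients are then averaged into one update direction. The function name `micro_batch_clipped_gradient` and the parameters `micro_batch_size` and `clip_norm` are hypothetical names chosen for illustration.

```python
import math
from typing import List, Sequence

Vector = List[float]


def l2_norm(v: Sequence[float]) -> float:
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in v))


def mean_vectors(vectors: Sequence[Sequence[float]]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def clip_to_norm(v: Sequence[float], max_norm: float) -> Vector:
    """Rescale v so that its L2 norm is at most max_norm."""
    norm = l2_norm(v)
    scale = min(1.0, max_norm / norm) if norm > 0 else 1.0
    return [x * scale for x in v]


def micro_batch_clipped_gradient(
    per_example_grads: Sequence[Sequence[float]],
    micro_batch_size: int,
    clip_norm: float,
) -> Vector:
    """Average gradients within each micro-batch, clip each average,
    then average the clipped micro-batch gradients.

    Assumption: this grouping and averaging scheme is illustrative only;
    the source abstract does not spell out these details.
    """
    if micro_batch_size <= 0:
        raise ValueError("micro_batch_size must be positive")
    if not per_example_grads:
        raise ValueError("per_example_grads must be non-empty")

    clipped = []
    for start in range(0, len(per_example_grads), micro_batch_size):
        micro_batch = per_example_grads[start:start + micro_batch_size]
        clipped.append(clip_to_norm(mean_vectors(micro_batch), clip_norm))
    return mean_vectors(clipped)
```

In this sketch, a micro-batch whose averaged gradient has a large norm is scaled down and so contributes less to the update than it would without clipping. This is one way to read the abstract's framing of clipping as a form of adaptive data pruning. Varying `micro_batch_size` changes which samples are grouped and averaged before clipping, which is the quantity the abstract says has a sweet spot.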
