Abstract:Attention head pruning has emerged as an effective technique for transformer model compression, an increasingly important goal in the era of Green AI. However, existing pruning methods often rely on static importance scores, which fail to capture the evolving role of attention heads during iterative removal. We propose Greedy-Gradient norm (Greedy-Gnorm), a novel head pruning algorithm that dynamically recalculates head importance after each pruning step. Specifically, each head is scored by the elementwise product of the l2-norms of its Q/K/V gradient blocks, as estimated from a hold-out validation set and updated at every greedy iteration. This dynamic approach to scoring mitigates against stale rankings and better reflects gradient-informed importance as pruning progresses. Extensive experiments on BERT, ALBERT, RoBERTa, and XLM-RoBERTa demonstrate that Greedy-Gnorm consistently preserves accuracy under substantial head removal, outperforming attention entropy. By effectively reducing model size while maintaining task performance, Greedy-Gnorm offers a promising step toward more energy-efficient transformer model deployment.




Abstract:Reducing the lateral scale of two-dimensional (2D) materials to one-dimensional (1D) has attracted substantial research interest not only to achieve competitive electronic device applications but also for the exploration of fundamental physical properties. Controllable synthesis of high-quality 1D nanoribbons (NRs) is thus highly desirable and essential for the further study. Traditional exploration of the optimal synthesis conditions of novel materials is based on the trial-and-error approach, which is time consuming, costly and laborious. Recently, machine learning (ML) has demonstrated promising capability in guiding material synthesis through effectively learning from the past data and then making recommendations. Here, we report the implementation of supervised ML for the chemical vapor deposition (CVD) synthesis of high-quality 1D few-layered WTe2 nanoribbons (NRs). The synthesis parameters of the WTe2 NRs are optimized by the trained ML model. On top of that, the growth mechanism of as-synthesized 1T' few-layered WTe2 NRs is further proposed, which may inspire the growth strategies for other 1D nanostructures. Our findings suggest that ML is a powerful and efficient approach to aid the synthesis of 1D nanostructures, opening up new opportunities for intelligent material development.