Mohammad Ali Khan

Concept-Monitor: Understanding DNN training through individual neurons

Apr 26, 2023

Mohammad Ali Khan, Tuomas Oikarinen, Tsui-Wei Weng

Abstract: In this work, we propose a general framework called Concept-Monitor to help demystify black-box DNN training processes automatically, using a novel unified embedding space and concept diversity metric. Concept-Monitor enables human-interpretable visualization and indicators of the DNN training process and facilitates transparency as well as a deeper understanding of how DNNs develop during training. Inspired by these findings, we also propose a new training regularizer that incentivizes hidden neurons to learn diverse concepts, which we show to improve training performance. Finally, we apply Concept-Monitor to conduct several case studies on different training paradigms, including adversarial training, fine-tuning, and network pruning via the Lottery Ticket Hypothesis.

Compressing Large-Scale Transformer-Based Models: A Case Study on BERT

Feb 27, 2020

Prakhar Ganesh, Yao Chen, Xin Lou, Mohammad Ali Khan, Yin Yang, Deming Chen, Marianne Winslett, Hassan Sajjad, Preslav Nakov

Abstract: Transformer-based models pre-trained on large-scale corpora achieve state-of-the-art accuracy for natural language processing tasks, but are too resource-hungry and compute-intensive to suit low-capability devices or applications with strict latency requirements. One potential remedy is model compression, which has attracted extensive attention. This paper summarizes the branches of research on compressing Transformers, focusing on the especially popular BERT model. BERT's complex architecture means that a compression technique that is highly effective on one part of the model, e.g., attention layers, may be less successful on another part, e.g., fully connected layers. In this systematic study, we identify the state of the art in compression for each part of BERT, clarify current best practices for compressing large-scale Transformer models, and provide insights into the inner workings of various methods. Our categorization and analysis also shed light on promising future research directions for achieving a lightweight, accurate, and generic natural language processing model.
