Abstract: Handling the ever-increasing scale of contemporary deep learning and transformer-based models poses a significant challenge. Although great strides have been made in model compression techniques such as model architecture search and knowledge distillation, the data and computational resources these optimizations require remain a considerable hurdle. This paper introduces LayerCollapse, a novel adaptive model compression methodology. LayerCollapse eliminates non-linearities within the network and collapses two consecutive fully connected layers into a single linear transformation, simultaneously reducing both the number of layers and the parameter count and thereby improving model efficiency. We also introduce a compression-aware regularizer, which compresses the model in alignment with dataset quality and model expressiveness, reducing overfitting across tasks. Our results demonstrate LayerCollapse's compression and regularization capabilities on multiple fine-grained classification benchmarks, achieving up to 74% post-training compression with minimal accuracy loss. Compared with knowledge distillation to the same target network, LayerCollapse achieves a five-fold increase in computational efficiency and an 8% improvement in overall accuracy on the ImageNet dataset.
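The collapse step described above reduces to a standard linear-algebra identity: once the intervening non-linearity is removed, two consecutive fully connected layers compose into a single linear transformation. The PyTorch sketch below illustrates that identity; the helper name `collapse_linear_pair` and the layer sizes are illustrative assumptions, not the paper's implementation.

```python
# Minimal sketch: fold y = W2(W1 x + b1) + b2 into y = (W2 W1) x + (W2 b1 + b2),
# which is valid only after the non-linearity between the two layers is removed.
import torch
import torch.nn as nn

def collapse_linear_pair(fc1: nn.Linear, fc2: nn.Linear) -> nn.Linear:
    collapsed = nn.Linear(fc1.in_features, fc2.out_features)
    with torch.no_grad():
        collapsed.weight.copy_(fc2.weight @ fc1.weight)       # W2 W1
        bias = fc2.bias.clone()
        if fc1.bias is not None:
            bias += fc2.weight @ fc1.bias                      # W2 b1 + b2
        collapsed.bias.copy_(bias)
    return collapsed

# Sanity check: the collapsed layer matches the original pair up to float error.
x = torch.randn(4, 128)
fc1, fc2 = nn.Linear(128, 512), nn.Linear(512, 64)
assert torch.allclose(fc2(fc1(x)), collapse_linear_pair(fc1, fc2)(x), atol=1e-5)
```

In this toy configuration the collapsed layer holds 128x64 weights in place of the original 128x512 plus 512x64, which is where the reductions in depth and parameter count come from.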
Abstract: Traditional machine learning training is a static process that lacks real-time adaptability of hyperparameters. Popular solutions for tuning during runtime rely on checkpoints and schedulers; otherwise, adjusting hyperparameters usually requires restarting the program, wasting time and compute utilization while placing unnecessary strain on memory and processors. We present LiveTune, a new framework that allows real-time hyperparameter tuning during training through LiveVariables. LiveVariables enable a continuous training session by storing parameters on designated ports on the system, allowing them to be adjusted dynamically. Extensive evaluations of our framework show savings of up to 60 seconds and 5.4 kilojoules of energy per hyperparameter change.
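As a rough illustration of the LiveVariable idea described above (a parameter held behind a designated port so it can be changed while training is running), the sketch below uses a plain TCP socket. The class name, port, and wire format are assumptions made for illustration, not LiveTune's actual API.

```python
# Hypothetical sketch of a live-tunable value: a float served behind a local port
# so an external client can overwrite it mid-training without a restart.
import socket
import threading

class LiveVariable:
    def __init__(self, value: float, port: int):
        self.value = value
        self._server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
        self._server.bind(("127.0.0.1", port))
        self._server.listen()
        threading.Thread(target=self._serve, daemon=True).start()

    def _serve(self):
        while True:
            conn, _ = self._server.accept()
            with conn:
                data = conn.recv(64).decode().strip()
                try:
                    self.value = float(data)   # update takes effect immediately
                except ValueError:
                    pass                        # ignore malformed input

# Illustrative use inside a training loop:
#   lr = LiveVariable(1e-3, port=5005)
#   for batch in loader:
#       for group in optimizer.param_groups:
#           group["lr"] = lr.value   # picks up values sent to port 5005
#       ...
# e.g. from a shell:  echo 0.0005 | nc 127.0.0.1 5005
```

Because the training loop re-reads the value each iteration, a change sent to the port is applied on the next step, which is the continuous-session behavior the abstract describes.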