Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Yixin Shi

Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Jul 15, 2021

Tianyi Chen, Bo Ji, Tianyu Ding, Biyi Fang, Guanyi Wang, Zhihui Zhu, Luming Liang, Yixin Shi, Sheng Yi, Xiao Tu

Figure 1 for Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Figure 2 for Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Figure 3 for Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Figure 4 for Only Train Once: A One-Shot Neural Network Training And Pruning Framework

Abstract:Structured pruning is a commonly used technique in deploying deep neural networks (DNNs) onto resource-constrained devices. However, the existing pruning methods are usually heuristic, task-specified, and require an extra fine-tuning procedure. To overcome these limitations, we propose a framework that compresses DNNs into slimmer architectures with competitive performances and significant FLOPs reductions by Only-Train-Once (OTO). OTO contains two keys: (i) we partition the parameters of DNNs into zero-invariant groups, enabling us to prune zero groups without affecting the output; and (ii) to promote zero groups, we then formulate a structured-sparsity optimization problem and propose a novel optimization algorithm, Half-Space Stochastic Projected Gradient (HSPG), to solve it, which outperforms the standard proximal methods on group sparsity exploration and maintains comparable convergence. To demonstrate the effectiveness of OTO, we train and compress full models simultaneously from scratch without fine-tuning for inference speedup and parameter reduction, and achieve state-of-the-art results on VGG16 for CIFAR10, ResNet50 for CIFAR10/ImageNet and Bert for SQuAD.

* Under Review

Via

Access Paper or Ask Questions

Neural Network Compression Via Sparse Optimization

Nov 11, 2020

Tianyi Chen, Bo Ji, Yixin Shi, Tianyu Ding, Biyi Fang, Sheng Yi, Xiao Tu

Figure 1 for Neural Network Compression Via Sparse Optimization

Figure 2 for Neural Network Compression Via Sparse Optimization

Abstract:The compression of deep neural networks (DNNs) to reduce inference cost becomes increasingly important to meet realistic deployment requirements of various applications. There have been a significant amount of work regarding network compression, while most of them are heuristic rule-based or typically not friendly to be incorporated into varying scenarios. On the other hand, sparse optimization yielding sparse solutions naturally fits the compression requirement, but due to the limited study of sparse optimization in stochastic learning, its extension and application onto model compression is rarely well explored. In this work, we propose a model compression framework based on the recent progress on sparse stochastic optimization. Compared to existing model compression techniques, our method is effective and requires fewer extra engineering efforts to incorporate with varying applications, and has been numerically demonstrated on benchmark compression tasks. Particularly, we achieve up to 7.2 and 2.9 times FLOPs reduction with the same level of evaluation accuracy on VGG16 for CIFAR10 and ResNet50 for ImageNet compared to the baseline heavy models, respectively.

Via

Access Paper or Ask Questions

Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization

Apr 07, 2020

Tianyi Chen, Tianyu Ding, Bo Ji, Guanyi Wang, Yixin Shi, Sheng Yi, Xiao Tu, Zhihui Zhu

$Figure 1 for Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization$

$Figure 2 for Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization$

$Figure 3 for Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization$

$Figure 4 for Orthant Based Proximal Stochastic Gradient Method for $\ell_1$-Regularized Optimization$

Abstract:Sparsity-inducing regularization problems are ubiquitous in machine learning applications, ranging from feature selection to model compression. In this paper, we present a novel stochastic method -- Orthant Based Proximal Stochastic Gradient Method (OBProx-SG) -- to solve perhaps the most popular instance, i.e., the l1-regularized problem. The OBProx-SG method contains two steps: (i) a proximal stochastic gradient step to predict a support cover of the solution; and (ii) an orthant step to aggressively enhance the sparsity level via orthant face projection. Compared to the state-of-the-art methods, e.g., Prox-SG, RDA and Prox-SVRG, the OBProx-SG not only converges to the global optimal solutions (in convex scenario) or the stationary points (in non-convex scenario), but also promotes the sparsity of the solutions substantially. Particularly, on a large number of convex problems, OBProx-SG outperforms the existing methods comprehensively in the aspect of sparsity exploration and objective values. Moreover, the experiments on non-convex deep neural networks, e.g., MobileNetV1 and ResNet18, further demonstrate its superiority by achieving the solutions of much higher sparsity without sacrificing generalization accuracy.

Via

Access Paper or Ask Questions

Detecting Relative Anomaly

May 16, 2016

Richard Neuberg, Yixin Shi

Abstract:System states that are anomalous from the perspective of a domain expert occur frequently in some anomaly detection problems. The performance of commonly used unsupervised anomaly detection methods may suffer in that setting, because they use frequency as a proxy for anomaly. We propose a novel concept for anomaly detection, called relative anomaly detection. It is tailored to be robust towards anomalies that occur frequently, by taking into account their location relative to the most typical observations. The approaches we develop are computationally feasible even for large data sets, and they allow real-time detection. We illustrate using data sets of potential scraping attempts and Wi-Fi channel utilization, both from Google, Inc.

Via

Access Paper or Ask Questions