Abstract: Moving data through the memory hierarchy is a fundamental bottleneck that can limit the performance of core machine learning algorithms such as convolutional neural networks (CNNs). Loop-level optimizations, including loop tiling and loop permutation, are fundamental transformations for reducing data movement. However, the search space for finding the best loop-level optimization configuration is prohibitively large. This paper develops an analytical modeling approach for finding the best loop-level optimization configuration for CNNs on multi-core CPUs. Experimental evaluation shows that this approach achieves comparable or better performance than state-of-the-art libraries and auto-tuning based optimizers for CNNs.
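To make the notion of a "loop-level optimization configuration" concrete, the sketch below shows a direct convolution written as a tiled, permuted loop nest. This is an illustrative NumPy example, not the paper's analytical model; the tile sizes Tk, Tc, Tw and the particular loop order are hypothetical choices, and it is exactly these parameters (tile sizes plus the permutation of the loops) that span the configuration search space the abstract refers to.

```python
# Minimal sketch: a tiled, permuted direct convolution loop nest (NCHW input,
# KCRS weights). Tile sizes Tk, Tc, Tw and the loop order are illustrative.
import numpy as np

def conv2d_tiled(inp, weights, Tk=4, Tc=4, Tw=8):
    N, C, H, W = inp.shape
    K, _, R, S = weights.shape
    Ho, Wo = H - R + 1, W - S + 1
    out = np.zeros((N, K, Ho, Wo), dtype=inp.dtype)
    for n in range(N):
        for k0 in range(0, K, Tk):            # tile over output channels
            for c0 in range(0, C, Tc):        # tile over input channels
                for w0 in range(0, Wo, Tw):   # tile over output width
                    for k in range(k0, min(k0 + Tk, K)):
                        for c in range(c0, min(c0 + Tc, C)):
                            for h in range(Ho):
                                for w in range(w0, min(w0 + Tw, Wo)):
                                    for r in range(R):
                                        for s in range(S):
                                            out[n, k, h, w] += (
                                                inp[n, c, h + r, w + s]
                                                * weights[k, c, r, s]
                                            )
    return out

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = rng.standard_normal((1, 8, 10, 10)).astype(np.float32)
    w = rng.standard_normal((6, 8, 3, 3)).astype(np.float32)
    print(conv2d_tiled(x, w).shape)  # (1, 6, 8, 8)
```

Each choice of tile sizes and loop order changes which data stays resident in cache across iterations; an analytical model can estimate the resulting data-movement volume and pick a good configuration without exhaustive auto-tuning.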
Abstract: Non-negative Matrix Factorization (NMF) is a key kernel for unsupervised dimension reduction used in a wide range of applications, including topic modeling, recommender systems, and bioinformatics. Due to the compute-intensive nature of applications that must perform repeated NMF, several parallel implementations have been developed. However, existing parallel NMF algorithms have not addressed data locality optimizations, which are critical for high performance since data movement costs greatly exceed the cost of arithmetic/logic operations on current computer systems. In this paper, we devise a parallel NMF algorithm based on the HALS (Hierarchical Alternating Least Squares) scheme that incorporates algorithmic transformations to enhance data locality. Efficient realizations of the algorithm on multi-core CPUs and GPUs are developed, demonstrating significant performance improvement over existing state-of-the-art parallel NMF algorithms.
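For context, the sketch below shows the textbook HALS iteration that the paper's scheme builds on: the factors W and H are updated one column (resp. row) at a time in alternation. This is a generic NumPy baseline under assumed names (A, W, H, rank r), not the locality-optimized parallel algorithm developed in the paper.

```python
# Minimal HALS (Hierarchical Alternating Least Squares) sketch for A ~ W @ H,
# with A (m x n) nonnegative, W (m x r), H (r x n). Illustrative baseline only.
import numpy as np

def nmf_hals(A, r, iters=100, eps=1e-16, seed=0):
    rng = np.random.default_rng(seed)
    m, n = A.shape
    W = rng.random((m, r))
    H = rng.random((r, n))
    for _ in range(iters):
        # Update W column by column, holding H fixed.
        HHt = H @ H.T
        AHt = A @ H.T
        for k in range(r):
            denom = max(HHt[k, k], eps)
            W[:, k] = np.maximum(eps, W[:, k] + (AHt[:, k] - W @ HHt[:, k]) / denom)
        # Update H row by row, holding W fixed.
        WtW = W.T @ W
        WtA = W.T @ A
        for k in range(r):
            denom = max(WtW[k, k], eps)
            H[k, :] = np.maximum(eps, H[k, :] + (WtA[k, :] - WtW[k, :] @ H) / denom)
    return W, H

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    A = rng.random((200, 150))
    W, H = nmf_hals(A, r=10, iters=50)
    print(np.linalg.norm(A - W @ H) / np.linalg.norm(A))  # relative residual
```

Because each outer iteration sweeps A and both factors repeatedly, how these matrices are blocked and reused across the per-column updates determines data movement, which is the locality aspect the paper's algorithmic transformations target.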