Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Hal Finkel

Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version)

Apr 27, 2021

Xingfu Wu, Michael Kruse, Prasanna Balaprakash, Hal Finkel, Paul Hovland, Valerie Taylor, Mary Hall

Figure 1 for Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version)

Figure 2 for Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version)

Figure 3 for Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version)

Figure 4 for Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization (extended version)

Abstract:In this paper, we develop a ytopt autotuning framework that leverages Bayesian optimization to explore the parameter space search and compare four different supervised learning methods within Bayesian optimization and evaluate their effectiveness. We select six of the most complex PolyBench benchmarks and apply the newly developed LLVM Clang/Polly loop optimization pragmas to the benchmarks to optimize them. We then use the autotuning framework to optimize the pragma parameters to improve their performance. The experimental results show that our autotuning approach outperforms the other compiling methods to provide the smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and covariance with two large datasets in 200 code evaluations for effectively searching the parameter spaces with up to 170,368 different configurations. We find that the Floyd-Warshall benchmark did not benefit from autotuning because Polly uses heuristics to optimize the benchmark to make it run much slower. To cope with this issue, we provide some compiler option solutions to improve the performance. Then we present loop autotuning without a user's knowledge using a simple mctree autotuning framework to further improve the performance of the Floyd-Warshall benchmark. We also extend the ytopt autotuning framework to tune a deep learning application.

* Submitted to CCPE journal. arXiv admin note: substantial text overlap with arXiv:2010.08040

Via

Access Paper or Ask Questions

Report of the Workshop on Program Synthesis for Scientific Computing

Feb 02, 2021

Hal Finkel, Ignacio Laguna

Figure 1 for Report of the Workshop on Program Synthesis for Scientific Computing

Figure 2 for Report of the Workshop on Program Synthesis for Scientific Computing

Figure 3 for Report of the Workshop on Program Synthesis for Scientific Computing

Figure 4 for Report of the Workshop on Program Synthesis for Scientific Computing

Abstract:Program synthesis is an active research field in academia, national labs, and industry. Yet, work directly applicable to scientific computing, while having some impressive successes, has been limited. This report reviews the relevant areas of program synthesis work for scientific computing, discusses successes to date, and outlines opportunities for future work. This report is the result of the Workshop on Program Synthesis for Scientific Computing was held virtually on August 4-5 2020 (https://prog-synth-science.github.io/2020/).

* 29 pages, workshop website: https://prog-synth-science.github.io/2020/

Via

Access Paper or Ask Questions

Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization

Oct 15, 2020

Xingfu Wu, Michael Kruse, Prasanna Balaprakash, Hal Finkel, Paul Hovland, Valerie Taylor, Mary Hall

Figure 1 for Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization

Figure 2 for Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization

Figure 3 for Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization

Figure 4 for Autotuning PolyBench Benchmarks with LLVM Clang/Polly Loop Optimization Pragmas Using Bayesian Optimization

Abstract:An autotuning is an approach that explores a search space of possible implementations/configurations of a kernel or an application by selecting and evaluating a subset of implementations/configurations on a target platform and/or use models to identify a high performance implementation/configuration. In this paper, we develop an autotuning framework that leverages Bayesian optimization to explore the parameter space search. We select six of the most complex benchmarks from the application domains of the PolyBench benchmarks (syr2k, 3mm, heat-3d, lu, covariance, and Floyd-Warshall) and apply the newly developed LLVM Clang/Polly loop optimization pragmas to the benchmarks to optimize them. We then use the autotuning framework to optimize the pragma parameters to improve their performance. The experimental results show that our autotuning approach outperforms the other compiling methods to provide the smallest execution time for the benchmarks syr2k, 3mm, heat-3d, lu, and covariance with two large datasets in 200 code evaluations for effectively searching the parameter spaces with up to 170,368 different configurations. We compare four different supervised learning methods within Bayesian optimization and evaluate their effectiveness. We find that the Floyd-Warshall benchmark did not benefit from autotuning because Polly uses heuristics to optimize the benchmark to make it run much slower. To cope with this issue, we provide some compiler option solutions to improve the performance.

* to be published in the 11th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computer Systems (PMBS20)

Via

Access Paper or Ask Questions

Autotuning Search Space for Loop Transformations

Oct 13, 2020

Michael Kruse, Hal Finkel, Xingfu Wu

Figure 1 for Autotuning Search Space for Loop Transformations

Figure 2 for Autotuning Search Space for Loop Transformations

Figure 3 for Autotuning Search Space for Loop Transformations

Figure 4 for Autotuning Search Space for Loop Transformations

Abstract:One of the challenges for optimizing compilers is to predict whether applying an optimization will improve its execution speed. Programmers may override the compiler's profitability heuristic using optimization directives such as pragmas in the source code. Machine learning in the form of autotuning can assist users in finding the best optimizations for each platform. In this paper we propose a loop transformation search space that takes the form of a tree, in contrast to previous approaches that usually use vector spaces to represent loop optimization configurations. We implemented a simple autotuner exploring the search space and applied it to a selected set of PolyBench kernels. While the autotuner is capable of representing every possible sequence of loop transformations and their relations, the results motivate the use of better search strategies such as Monte Carlo tree search to find sophisticated loop transformations such as multilevel tiling.

* LLVM-in-HPC 2020 preprint

Via

Access Paper or Ask Questions

Design and Use of Loop-Transformation Pragmas

Oct 06, 2019

Michael Kruse, Hal Finkel

Figure 1 for Design and Use of Loop-Transformation Pragmas

Figure 2 for Design and Use of Loop-Transformation Pragmas

Figure 3 for Design and Use of Loop-Transformation Pragmas

Abstract:Adding a pragma directive into the source code is arguably easier than rewriting it, for instance for loop unrolling. Moreover, if the application is maintained for multiple platforms, their difference in performance characteristics may require different code transformations. Code transformation directives allow replacing the directives depending on the platform, i.e. separation of code semantics and its performance optimization. In this paper, we explore the design space (syntax and semantics) of adding such directive into a future OpenMP specification. Using a prototype implementation in Clang, we demonstrate the usefulness of such directives on a few benchmarks.

* IWOMP 2019, September 11-13, Auckland, preprint

Via

Access Paper or Ask Questions