Title: CLBlast: A Tuned OpenCL BLAS Library

Apr 27, 2018

Cedric Nugteren

Abstract: This work introduces CLBlast, an open-source BLAS library providing optimized OpenCL routines to accelerate dense linear algebra for a wide variety of devices. It is targeted at machine learning and HPC applications and thus provides a fast matrix-multiplication routine (GEMM) to accelerate the core of many applications (e.g. deep learning, iterative solvers, astrophysics, computational fluid dynamics, quantum chemistry). CLBlast has five main advantages over other OpenCL BLAS libraries: 1) it is optimized for and tested on a large variety of OpenCL devices, including less commonly used devices such as embedded and low-power GPUs, 2) it can be explicitly tuned for specific problem sizes on specific hardware platforms, 3) it can perform operations in half-precision floating point (FP16), saving bandwidth, time and energy, 4) it has an optional CUDA back-end, and 5) it can combine multiple operations in a single batched routine, accelerating smaller problems significantly. This paper describes the library and demonstrates the advantages of CLBlast experimentally for different use cases on a wide variety of OpenCL hardware.
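
For readers unfamiliar with the terms, the sketch below illustrates what a GEMM computes in the standard BLAS sense (C <- alpha * A * B + beta * C) and what combining several small GEMMs into one batched call means. It is a plain-Python reference for the semantics only, not CLBlast's API or implementation; the function names `gemm` and `gemm_batched` and the list-of-lists matrix layout are assumptions made for illustration.

```python
def gemm(alpha, a, b, beta, c):
    """Reference GEMM: return alpha * (a x b) + beta * c.

    Matrices are row-major lists of lists (an illustrative choice,
    not how an OpenCL library stores device buffers).
    """
    m, k, n = len(a), len(b), len(b[0])
    if any(len(row) != k for row in a):
        raise ValueError("inner dimensions of a and b do not match")
    if len(c) != m or any(len(row) != n for row in c):
        raise ValueError("c must have shape m x n")
    return [
        [alpha * sum(a[i][p] * b[p][j] for p in range(k)) + beta * c[i][j]
         for j in range(n)]
        for i in range(m)
    ]


def gemm_batched(alphas, a_batch, b_batch, betas, c_batch):
    """Apply one independent GEMM per batch entry.

    A tuned library would issue the whole batch as a single routine call;
    this loop only shows which results such a call is expected to produce.
    """
    return [
        gemm(alpha, a, b, beta, c)
        for alpha, a, b, beta, c in zip(alphas, a_batch, b_batch, betas, c_batch)
    ]
```

The batched form matters for small matrices because each individual multiplication does too little work to amortize the overhead of a separate routine call, which is the case the abstract says batching accelerates.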

* Conference paper in: IWOCL '18, the International Workshop on OpenCL