Abstract: Recent work has shown that implementations of Quicksort using vector CPU instructions can outperform the non-vectorized algorithms in widespread use. However, these implementations are typically single-threaded, implemented for a particular instruction set, and restricted to a small set of key types. We lift these three restrictions: our proposed 'vqsort' algorithm integrates into the state-of-the-art parallel sorter 'ips4o', with a geometric mean speedup of 1.59. The same implementation works on seven instruction sets (including SVE and RISC-V V) across four platforms. It also supports floating-point and 16- to 128-bit integer keys. To the best of our knowledge, this is the fastest sort for non-tuple keys on CPUs, up to 20 times as fast as the sorting algorithms implemented in standard libraries. This paper focuses on the practical engineering aspects enabling the speed and portability, which we have not yet seen demonstrated for a Quicksort implementation. Furthermore, we introduce compact and transpose-free sorting networks for in-register sorting of small arrays, and a vector-friendly pivot sampling strategy that is robust against adversarial input.
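
To make the in-register sorting-network idea concrete, the following minimal NumPy sketch sorts many independent 4-element columns with a 5-comparator network, where each array stands in for one vector register; the particular network, the register width of eight lanes, and the function names are illustrative assumptions for this sketch, not the vqsort implementation itself.

import numpy as np

def compare_exchange(a, b):
    # Per-lane compare-exchange: each lane receives (min, max).
    return np.minimum(a, b), np.maximum(a, b)

def sort4_columns(r0, r1, r2, r3):
    # Five-comparator sorting network for four inputs, applied lane-wise:
    # lane i of the four "registers" forms one independent 4-element column.
    r0, r2 = compare_exchange(r0, r2)
    r1, r3 = compare_exchange(r1, r3)
    r0, r1 = compare_exchange(r0, r1)
    r2, r3 = compare_exchange(r2, r3)
    r1, r2 = compare_exchange(r1, r2)
    return r0, r1, r2, r3

rng = np.random.default_rng(0)
regs = rng.integers(0, 100, size=(4, 8))   # four "registers" of eight lanes
out = np.stack(sort4_columns(*regs))
assert (np.diff(out, axis=0) >= 0).all()   # every column is now sorted

Because every comparator is a branch-free per-lane min/max, a real SIMD version can keep all comparisons inside vector registers.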
Abstract: Constrained optimization problems arise frequently in classical machine learning. Frameworks addressing constrained optimization exist, for instance CVXPY and GENO. However, in contrast to deep learning frameworks, their GPU support is limited. Here, we extend the GENO framework to also solve constrained optimization problems on the GPU. The framework allows the user to specify constrained optimization problems in an easy-to-read modeling language; a solver is then generated automatically from this specification. When run on the GPU, the solver outperforms state-of-the-art approaches such as CVXPY combined with a GPU-accelerated solver like cuOSQP or SCS by a few orders of magnitude.
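
For reference, the CVXPY baseline mentioned above looks roughly like the following sketch, which solves a small nonnegative least-squares problem with the SCS solver; the problem data and sizes are invented for illustration, GPU acceleration of SCS requires a separate GPU-enabled build, and this is not the GENO modeling language itself.

import cvxpy as cp
import numpy as np

rng = np.random.default_rng(0)
m, n = 200, 50
A = rng.standard_normal((m, n))
b = rng.standard_normal(m)

x = cp.Variable(n)
objective = cp.Minimize(cp.sum_squares(A @ x - b))
constraints = [x >= 0, cp.sum(x) == 1]      # simplex constraints
problem = cp.Problem(objective, constraints)
problem.solve(solver=cp.SCS)                # running SCS on the GPU needs a GPU-enabled build
print(problem.status, problem.value)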