Get our free extension to see links to code for papers anywhere online!Free add-on: code for papers everywhere!Free add-on: See code for papers anywhere!

Add to Chrome

Add to Firefox

Add to Edge

Pavel Yakovlev

RISC-V RVV efficiency for ANN algorithms

Jul 18, 2024

Konstantin Rumyantsev, Pavel Yakovlev, Andrey Gorshkov, Andrey P. Sokolov

Figure 1 for RISC-V RVV efficiency for ANN algorithms

Figure 2 for RISC-V RVV efficiency for ANN algorithms

Figure 3 for RISC-V RVV efficiency for ANN algorithms

Figure 4 for RISC-V RVV efficiency for ANN algorithms

Abstract:Handling vast amounts of data is crucial in today's world. The growth of high-performance computing has created a need for parallelization, particularly in the area of machine learning algorithms such as ANN (Approximate Nearest Neighbors). To improve the speed of these algorithms, it is important to optimize them for specific processor architectures. RISC-V (Reduced Instruction Set Computer Five) is one of the modern processor architectures, which features a vector instruction set called RVV (RISC-V Vector Extension). In machine learning algorithms, vector extensions are widely utilized to improve the processing of voluminous data. This study examines the effectiveness of applying RVV to commonly used ANN algorithms. The algorithms were adapted for RISC-V and optimized using RVV after identifying the primary bottlenecks. Additionally, we developed a theoretical model of a parameterized vector block and identified the best on average configuration that demonstrates the highest theoretical performance of the studied ANN algorithms when the other CPU parameters are fixed.

Via

Access Paper or Ask Questions

Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs

Dec 22, 2022

Narendra Chaudhary, Alexander Pivovar, Pavel Yakovlev, Andrey Gorshkov, Sanchit Misra

Figure 1 for Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs

Figure 2 for Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs

Figure 3 for Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs

Figure 4 for Accelerating Barnes-Hut t-SNE Algorithm by Efficient Parallelization on Multi-Core CPUs

Abstract:t-SNE remains one of the most popular embedding techniques for visualizing high-dimensional data. Most standard packages of t-SNE, such as scikit-learn, use the Barnes-Hut t-SNE (BH t-SNE) algorithm for large datasets. However, existing CPU implementations of this algorithm are inefficient. In this work, we accelerate the BH t-SNE on CPUs via cache optimizations, SIMD, parallelizing sequential steps, and improving parallelization of multithreaded steps. Our implementation (Acc-t-SNE) is up to 261x and 4x faster than scikit-learn and the state-of-the-art BH t-SNE implementation from daal4py, respectively, on a 32-core Intel(R) Icelake cloud instance.

Via

Access Paper or Ask Questions