Abstract: This paper studies the utility of data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with a regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as the relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new "phase space plot," we show how desynchronization patterns (or their absence) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.
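To make the analysis pipeline concrete, the following minimal Python sketch applies PCA and k-means clustering to a hypothetical matrix of per-process MPI times per time step; the file name, component count, and cluster count are illustrative assumptions, not details taken from the paper.

import numpy as np
from sklearn.decomposition import PCA
from sklearn.cluster import KMeans

# Hypothetical input: mpi_time[p, t] = MPI time of process p in time step t,
# collected per time step instead of a full trace (file name is an assumption).
mpi_time = np.load("mpi_time_per_step.npy")       # shape: (num_procs, num_steps)

# Project each process's MPI-time series onto a few principal components ...
scores = PCA(n_components=2).fit_transform(mpi_time)

# ... and group processes with similar temporal behaviour; clusters that drift
# apart over the run hint at desynchronization patterns.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(scores)
print(np.bincount(labels))                        # cluster sizes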
Abstract: Spectral methods have recently gained considerable attention due to the simplicity of their implementation and their solid mathematical background. We revisit spectral graph clustering and reformulate the continuous problem of minimizing the graph Laplacian Rayleigh quotient in the $p$-norm. The value of $p \in (1,2]$ is gradually reduced, promoting sparser solution vectors that correspond to optimal clusters as $p$ approaches one. The computation of multiple $p$-eigenvectors of the graph $p$-Laplacian, a nonlinear generalization of the standard graph Laplacian, is achieved by minimizing our objective function on the Grassmann manifold, thereby enforcing the orthogonality constraint between them. Our approach attempts to bridge the fields of graph clustering and nonlinear numerical optimization, and employs a robust algorithm to obtain clusters of high quality. The benefits of the proposed method are demonstrated on a wide range of artificial and real-world graphs. Our results are compared against standard spectral clustering methods and the current state-of-the-art algorithm for clustering with the graph $p$-Laplacian.
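For reference, a common form of the $p$-norm Rayleigh quotient associated with the graph $p$-Laplacian is sketched below in LaTeX; the exact functional and constraints used in the paper may differ.

% p-norm Rayleigh quotient for a weighted graph with edge weights w_{ij}
% and a vector f defined on the nodes (a sketch of the standard form):
\[
  R_p(f) \;=\; \frac{\sum_{(i,j)\in E} w_{ij}\,\lvert f_i - f_j \rvert^{p}}
                    {\sum_{i\in V} \lvert f_i \rvert^{p}},
  \qquad p \in (1,2].
\]
% For p = 2 this reduces to the classical Rayleigh quotient of the graph
% Laplacian, while as p -> 1 its minimizers become increasingly
% indicator-like, i.e., sparser cluster vectors.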
Abstract: We examine the Xeon Phi, based on Intel's Many Integrated Core architecture, for its suitability to run the FDK algorithm, the most commonly used algorithm for 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and the means to enable sensible data sharing between threads despite the lack of a shared last-level cache. Apart from parallelization, SIMD vectorization is critical for good performance on the Xeon Phi; we perform various micro-benchmarks to investigate the platform's new set of vector instructions and place special emphasis on the newly introduced vector gather capability. We refine a previous performance model for the application and adapt it to the Xeon Phi to validate the performance of our optimized hand-written assembly implementation, as well as that of several different auto-vectorization approaches.
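To illustrate the access pattern behind the vector gather discussion, here is a minimal Python/NumPy sketch of a voxel-driven, FDK-style backprojection of the central slice; the function name, geometry conventions, nearest-neighbour interpolation, and central-slice simplification are assumptions for illustration, not the paper's optimized implementation.

import numpy as np

def fdk_backproject_central_slice(projections, angles, n, sod, sdd, du):
    # Simplified voxel-driven backprojection of the central (z = 0) slice.
    # projections: (num_angles, det_rows, det_cols), assumed to be already
    # cosine-weighted and ramp-filtered; geometry conventions are assumptions.
    det_rows, det_cols = projections.shape[1:]
    c = np.arange(n) - n / 2 + 0.5                 # voxel centre coordinates
    x, y = np.meshgrid(c, c)
    img = np.zeros((n, n))
    for proj, beta in zip(projections, angles):
        s = x * np.cos(beta) + y * np.sin(beta)    # across the detector
        t = -x * np.sin(beta) + y * np.cos(beta)   # along the central ray
        U = sod + t                                # source-to-voxel distance along the ray
        u = s * sdd / (U * du) + det_cols / 2      # continuous detector column
        ui = np.clip(np.rint(u).astype(int), 0, det_cols - 1)
        vi = det_rows // 2                         # central detector row only
        # The indexed fetch below touches non-contiguous detector samples;
        # this is the scattered access that SIMD gather instructions serve.
        img += (sod / U) ** 2 * proj[vi, ui]
    return img * np.pi / len(angles)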