Georg Hager

Exploring Techniques for the Analysis of Spontaneous Asynchronicity in MPI-Parallel Applications

May 27, 2022

Ayesha Afzal, Georg Hager, Gerhard Wellein, Stefano Markidis

Abstract: This paper studies the utility of using data analytics and machine learning techniques for identifying, classifying, and characterizing the dynamics of large-scale parallel (MPI) programs. To this end, we run microbenchmarks and realistic proxy applications with the regular compute-communicate structure on two different supercomputing platforms and choose the per-process performance and MPI time per time step as relevant observables. Using principal component analysis, clustering techniques, correlation functions, and a new "phase space plot," we show how desynchronization patterns (or lack thereof) can be readily identified from a data set that is much smaller than a full MPI trace. Our methods also lead the way towards a more general classification of parallel program dynamics.

* 12 pages, 9 figures, 1 table

Performance Engineering for a Medical Imaging Application on the Intel Xeon Phi Accelerator

Dec 17, 2013

Johannes Hofmann, Jan Treibig, Georg Hager, Gerhard Wellein

Abstract: We examine the Xeon Phi, which is based on Intel's Many Integrated Cores architecture, for its suitability to run the FDK algorithm--the most commonly used algorithm to perform the 3D image reconstruction in cone-beam computed tomography. We study the challenges of efficiently parallelizing the application and means to enable sensible data sharing between threads despite the lack of a shared last level cache. Apart from parallelization, SIMD vectorization is critical for good performance on the Xeon Phi; we perform various micro-benchmarks to investigate the platform's new set of vector instructions and put a special emphasis on the newly introduced vector gather capability. We refine a previous performance model for the application and adapt it for the Xeon Phi to validate the performance of our optimized hand-written assembly implementation, as well as the performance of several different auto-vectorization approaches.
