Toward supporting multi-GPU targets via taskloop and user-defined schedules

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

9 Scopus citations

Abstract

Many modern supercomputers such as ORNL’s Summit, LLNL’s Sierra, and LBL’s upcoming Perlmutter offer or will offer multiple GPUs per node, e.g., 4 to 8, for running computational science and engineering applications. One should expect an application to achieve speedup when using multiple GPUs on a node of a supercomputer rather than a single GPU of that node, in particular an application that is embarrassingly parallel and load imbalanced, such as AutoDock, QMCPACK and DMRG++. OpenMP is a popular model used to run applications on the heterogeneous devices of a node, and OpenMP 5.x provides rich features for tasking and GPU offloading. However, OpenMP does not provide significant support for running application code on multiple GPUs efficiently, in particular for the aforementioned applications. We provide different OpenMP task-to-GPU scheduling strategies that help distribute an application’s work across GPUs on a node for efficient parallel GPU execution. Our solution involves using OpenMP’s taskloop construct to generate OpenMP tasks containing target regions for OpenMP threads, and then having OpenMP threads assign those tasks to GPUs on a node through a schedule specified by the application programmer. We analyze the performance of our solution using a small benchmark code representative of the aforementioned applications. Our solution improves performance over a standard baseline assignment of tasks to GPUs by up to 57.2%. Further, based on our results, we suggest OpenMP extensions that could help application programmers run their applications on multiple GPUs per node efficiently.
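
The approach described in the abstract (a taskloop whose tasks each contain a target region, with a programmer-chosen rule mapping tasks to GPUs) might be sketched roughly as follows. This is an illustrative sketch only, not the paper's benchmark or its scheduling strategies: the function names `pick_device_round_robin`, `device_count`, `host_device`, and `run_tasks_on_devices`, the round-robin rule, and the synthetic workload are all assumptions made for the example. The OpenMP runtime calls are guarded so the code also builds and runs sequentially when OpenMP is not enabled.

```c
#include <assert.h>
#include <stdio.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical user-defined schedule: map a task index to a device id.
   Round-robin is an assumption for this sketch, not the paper's schedule. */
int pick_device_round_robin(int task, int num_devices) {
    assert(num_devices > 0);
    return task % num_devices;
}

/* Number of offload devices (GPUs) visible to OpenMP; 0 without OpenMP. */
int device_count(void) {
#ifdef _OPENMP
    return omp_get_num_devices();
#else
    return 0;
#endif
}

/* Device id of the host, used as a fallback when no GPU is available. */
int host_device(void) {
#ifdef _OPENMP
    return omp_get_initial_device();
#else
    return 0;
#endif
}

/* Run num_tasks load-imbalanced tasks; task t performs (t + 1) * 1000
   additions (a synthetic workload) and stores its sum in results[t]. */
void run_tasks_on_devices(int num_tasks, double *results) {
    int ndev = device_count();

    #pragma omp parallel
    {
        #pragma omp single
        {
            /* One OpenMP task per loop iteration. */
            #pragma omp taskloop grainsize(1)
            for (int t = 0; t < num_tasks; t++) {
                /* The thread executing the task chooses the target device. */
                int dev = (ndev > 0) ? pick_device_round_robin(t, ndev)
                                     : host_device();
                int n = (t + 1) * 1000;
                double acc = 0.0;

                /* Offload this task's work to the chosen device. */
                #pragma omp target device(dev) map(tofrom: acc)
                for (int i = 0; i < n; i++) {
                    acc += 1.0;
                }

                results[t] = acc;
            }
        }
    }
}
```

A caller would allocate an array of `num_tasks` doubles and pass it to `run_tasks_on_devices`. Replacing `pick_device_round_robin` with another mapping function is where a different user-defined schedule would plug in; a static mapping like this one ignores the load imbalance that the abstract highlights, so it is better read as a baseline than as a tuned strategy.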

Original language: English
Title of host publication: OpenMP
Subtitle of host publication: Portable Multi-Level Parallelism on Modern Systems - 16th International Workshop on OpenMP, IWOMP 2020, Proceedings
Editors: Kent Milfeld, Lars Koesterke, Bronis R. de Supinski, Jannis Klinkenberg
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 295-309
Number of pages: 15
ISBN (Print): 9783030581435
State: Published - 2020
Event: 16th International Workshop on OpenMP, IWOMP 2020 - Austin, United States
Duration: Sep 22 2020 to Sep 24 2020

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 12295 LNCS

Conference

Conference: 16th International Workshop on OpenMP, IWOMP 2020
Country/Territory: United States
City: Austin
Period: 09/22/20 to 09/24/20

Keywords

  • AutoDock
  • High-performance
  • Load balancing
  • Multi GPUs
  • Offload
  • OpenMP
  • Parallel
  • Tasks
  • User-defined scheduling
