
Comparing and Contrasting User and Runtime Directed Data Placement Strategies for Owner-Compute, Multi-accelerator Distributed Task Based Scheduling

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

Given the high arithmetic capacity of GPU accelerators, reducing data motion and optimizing locality are critical to achieving high performance. The task-based programming paradigm, as employed in the PaRSEC micro-task runtime system, decouples both data distribution and the mapping of computation to resources from the algorithm's base expression. In this paper, we leverage this capability to explore the performance impact of several data placement strategies, some automatic and runtime-directed and some user-directed, for the owner-compute scheduling model in the context of split-memory accelerators. We implement three strategies for data and task mapping: a randomized first-touch policy that assigns data to a randomly chosen accelerator; a load-balancing policy that assigns data to the accelerator with the lowest load; and a user-directed strategy that reduces cross-accelerator traffic by placing tasks so as to minimize cross-memory bandwidth, against which we compare the two runtime-directed policies. We carry out the evaluation on a variety of multi-GPU accelerated systems, including the Frontier system, and demonstrate that runtime-directed automatic data placement can improve locality compared to naive strategies, but we also highlight that the ability to easily modify user-directed data placement is crucial to achieving peak performance.
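The authors' strategies are implemented inside PaRSEC and are not reproduced here. As a rough, self-contained illustration of how the three kinds of placement policy differ under owner-compute, the Python sketch below assigns the tiles of a tiled Cholesky factorization to GPUs and counts the inter-GPU tile copies each placement induces. Everything in it is an assumption for illustration: the function names (`random_first_touch`, `least_loaded`, `block_cyclic_2d`, `cross_gpu_transfers`) are hypothetical, the per-tile cost model is a simple update count, and a 2D block-cyclic GPU grid stands in for the paper's user-directed, bandwidth-minimizing placement. None of this is PaRSEC's API or the authors' exact method.

```python
import random
from typing import Dict, List, Tuple

Tile = Tuple[int, int]          # (row, col) tile index with row >= col
Placement = Dict[Tile, int]     # tile -> owning GPU id (owner-compute)


def lower_tiles(nt: int) -> List[Tile]:
    """Tiles of the lower triangle of an nt x nt tiled matrix."""
    return [(i, j) for j in range(nt) for i in range(j, nt)]


def tile_work(t: Tile) -> int:
    # Hypothetical cost model: tile (i, j) is written by j updates
    # (SYRK or GEMM) plus one POTRF or TRSM, all run on its owner.
    return t[1] + 1


def random_first_touch(nt: int, ngpu: int, seed: int = 0) -> Placement:
    """Stand-in for a randomized first-touch policy: each tile goes to a random GPU."""
    rng = random.Random(seed)
    return {t: rng.randrange(ngpu) for t in lower_tiles(nt)}


def least_loaded(nt: int, ngpu: int) -> Placement:
    """Greedy load balancing: each tile goes to the GPU with the lowest accumulated work."""
    load = [0] * ngpu
    placement: Placement = {}
    for t in lower_tiles(nt):
        g = min(range(ngpu), key=lambda d: load[d])  # ties go to the lowest GPU id
        placement[t] = g
        load[g] += tile_work(t)
    return placement


def block_cyclic_2d(nt: int, p: int, q: int) -> Placement:
    """User-chosen placement: 2D block-cyclic over a p x q grid of GPUs."""
    return {(i, j): (i % p) * q + (j % q) for (i, j) in lower_tiles(nt)}


def cross_gpu_transfers(nt: int, owner: Placement) -> int:
    """Count tile copies between GPUs for a right-looking tiled Cholesky,
    assuming each (tile, step, destination GPU) triple is sent at most once."""
    sends = set()

    def need(src: Tile, dst: Tile, k: int) -> None:
        # The task writing `dst` runs on dst's owner and reads `src`.
        if owner[src] != owner[dst]:
            sends.add((src, k, owner[dst]))

    for k in range(nt):
        for i in range(k + 1, nt):
            need((k, k), (i, k), k)          # TRSM(i, k) reads the POTRF(k) result
        for i in range(k + 1, nt):
            need((i, k), (i, i), k)          # SYRK(i, i) reads tile (i, k)
            for j in range(k + 1, i):
                need((i, k), (i, j), k)      # GEMM(i, j) reads tile (i, k)
                need((j, k), (i, j), k)      # GEMM(i, j) reads tile (j, k)
    return len(sends)


if __name__ == "__main__":
    nt, p, q = 16, 2, 2
    policies = {
        "random first-touch": random_first_touch(nt, p * q),
        "least-loaded": least_loaded(nt, p * q),
        "2D block-cyclic": block_cyclic_2d(nt, p, q),
    }
    for name, placement in policies.items():
        print(f"{name:>20}: {cross_gpu_transfers(nt, placement)} tile copies")
```

The greedy least-loaded policy keeps per-GPU work within one tile's cost of balance but ignores which tiles are read together, while the block-cyclic grid fixes data ownership up front so that panel tiles are shared along rows and columns of the GPU grid. This is the kind of locality-versus-balance trade-off the abstract describes, shown here only under the toy cost model above.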

Original language: English
Title of host publication: Asynchronous Many-Task Systems and Applications - 3rd International Workshop, WAMTA 2025, Proceedings
Editors: Patrick Diehl, Qinglei Cao, Thomas Herault, George Bosilca
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 140-153
Number of pages: 14
ISBN (Print): 9783031971952
State: Published - 2026
Event: 3rd International Workshop on Asynchronous Many-Task Systems and Applications, WAMTA 2025 - St. Louis, United States
Duration: Feb 19 2025 → Feb 21 2025

Publication series

Name: Lecture Notes in Computer Science
Volume: 15690 LNCS

Conference

Conference: 3rd International Workshop on Asynchronous Many-Task Systems and Applications, WAMTA 2025
Country/Territory: United States
City: St. Louis
Period: 02/19/25 → 02/21/25

Keywords

  • Cholesky factorization
  • Matrix computations
  • Task-based runtime
  • accelerator
