Skip to main navigation Skip to search Skip to main content

Understanding Recursive Divide-and-Conquer Dynamic Programs in Fork-Join and Data-Flow Execution Models

Research output: Chapter in Book/Report/Conference proceedingConference contributionpeer-review

3 Scopus citations

Abstract

On shared-memory multicore machines, classic two-way recursive divide-and-conquer algorithms are implemented using common fork-join based parallel programming paradigms such as Intel Cilk+ or OpenMP. However, in such parallel paradigms, the use of joins for synchronization may lead to artificial dependencies among function calls which are not implied by the underlying DP recurrence. These artificial dependencies can increase the span asymptotically and thus reduce parallelism. From a practical perspective, they can lead to resource underutilization, i.e., threads becoming idle. To eliminate such artificial dependencies, task-based runtime systems and data-flow parallel paradigms, such as Concurrent Collections (CnC), PaRSEC, and Legion have been introduced. Such parallel paradigms and runtime systems overcome the limitations of fork-join parallelism by specifying data dependencies at a finer granularity and allowing tasks to execute as soon as dependencies are satisfied.In this paper, we investigate how the performance of data-flow implementations of recursive divide-and-conquer based DP algorithms compare with fork-join implementations. We have designed and implemented data-flow versions of DP algorithms in Intel CnC and compared the performance with fork-join based implementations in OpenMP. Considering different execution parameters (e.g., algorithmic properties such as recursive base size as well as machine configuration such as the number of physical cores, etc), our results confirm that a data-flow based implementation outperforms its fork-join based counter-part when due to artificial dependencies, the fork-join implementation fails to generate enough subtasks to keep all processors busy and does not have enough data locality to compensate for the lost performance. This phenomena happens when the input size of the DP algorithm is small or we have a huge number of compute cores in the system. As a result, with a fixed computation resource, moving from small input to larger input, fork-join implementation of DP algorithms outperforms the corresponding data-flow implementation. However, for a fixed size problem, moving the computation to a compute node with a larger number of cores, data-flow implementation outperforms the corresponding fork-join implementation.

Original languageEnglish
Title of host publication2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - In conjunction with IEEE IPDPS 2021
PublisherInstitute of Electrical and Electronics Engineers Inc.
Pages407-416
Number of pages10
ISBN (Electronic)9781665435772
DOIs
StatePublished - Jun 2021
Event2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - Virtual, Portland, United States
Duration: May 17 2021 → …

Publication series

Name2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021 - In conjunction with IEEE IPDPS 2021

Conference

Conference2021 IEEE International Parallel and Distributed Processing Symposium Workshops, IPDPSW 2021
Country/TerritoryUnited States
CityVirtual, Portland
Period05/17/21 → …

Keywords

  • Concurrent Collections
  • Data-Flow Parallel Model
  • Dynamic Programming
  • Fork-Join Parallel Model
  • OpenMP
  • Recursive Divide-and-Conquer

Fingerprint

Dive into the research topics of 'Understanding Recursive Divide-and-Conquer Dynamic Programs in Fork-Join and Data-Flow Execution Models'. Together they form a unique fingerprint.

Cite this