An analysis of node sharing on HPC clusters using XDMoD/TACC-stats

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

9 Scopus citations

Abstract

When a user requests less than a full node for a job on XSEDE's large resources, Stampede and Lonestar4 (that is, fewer than 16 cores on Stampede or 12 cores on Lonestar4), they are assigned a full node by policy. Although the CPU hours consumed by these jobs are small compared to the total CPU hours delivered by these resources, the jobs represent a substantial fraction of the total number of jobs (~18% for Stampede and ~15% for Lonestar4 between January and February 2014). Academic HPC centers, such as the Center for Computational Research (CCR) at the University at Buffalo, SUNY, typically have a much larger proportion of small jobs than the large XSEDE systems. For CCR's production cluster, Rush, the decision was made to allow simultaneous jobs to be allocated on the same node. This greatly increases overall throughput but also raises the question of whether jobs that share a node will interfere with one another. We present here an analysis that explores this issue using data from Rush, Stampede and Lonestar4. Analysis of usage data indicates little interference.

Original language: English
Title of host publication: Proceedings of the XSEDE 2014 Conference
Subtitle of host publication: Engaging Communities
Publisher: Association for Computing Machinery
ISBN (Print): 9781450328937
State: Published - 2014
Event: 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014 - Atlanta, GA, United States
Duration: Jul 13 2014 → Jul 18 2014

Publication series

Name: ACM International Conference Proceeding Series

Conference

Conference: 2014 Annual Conference on Extreme Science and Engineering Discovery Environment, XSEDE 2014
Country/Territory: United States
City: Atlanta, GA
Period: 07/13/14 → 07/18/14

Keywords

  • HPC
  • Node sharing
  • SUPReMM
  • TACC-Stats
  • XDMoD
