
A Slurm simulator: Implementation and parametric analysis

Research output: Chapter in Book/Report/Conference proceeding, Conference contribution, peer-reviewed

48 Scopus citations

Abstract

Slurm is an open-source resource manager for HPC that provides high configurability for inhomogeneous resources and job scheduling. Various Slurm parameter settings can significantly influence HPC resource utilization and job wait time; however, in many cases it is hard to judge how these options will affect overall HPC resource performance. A Slurm simulator can be a very helpful tool to aid parameter selection for a particular HPC resource. Here, we report our implementation of a Slurm simulator and the impact of parameter choice on HPC resource performance. The simulator is based on a real Slurm instance with modifications to allow simulation of historical jobs and to improve the simulation speed. The simulator speed depends heavily on job composition, HPC resource size, and Slurm configuration. For an 8000-core heterogeneous cluster, we achieve about 100 times acceleration, e.g., 20 days can be simulated in 5 h. Several parameters affecting job placement were studied. Disabling node sharing on our 8000-core cluster showed a 45% increase in the time needed to complete the same workload. For a large system (>6000 nodes) comprising two distinct sub-clusters, using two separate Slurm controllers and adding node sharing can cut waiting times nearly in half.
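
As a quick check of the "about 100 times" figure, the ratio of simulated time to wall-clock time can be computed from the example given in the abstract (20 days simulated in 5 h). This is only an illustrative sketch; the function name `speedup` and its parameters are hypothetical and are not part of the simulator.

```python
def speedup(simulated_days: float, wall_clock_hours: float) -> float:
    """Ratio of simulated time to wall-clock time (hypothetical helper)."""
    simulated_hours = simulated_days * 24.0  # convert days to hours
    return simulated_hours / wall_clock_hours


# Figures taken from the abstract: 20 simulated days in 5 wall-clock hours.
ratio = speedup(simulated_days=20, wall_clock_hours=5)
print(f"speedup = {ratio:.0f}x")  # 480 h / 5 h = 96x, i.e. roughly 100x
```

The result, 96, is consistent with the abstract's rounded figure of about 100 times acceleration.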

Original language: English
Title of host publication: High Performance Computing Systems. Performance Modeling, Benchmarking, and Simulation - 8th International Workshop, Proceedings
Editors: Simon Hammond, Stephen Jarvis, Steven Wright
Publisher: Springer Verlag
Pages: 197-217
Number of pages: 21
ISBN (Print): 9783319729701
State: Published - 2018
Event: 8th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, PMBS 2017 - [state] CO, United States
Duration: Nov 13 2017 to Nov 13 2017

Publication series

Name: Lecture Notes in Computer Science (including subseries Lecture Notes in Artificial Intelligence and Lecture Notes in Bioinformatics)
Volume: 10724 LNCS

Conference

Conference: 8th International Workshop on Performance Modeling, Benchmarking and Simulation of High Performance Computing Systems, PMBS 2017
Country/Territory: United States
City: [state] CO
Period: 11/13/17 to 11/13/17

Keywords

  • Batch jobs scheduler
  • HPC
  • SLURM
  • Simulator
