Slurm Workload Manager

This is an old revision of this page, as edited by Dannyauble at 15:35, 16 February 2011. It may differ significantly from the current revision.

Simple Linux Utility for Resource Management (SLURM) is computer software that performs job scheduling. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
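The three functions can be illustrated with a small toy model. The sketch below is not SLURM code and does not reflect how SLURM is implemented: the `Job` and `ToyScheduler` names are hypothetical, the policy is strict first-in, first-out, and only exclusive node access is modelled.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    name: str
    nodes_needed: int


@dataclass
class ToyScheduler:
    """Illustrative FIFO model only; hypothetical, not SLURM's implementation."""
    free_nodes: set
    pending: deque = field(default_factory=deque)
    running: dict = field(default_factory=dict)  # job name -> allocated nodes

    def submit(self, job):
        # Function 3: contention is arbitrated through a queue of pending work.
        self.pending.append(job)
        self._schedule()

    def finish(self, name):
        # Releasing a job's nodes may let queued jobs start.
        self.free_nodes |= self.running.pop(name)
        self._schedule()

    def _schedule(self):
        # Strict FIFO: start jobs from the head of the queue while they fit.
        while self.pending and self.pending[0].nodes_needed <= len(self.free_nodes):
            job = self.pending.popleft()
            # Function 1: allocate nodes exclusively to the job.
            allocated = {self.free_nodes.pop() for _ in range(job.nodes_needed)}
            # Function 2: "start" the job on its allocation (here it is only recorded).
            self.running[job.name] = allocated


sched = ToyScheduler(free_nodes={"node1", "node2", "node3", "node4"})
sched.submit(Job("a", 3))   # starts immediately on three nodes
sched.submit(Job("b", 2))   # only one node is free, so this job waits
queued_before = [j.name for j in sched.pending]
sched.finish("a")           # frees three nodes, so "b" can now start
```

In this model, job "b" stays in the pending queue until job "a" releases its nodes. A real scheduler would add priorities, time limits, and shared (non-exclusive) allocations, none of which are modelled here.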

SLURM's design is highly modular, with dozens of optional plugins. In its simplest configuration, it can be installed and configured in a couple of minutes. More sophisticated configurations provide database integration for accounting, management of resource limits, and workload prioritization. SLURM also works with several meta-schedulers, such as Moab Cluster Suite, Maui Cluster Scheduler, and Platform LSF.

SLURM is the batch system used on Tianhe-1A, the biggest computer in the world at the time of this revision, and can sustain a throughput of 120,000 jobs per hour.

History

SLURM began as an open-source resource manager developed collaboratively, primarily by Lawrence Livermore National Laboratory, Linux NetworX, Hewlett-Packard, and Groupe Bull. It has since evolved into a sophisticated batch scheduler capable of satisfying the requirements of many large computer centers. SLURM is currently used on many of the largest computers in the world.

Commercial Support

License

SLURM is available under the GNU General Public License, version 2.

References

  • Balle, S. M., and D. Palermo, Enhancing an Open Source Resource Manager with Multi-Core/Multi-threaded Support, Job Scheduling Strategies for Parallel Processing, 2007.
  • Yoo, A., M. Jette, and M. Grondona, SLURM: Simple Linux Utility for Resource Management, Job Scheduling Strategies for Parallel Processing, volume 2862 of Lecture Notes in Computer Science, pages 44–60, Springer-Verlag, 2003.