Slurm Workload Manager

Simple Linux Utility for Resource Management (or simply SLURM) is an open-source job scheduler used by many of the world's supercomputers and computer clusters. It provides three key functions. First, it allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job such as MPI) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending jobs.
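The three functions above can be illustrated with a toy model. The following is a minimal sketch only, assuming a strict first-in, first-out policy and exclusive node access; the names Job, ToyScheduler, submit, finish, and the node names are hypothetical and do not correspond to SLURM's actual code, API, or scheduling algorithm.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Job:
    """Hypothetical job record: a name and the number of nodes it requests."""
    name: str
    nodes_needed: int
    allocated: list = field(default_factory=list)


class ToyScheduler:
    """Illustrative FIFO scheduler; an assumption-laden sketch, not SLURM itself."""

    def __init__(self, node_names):
        self.free_nodes = list(node_names)  # idle nodes available for allocation
        self.pending = deque()              # queue of jobs waiting for resources
        self.running = []                   # jobs currently holding nodes

    def submit(self, job):
        """Queue a job, then start whatever can run."""
        self.pending.append(job)
        self._dispatch()

    def finish(self, job):
        """Release a running job's nodes and let waiting jobs start."""
        self.running.remove(job)
        self.free_nodes.extend(job.allocated)
        job.allocated = []
        self._dispatch()

    def _dispatch(self):
        # Strict FIFO: only the job at the head of the queue may start,
        # so a large job at the head blocks smaller jobs behind it.
        while self.pending and self.pending[0].nodes_needed <= len(self.free_nodes):
            job = self.pending.popleft()
            job.allocated = [self.free_nodes.pop(0) for _ in range(job.nodes_needed)]
            self.running.append(job)


sched = ToyScheduler(["n1", "n2", "n3", "n4"])
a, b, c = Job("a", 3), Job("b", 2), Job("c", 1)
for job in (a, b, c):
    sched.submit(job)   # a starts at once; b and c wait in the queue
sched.finish(a)         # a's nodes are released; b and c can now start
```

In this sketch, job c waits behind job b even though one node is idle, which is the contention that a queueing policy has to arbitrate. Real batch schedulers typically apply priorities and other policies rather than strict arrival order.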

SLURM is the batch system on many of the TOP500 supercomputers, including the fastest one in the world, China's Tianhe-I. SLURM is designed to handle thousands of nodes in a single cluster and can sustain a throughput of 120,000 jobs per hour (roughly 33 jobs per second).

History

SLURM began as an open-source resource manager, developed collaboratively and primarily by Lawrence Livermore National Laboratory, SchedMD, Linux NetworX, Hewlett-Packard, and Groupe Bull. It has since evolved into a sophisticated batch scheduler capable of satisfying the requirements of many large computer centers. SLURM is currently used on many of the largest computers in the world.

Structure

SLURM's design is highly modular, with dozens of optional plugins. In its simplest configuration, it can be installed and configured in a couple of minutes. More sophisticated configurations provide database integration for accounting, management of resource limits, and workload prioritization. SLURM also works with several meta-schedulers such as Moab Cluster Suite, Maui Cluster Scheduler, and Platform LSF.

Supported platforms

While SLURM was originally written for Linux, the latest version supports many other operating systems:[1]

SLURM also supports several unique computer architectures including:

License

SLURM is available under the GNU General Public License, version 2.

Commercial support

In 2009, the developers of SLURM founded SchedMD, which provides development and training services.

References

  • Balle, S. M. and D. Palermo, Enhancing an Open Source Resource Manager with Multi-Core/Multi-threaded Support, Job Scheduling Strategies for Parallel Processing, 2007.
  • Yoo, A., M. Jette, and M. Grondona, SLURM: Simple Linux Utility for Resource Management, Job Scheduling Strategies for Parallel Processing, volume 2862 of Lecture Notes in Computer Science, pages 44–60, Springer-Verlag, 2003.