Simple Linux Utility for Resource Management (SLURM) is open-source software for job scheduling. It provides three key functions. First, it allocates exclusive or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. Second, it provides a framework for starting, executing, and monitoring work (typically a parallel job) on a set of allocated nodes. Finally, it arbitrates contention for resources by managing a queue of pending work.
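The first and third functions described above (allocating nodes for a period of time and arbitrating contention through a pending queue) can be illustrated with a deliberately small model. This is a sketch under stated assumptions, not SLURM's implementation or interface: the `Job` and `ToyCluster` names are hypothetical, every allocation is exclusive and covers whole nodes, a job releases its nodes exactly when its time limit expires, and the pending queue is served strictly first-in, first-out. The second function (launching and monitoring work) is not modeled.

```python
# Hypothetical toy model of a resource manager; not SLURM's API or algorithm.
from collections import deque
from dataclasses import dataclass


@dataclass
class Job:
    """A pending request for a number of whole nodes for a fixed time limit."""
    name: str
    nodes: int
    time_limit: int  # abstract time units


class ToyCluster:
    """Exclusive node allocation plus a strict FIFO queue of pending jobs."""

    def __init__(self, node_count: int) -> None:
        self.free = set(range(node_count))  # ids of idle nodes
        self.running = {}                   # job name -> (allocated node ids, end time)
        self.pending = deque()              # work waiting for resources
        self.now = 0

    def submit(self, job: Job) -> None:
        self.pending.append(job)
        self._schedule()

    def _schedule(self) -> None:
        # Start jobs in submission order only; the head of the queue blocks
        # everything behind it until enough nodes are free.
        while self.pending and self.pending[0].nodes <= len(self.free):
            job = self.pending.popleft()
            alloc = {self.free.pop() for _ in range(job.nodes)}
            self.running[job.name] = (alloc, self.now + job.time_limit)

    def advance(self, t: int) -> None:
        # Move the clock forward; jobs whose time limit has expired free their nodes.
        self.now = t
        for name, (alloc, end) in list(self.running.items()):
            if end <= t:
                self.free |= alloc
                del self.running[name]
        self._schedule()


cluster = ToyCluster(4)
cluster.submit(Job("a", 3, 10))
cluster.submit(Job("b", 2, 5))  # only 1 node is idle, so "b" waits
print(sorted(cluster.running), [j.name for j in cluster.pending])  # ['a'] ['b']
cluster.advance(10)             # "a" reaches its time limit and frees 3 nodes
print(sorted(cluster.running), [j.name for j in cluster.pending])  # ['b'] []
```

A strict FIFO queue lets a large job at the head block smaller jobs that would fit on the idle nodes; the toy model keeps that behavior for simplicity, whereas production schedulers commonly relax it (for example, with backfill scheduling).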
SLURM is the batch system used on Tianhe-I, at the time of writing the largest computer in the world, and it can sustain a throughput of 120,000 jobs per hour. Although it supports complex configurations, a simple configuration can be set up in a few minutes.
History
SLURM began as a collaborative open-source resource manager project, developed primarily by Lawrence Livermore National Laboratory, Linux NetworX, Hewlett-Packard, and Groupe Bull. It has since evolved into a sophisticated batch scheduler capable of meeting the requirements of many large computer centers, and it is used on many of the largest computers in the world.
Structure
SLURM has a highly modular design, with dozens of optional plugins. Its simplest configuration can be installed and configured in a couple of minutes, while more sophisticated configurations add database integration for accounting, management of resource limits, and workload prioritization. SLURM also works with several meta-schedulers, such as Moab Cluster Suite, Maui Cluster Scheduler, and Platform LSF.
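Workload prioritization is one of the features provided through plugins. As one example, SLURM's multifactor priority plugin (priority/multifactor) ranks pending jobs by a weighted sum of factors such as job age, fair-share, job size, partition, and quality of service (QOS), each normalized to the range 0.0 to 1.0. The sketch below computes a simplified version of that sum: it omits other terms the real plugin supports, the `job_priority` function is hypothetical, and the weights and factor values are made-up illustrations rather than SLURM defaults.

```python
# Simplified, hypothetical sketch of a multifactor-style job priority:
# a weighted sum of normalized factors. The comments name the slurm.conf
# options these weights loosely correspond to; the values are illustrative only.
WEIGHTS = {
    "age": 1000,         # PriorityWeightAge
    "fairshare": 10000,  # PriorityWeightFairshare
    "job_size": 500,     # PriorityWeightJobSize
    "partition": 1000,   # PriorityWeightPartition
    "qos": 2000,         # PriorityWeightQOS
}


def job_priority(factors: dict[str, float]) -> int:
    """Return an integer priority; each factor must lie in [0.0, 1.0] (missing = 0.0)."""
    total = 0.0
    for name, weight in WEIGHTS.items():
        f = factors.get(name, 0.0)
        if not 0.0 <= f <= 1.0:
            raise ValueError(f"factor {name!r} must be in [0, 1], got {f}")
        total += weight * f
    return round(total)


pending = {
    "long_waiter": {"age": 1.0, "fairshare": 0.2, "job_size": 0.1},
    "light_user": {"age": 0.1, "fairshare": 0.9, "job_size": 0.1},
}
# The job with the highest priority is considered first.
order = sorted(pending, key=lambda j: job_priority(pending[j]), reverse=True)
print(order)  # ['light_user', 'long_waiter']
```

With these illustrative weights, the fair-share term dominates, so a job from a lightly used account overtakes one that has simply waited longer; changing the weights shifts that balance.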
License
SLURM is available under version 2 of the GNU General Public License (GPLv2).
Commercial Support
In 2009, the developers of SLURM founded SchedMD, which provides development and training services.