Commands
List of useful Slurm commands
- Check Slurm cluster state:
  - `sinfo` will display general cluster information.
  - `sinfo -s` will display a summary of cluster information.
  - `sinfo -N -l` will display the status of each node.
  - `scontrol show nodes` will display detailed information about the state of each node, helpful for debugging purposes.
  - `scontrol show partitions` will display all available partitions.
  - `scontrol show partition <partition_name>` will display detailed information about the given partition (a combined sketch follows this list).
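The per-partition detail can also be pulled in one pass. The loop below is a minimal sketch built only from the commands listed above; it assumes the standard `scontrol show partitions` output, where each record starts with a `PartitionName=<name>` line.

```bash
#!/usr/bin/env bash
# Sketch: print the detailed record of every partition in the cluster.
# Assumes each scontrol record begins with a "PartitionName=<name>" line.
for part in $(scontrol show partitions | grep -o 'PartitionName=[^ ]*' | cut -d= -f2); do
    echo "=== $part ==="
    scontrol show partition "$part"
done
```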
- Check Slurm user information:
  - `sacctmgr show user` will display general user information.
  - `sacctmgr show association` will display user associations (to quotas, resource limitations, etc.); a filtered example follows this list.
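The association listing can get long on a shared cluster. The one-liner below is a sketch of narrowing it to the current user; the `where` and `format` clauses are standard `sacctmgr` syntax, and the chosen columns are just an example.

```bash
# Sketch: show only the current user's associations, with a reduced set of columns.
sacctmgr show association where user="$USER" format=Cluster,Account,User,Partition,QOS
```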
- Check Slurm job states:
  - `watch squeue -u $USER` will display information about jobs that are scheduled for execution or are currently running.
  - `squeue -u $USER -o "%.18i %.20P %.15j %.8u %.8T %.10M %.20R"` will also display information about the scheduled jobs, with more details.
  - `watch sacct` will display the current state of each job (press Ctrl+C to exit); a non-interactive polling alternative is sketched after this list.
  - `sacct -N slurm-worker-cpu-1` will show the list of executed jobs in which the given node was involved.
  - `sacct -u konrad --format=JobID,JobName,Partition,State,Elapsed -S now-1hour` will display the recent job history for the given user.
  - `scontrol show job <job_id>` will display general information about the job.
  - `sstat <job_id>` will display summary information about a running job.
  - `scontrol getaddrs $(scontrol show job 10 | grep "NodeList=slurm" | cut -d '=' -f 2) | col2 | cut -d ':' -f 1` will display the worker node IP on which the current interactive job (job ID 10 in this example) is running.
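For unattended use, the interactive `watch` commands above can be replaced by a simple polling loop. The sketch below only assumes a job ID from a previous `sbatch` submission; the 30-second interval and the chosen `sacct` columns are arbitrary.

```bash
#!/usr/bin/env bash
# Sketch: block until the given job leaves the queue, then print its final accounting record.
JOB_ID="$1"                                    # job ID from a previous sbatch submission
while squeue -h -j "$JOB_ID" | grep -q .; do   # -h drops the header; empty output means the job is gone
    sleep 30                                   # polling interval, adjust as needed
done
sacct -j "$JOB_ID" --format=JobID,JobName,Partition,State,Elapsed,ExitCode
```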
- Schedule Slurm jobs:
  - use `sbatch test_job.batch` to schedule a job (a minimal example batch script is sketched after this list)
  - to schedule a job against one particular partition, for example a GPU partition, use `sbatch -p gpu_large test_job.batch` (options must precede the script name, otherwise they are passed to the script as arguments)
  - use `scancel <job_id>` to cancel a scheduled or running job
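The contents of `test_job.batch` are not shown on this page; the sketch below is a hypothetical minimal batch script, with illustrative resource values rather than site defaults.

```bash
#!/bin/bash
# Hypothetical minimal batch script (test_job.batch); values are illustrative only.
#SBATCH --job-name=test_job
#SBATCH --partition=gpu_large          # example partition name from this page
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --time=00:10:00                # walltime limit, HH:MM:SS
#SBATCH --output=test_job_%j.log       # %j expands to the job ID

srun hostname                          # replace with the actual workload
```

Command-line options such as `-p gpu_large` override the matching `#SBATCH` directives inside the script.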
- Check resource limitations:
  - use the `quota -u $USER -s` command to check the storage space available
  - `sacctmgr show qos format=name%30,MaxJobsPerUser%30,MaxSubmitJobsPerUser%30,MaxTRESPerJob%30` will display detailed information regarding the partition-level resource limitations (a combined check is sketched after this list)
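Both limits can be checked in one go. The sketch below simply wraps the two commands from this list; it assumes nothing beyond what is shown above.

```bash
#!/usr/bin/env bash
# Sketch: print storage and scheduler limits for the current user in one pass.
echo "--- Storage quota ---"
quota -u "$USER" -s
echo "--- QOS limits ---"
sacctmgr show qos format=name%30,MaxJobsPerUser%30,MaxSubmitJobsPerUser%30,MaxTRESPerJob%30
```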