FAQ

Q2.12. Using bulk jobs, I want to run a large number of tasks while keeping the number of nodes in use low.

A2.12.

If you would like to execute a large number of tasks while keeping the number of nodes in use low, use the following example script, which limits how many srun commands run at the same time.

In this sample, a single node is used, and a one-CPU-core calculation is repeated 3,000 times.


#!/bin/bash
#SBATCH -p XXX #Specify partition
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
 
TOTAL_TASKS=3000 # Total number of tasks (processes) to be executed repeatedly
EXE="myapp -a -b -c" # Commands to run the application
SRUN_OPTS="-N 1 -n 1 -c 1 --mem-per-cpu=1840 --exclusive"
 
CONCURRENT_TASKS=$SLURM_NTASKS # Number of concurrent executions
WAIT_TIME=10 # srun Interval to check for termination (seconds)
 
count=0
task_id=()
while [ $TOTAL_TASKS -gt $count ]; do
        for ((i=0;i<$CONCURRENT_TASKS;i++)); do
                pid=${task_id[i]:=dummy} # "dummy" marks a slot not yet used
                if [ ! -d /proc/$pid ]; then
                        # Slot is free: launch srun in the background and save its PID
                        srun $SRUN_OPTS $EXE & task_id[$i]=$!
                        count=$((count + 1))
                        [ $TOTAL_TASKS -eq $count ] && break
                fi
        done
 
        if ps | grep -q "[s]run"; then # brackets keep grep from matching its own process
                sleep $WAIT_TIME
        fi
done
 
wait # Wait for all tasks to be completed
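
On systems where the batch script runs under bash 4.3 or newer, the same throttling idea can be written more compactly with `wait -n`, which blocks until any one background job exits, so no polling loop or PID bookkeeping is needed. The sketch below is an illustration, not cluster-specific code: it uses a placeholder command (`true`) and small counts; on the cluster you would substitute the srun invocation and the real task counts shown in the script above.

```shell
#!/bin/bash
# Sketch of concurrency throttling with `wait -n` (requires bash >= 4.3).
# CMD is a stand-in for: srun -N 1 -n 1 -c 1 --mem-per-cpu=1840 --exclusive myapp ...
TOTAL_TASKS=20        # illustration value; the FAQ example uses 3000
CONCURRENT_TASKS=4    # illustration value; the FAQ example uses $SLURM_NTASKS
CMD="true"            # placeholder application command

running=0
launched=0
while (( launched < TOTAL_TASKS )); do
        if (( running >= CONCURRENT_TASKS )); then
                wait -n                  # block until any one background job finishes
                running=$((running - 1))
        fi
        $CMD &                           # launch the next task in the freed slot
        running=$((running + 1))
        launched=$((launched + 1))
done
wait                                     # wait for the remaining tasks
echo "launched $launched tasks"
```

The advantage over the polling version is that a new task starts immediately when a slot frees up, instead of up to WAIT_TIME seconds later.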