FAQ

1. Public offering

A1.01. Please consider submitting a Class A application. Applications for this class are accepted at any time.

2. System B ohtaka

A2.03. For example, if you execute squeue -o "%i %P %T %S %Z"

JOBID  PARTITION  STATE    START_TIME      WORK_DIR 

is shown, allowing you to confirm the details of your job. The "START_TIME" column indicates the estimated start time of the job, and the "WORK_DIR" column shows the working directory path. Entering a number after "%" (e.g., "%10P") sets the width of the corresponding column to that number of characters. You can obtain details of the squeue command options by running man squeue.
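For example, widths can be combined with the same format string as above (the widths chosen here are arbitrary, not site defaults):

```shell
# Same fields as above, with fixed column widths (illustrative)
squeue -o "%10i %12P %8T %20S %Z"
```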

A2.1. You can find out how many jobs are running or waiting for execution on each partition of System B ohtaka by executing the pstat command.

If you wish to allow members of your group to access the contents of your home directory, you can use a feature called ACL (access control list).

 

For example, to allow members of group k9999 to access your home directory k999901, execute the following commands.

cd /home/k9999
setfacl -m group:k9999:r-x ./k999901

 

It is also possible to specify a user as follows.

To allow user k999902 to access your home directory k999901, execute the following command.

setfacl -m user:k999902:r-x ./k999901

To check the configured ACLs, use the following command.

getfacl ./k999901
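If access should later be revoked, the -x option of setfacl removes an entry. A short sketch using the same illustrative account names as above:

```shell
setfacl -m group:k9999:r-x ./k999901   # grant group k9999 read + traverse
setfacl -m user:k999902:r-x ./k999901  # grant a single user
getfacl ./k999901                      # inspect the current entries
setfacl -x group:k9999 ./k999901       # remove the group entry again
setfacl -x user:k999902 ./k999901      # remove the user entry again
```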

A2.11.

This may be due to a change in the shared-memory mechanism used for intra-node communication by Intel oneAPI MPI, introduced in the Intel compiler upgrade of April 2023.

Before executing the program in your job script, set the following environment variable to psm3 to switch the shared-memory mechanism, and check whether the situation improves.

  (environment variable)

   export FI_PROVIDER=psm3
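In a job script, this might look like the following minimal sketch (the partition name XXX and the program ./myapp are placeholders):

```shell
#!/bin/bash
#SBATCH -p XXX            # partition (placeholder)
#SBATCH --nodes=1

# Select the psm3 libfabric provider for Intel oneAPI MPI's
# intra-node communication
export FI_PROVIDER=psm3

srun ./myapp              # placeholder application
```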

 

A2.12.

If you would like to execute a large number of tasks while keeping the number of nodes in use low, refer to the following example script, which limits the number of srun commands running simultaneously.

In this sample, one node is used, and a single-core calculation is executed 3000 times.


#!/bin/bash
#SBATCH -p XXX                 # Specify partition
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1

TOTAL_TASKS=3000               # Total number of tasks (processes) to be executed
EXE="myapp -a -b -c"           # Command that runs the application
SRUN_OPTS="-N 1 -n 1 -c 1 --mem-per-cpu=1840 --exclusive"

CONCURRENT_TASKS=$SLURM_NTASKS # Number of concurrent executions
WAIT_TIME=10                   # Interval between checks for finished sruns (seconds)

count=0
task_id=()
while [ "$TOTAL_TASKS" -gt "$count" ]; do
        for ((i = 0; i < CONCURRENT_TASKS; i++)); do
                pid=${task_id[i]:-dummy}
                if [ ! -d "/proc/$pid" ]; then
                        # Slot i is free: launch srun and save its PID
                        srun $SRUN_OPTS $EXE & task_id[i]=$!
                        count=$((count + 1))
                        [ "$TOTAL_TASKS" -eq "$count" ] && break
                fi
        done

        sleep "$WAIT_TIME"     # Pause before re-scanning the slots
done

wait # Wait for all tasks to be completed
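The same throttling pattern can be tried outside Slurm by substituting an ordinary command for srun. A self-contained sketch (assumptions: bash and a Linux /proc filesystem; sleep stands in for the application):

```shell
#!/bin/bash
# Stand-alone demo of the slot-based throttle above: at most
# CONCURRENT_TASKS background jobs run at once, TOTAL_TASKS in total.
TOTAL_TASKS=10
CONCURRENT_TASKS=3

count=0
task_id=()
while [ "$TOTAL_TASKS" -gt "$count" ]; do
        for ((i = 0; i < CONCURRENT_TASKS; i++)); do
                pid=${task_id[i]:-dummy}
                if [ ! -d "/proc/$pid" ]; then     # slot i is free
                        sleep 0.2 & task_id[i]=$!  # "sleep" plays the role of srun
                        count=$((count + 1))
                        [ "$TOTAL_TASKS" -eq "$count" ] && break
                fi
        done
        sleep 0.1                                  # pause before re-scanning slots
done
wait
echo "all $count tasks finished"
```

Replacing the sleep command with an srun invocation recovers the batch-script version above.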

4. Data Repository

A4.8.

For simulated data, providing a complete set of information, such as

  • input file of the software used
  • procedures and tools for generating input files
  • output file obtained by simulation
  • procedures and tools for processing output files

will aid us in assisting you. In addition, the format of the input and output files may differ depending on the version of the software, and the calculation results may depend on the computing environment. Therefore, from the viewpoint of reproducibility, including the following details

  • version of software used
  • compilation information (compiler and options used)
  • calculation environment at execution

will also aid us in assisting you. Note that when simulation output files are large, alternative solutions could be considered, such as describing the execution procedure of the software so that the same output files can be regenerated.

In addition, the proposed folder structures and check sheets are included in the manuals distributed during registration. You may also want to refer to existing projects (e.g., https://isspns-gitlab.issp.u-tokyo.ac.jp/k-yoshimi/physrevresearch_vol2_page032072r_year2020).