FAQ

1. Public offering

A1.01. Please consider submitting a Class A application. Applications for this class are accepted at any time; please see the linked page for details.

2. System B ohtaka

A2.01. You can find out how many jobs are running or waiting for execution on each partition of System B ohtaka by executing the pstat command.

A2.02. By default, each srun requests the maximum amount of memory that can be assigned to a node. Therefore, when executing the srun command multiple times within one node in a bulk job, you need to add the following option:

 

srun -N 1 -n 1 -c 32 --mem-per-cpu=1840 a.out
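For example, a bulk job script that runs four such 32-core calculations side by side on one node might look like the following sketch (the partition name XXX and the executable a.out are placeholders; the resource requests are assumptions to be adapted to your own job):

#!/bin/bash
#SBATCH -p XXX              # partition (placeholder)
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=32

# Each srun uses 32 cores; --mem-per-cpu keeps a single srun from requesting
# the whole node's memory, so the four runs can execute concurrently.
for i in 1 2 3 4; do
    srun -N 1 -n 1 -c 32 --mem-per-cpu=1840 ./a.out > out.${i} &
done
wait    # wait for all background sruns to finish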

 

Please refer to the System B User Guide for details.

A2.03. For example, if you execute squeue -o "%i %P %T %S %Z", a listing with the header

JOBID  PARTITION  STATE    START_TIME      WORK_DIR 

is shown, allowing you to confirm the details of your job. The "START_TIME" column indicates the (estimated) start time of the job, and the "WORK_DIR" column shows its working directory. Entering a number after "%" (e.g., "%10P") sets the width of that field to the given number of characters. You can obtain details of the squeue command options by running man squeue.
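For instance, the following variant (field widths chosen arbitrarily) pads the job ID, partition, and state columns and restricts the listing to your own jobs:

squeue -u $USER -o "%10i %12P %10T %S %Z"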

A2.04. Because it depends on the program, a general answer cannot be given. As a reference, a comparison of the hybrid-parallel benchmark (runtest) of the first-principles electronic structure calculation code OpenMX run on sekirei, enaga, and ohtaka gives the following results:

System B (sekirei) at ISSP, Univ. of Tokyo (Intel Xeon E5-2680v3, 12 cores, 2.5 GHz)
icc version 18.0.5, compiler options: -O3 -xHOST -ip -no-prec-div -qopenmp -Dkcomp -fp-model precise
6 processes (MPI) x 4 threads (OpenMP)
Total elapsed time (s): 177.21

System C (enaga) at ISSP, Univ. of Tokyo (Intel Xeon 6148, 20 cores, 2.4 GHz)
icc version 18.0.5, compiler options: -O3 -xHOST -ip -no-prec-div -qopenmp -Dkcomp -fp-model precise
5 processes (MPI) x 4 threads (OpenMP)
Total elapsed time (s): 137.06

System B (ohtaka) (AMD EPYC 7702, 64 cores × 2, 2.0 GHz)
icc version 19.1.2.254, compiler options: -O3 -march=core-avx2 -ip -no-prec-div -qopenmp -I${MKLROOT}/include/fftw -parallel -par-schedule-auto -static-intel -qopenmp-link=static -qopt-malloc-options=3 -qopt-report
(module load intel_compiler/2019.5.281 intel_mkl/2019.5.281 openmpi/4.0.4-intel-2019.5.281)
32 processes (MPI) x 4 threads (OpenMP)
Total elapsed time (s): 53.43

System B (ohtaka) (AMD EPYC 7702, 64 cores × 2, 2.0 GHz)
icc version 19.1.2.254, compiler options: -O3 -march=core-avx2 -ip -no-prec-div -qopenmp -I${MKLROOT}/include/fftw -parallel -par-schedule-auto -static-intel -qopenmp-link=static -qopt-malloc-options=3 -qopt-report
(module load intel_compiler/2019.5.281 intel_mkl/2019.5.281 openmpi/4.0.4-intel-2019.5.281)
6 processes (MPI) x 4 threads (OpenMP)
Total elapsed time (s): 83.29

A2.05. If you change the permissions of your home directory, you will no longer be able to log in because of SSH's permission restrictions.
This problem can be resolved by the administrators; please contact us at (center_b_at_issp.u-tokyo.ac.jp).

A2.06. Depending on the load on the shared file system (/home, /work), the quota value might not update immediately.
Please wait a few minutes and run chquota again.
If considerable time has passed and the quota value still has not changed, please contact us at (center_b_at_issp.u-tokyo.ac.jp).

A2.07. The OS may be running out of shared memory space.
The kernel parameters on the compute node may need to be reviewed; please contact us at (center_b_at_issp.u-tokyo.ac.jp).

A2.08. The number of sruns during bulk job execution may have reached the upper limit.
Under current operations, the upper limit on the number of concurrently executing sruns in a bulk job is 2,500.
Check the number of concurrent sruns in your script.
If the number of sruns is less than 2,500, please contact us at (center_b_at_issp.u-tokyo.ac.jp).
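As a quick check while a bulk job is running, you can count its currently existing job steps (one per srun) from a login node; JOBID is a placeholder for the job ID reported by squeue, and the exact option spelling may vary with the Slurm version:

squeue -h --steps --jobs=JOBID | wc -l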

A2.09.

As of September 2022, MPI_Comm_spawn does not work correctly in the default environment.
To make MPI_Comm_spawn work correctly, load the Intel oneAPI compiler and MPI library using

module purge
module load oneapi_compiler/2022.1.2 oneapi_mpi/2022.1.2

and set the following environment variables:

I_MPI_SPAWN=on
SLURM_EXACT=1

and use the following SBATCH option to set the memory usage per core:

#SBATCH --mem-per-cpu 1840

Note that the following option needs to be given to srun when specifying the amount of memory used per core during an interactive job:

srun --mem-per-cpu=1840
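Putting these settings together, a batch job for a program that calls MPI_Comm_spawn might look like the following sketch (the partition name XXX, the task count, and the executable parent.out are placeholders):

#!/bin/bash
#SBATCH -p XXX               # partition (placeholder)
#SBATCH -N 1
#SBATCH -n 64                # adjust to the number of processes you spawn
#SBATCH --mem-per-cpu 1840   # memory per core

module purge
module load oneapi_compiler/2022.1.2 oneapi_mpi/2022.1.2

export I_MPI_SPAWN=on        # enable MPI_Comm_spawn in Intel MPI
export SLURM_EXACT=1

srun ./parent.out            # parent program that spawns child processes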

A2.10.

If you wish to make the contents of your home directory accessible to members of your own group, you can use a feature called ACL (access control list).

 

For example, if you want the group k9999 to be able to read your home directory k999901, execute the following commands; the directory then becomes accessible to members of the k9999 group.

cd /home/k9999
setfacl -m group:k9999:r-x ./k999901

 

It is also possible to grant access to an individual user.

For example, to allow the user k999902 to access your home directory k999901, execute the following command.

setfacl -m user:k999902:r-x ./k999901

To check the configured ACLs, use the following command.

getfacl ./k999901
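If you later want to revoke the permission, the entry can be removed with the -x option of setfacl, for example:

setfacl -x group:k9999 ./k999901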

A2.11.

This may be due to a change in the shared-memory mechanism used for intra-node communication in Intel oneAPI MPI, introduced by the Intel compiler version upgrade of April 2023.

Before executing the program in the job script, set psm3 in the following environment variable to switch the communication mechanism, and check whether the situation improves.

  (environment variable)

   export FI_PROVIDER=psm3
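As a sketch, the variable is exported in the job script before the program is launched (partition, resource counts, and executable are placeholders):

#!/bin/bash
#SBATCH -p XXX      # partition (placeholder)
#SBATCH -N 1
#SBATCH -n 128

export FI_PROVIDER=psm3     # select the psm3 provider before launching

srun ./a.out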

 

A2.12.

If you would like to execute a large number of tasks while keeping the number of nodes in use small, you can limit the number of sruns running simultaneously, as in the following example script.

In this sample, one node is used and a single-core calculation is executed 3000 times.


#!/bin/bash
#SBATCH -p XXX #Specify partition
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
 
TOTAL_TASKS=3000 # Total number of tasks (processes) to be executed repeatedly
EXE="myapp -a -b -c" # Commands to run the application
SRUN_OPTS="-N 1 -n 1 -c 1 --mem-per-cpu=1840 --exclusive"
 
CONCURRENT_TASKS=$SLURM_NTASKS # Number of concurrent executions
WAIT_TIME=10 # Interval (seconds) between checks for finished sruns
 
count=0
task_id=()
while [ $TOTAL_TASKS -gt $count ]; do
        for ((i=0;i<$CONCURRENT_TASKS;i++)); do
                pid=${task_id[i]:=dummy}        # "dummy" marks a slot with no srun assigned yet
                if [ ! -d /proc/$pid ]; then    # the slot is free (process no longer exists)
                        # Launch srun in the background and save its PID
                        srun $SRUN_OPTS $EXE & task_id[$i]=$!
                        count=$((++count))
                        [ $TOTAL_TASKS -eq $count ] && break
                fi
        done
 
        if ps | grep -q srun; then
                sleep $WAIT_TIME
        fi
done
 
wait # Wait for all tasks to be completed
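The script is submitted as an ordinary batch job; adjust TOTAL_TASKS, EXE, and the partition to your own case (the file name bulk_tasks.sh is a placeholder):

sbatch bulk_tasks.sh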

3. System C enaga

4. Data Repository

A4.1. A detailed explanation is available. On the upper right-hand side of the page on the portal site, click “About,” followed by “How to register your dataset.”

A4.2. The review assesses whether the research data are related to materials science and, if the requested capacity exceeds the default value, whether the request is reasonable (e.g., estimation of data volume, balancing needs and services, etc.).

A4.3. The review is conducted at the regular monthly meeting of the Design Department of the Materials Design and Characterization Laboratory. Accordingly, the review process could take up to a month.

A4.4. Please indicate your required Repository Size (GB) and state your rationale (number of data records xxx, about xxx bytes per record, etc.) when you apply. In addition, should you need more capacity after applying, please contact us using the contact information below. Based on a review of the contents of your application, the use of more than 2 GB may be approved.

A4.5. We recommend that the registered data be made public. Even if you wish to keep the data private at first, please make them public within 5 years after your project starts. Should you wish to keep your project information private for a longer period of time, please contact us using the details below before the deadline.

A4.6. To create additional new projects, please re-apply through the portal site.

A4.7. If you wish to make corrections or additions to information submitted through the portal site, please contact us with your corrections at the address listed on the Data Repository page.

A4.8.

For simulated data, providing a complete set of information, such as

  • input file of the software used
  • procedures and tools for generating input files
  • output file obtained by simulation
  • procedures and tools for processing output files

will aid us in assisting you. In addition, the format of the input and output files may differ depending on the version of the software, and the calculation results may differ depending on the computing environment. Therefore, from the viewpoint of their reproducibility, including the following details

  • version of the software used
  • information on how the software was compiled (compiler and options)
  • computational environment at execution

will aid us in assisting you; a simple way to record such information is sketched below. Note that when simulation output files are large, alternative solutions, such as describing the execution procedure of the software so that the same output files can be regenerated, could be considered.
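For example, a minimal sketch of recording the computational environment in a text file (the file name environment.txt is arbitrary, and icc should be replaced by the compiler you actually used) is:

module list > environment.txt 2>&1                   # loaded modules (some module commands print to stderr)
icc --version >> environment.txt                     # compiler version; replace icc with your compiler
echo "run on $(hostname) at $(date)" >> environment.txt   # where and when the run was done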

In addition, the proposed folder structures and check sheets are posted in the manuals distributed during registration. Apart from this, you may want to refer to existing projects: (https://isspns-gitlab.issp.u-tokyo.ac.jp/k-yoshimi/physrevresearch_vol2_page032072r_year2020, etc.).