FAQ

1. Public offering

A1.01. Please consider submitting a Class A application. Applications for this class are accepted at any time; please see the linked page for details.

2. System B ohtaka

A2.01. You can find out how many jobs are running or waiting for execution on each partition of System B ohtaka by executing the pstat command.

A2.02. By default, each srun requests the maximum amount of memory that can be assigned to a node. Therefore, when executing the srun command multiple times within one node in a bulk job, you need to add the following option:

 

srun -N 1 -n 1 -c 32 --mem-per-cpu=1840 a.out
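For example, a bulk job script that runs four such 32-core calculations side by side on one node might look like the following sketch (the partition name XXX and the executable a.out are placeholders; the resource requests are assumptions to be adapted to your own job):

#!/bin/bash
#SBATCH -p XXX              # partition (placeholder)
#SBATCH --nodes=1
#SBATCH --ntasks=4
#SBATCH --cpus-per-task=32

# Each srun uses 32 cores; --mem-per-cpu keeps a single srun from requesting
# the whole node's memory, so the four runs can execute concurrently.
for i in 1 2 3 4; do
    srun -N 1 -n 1 -c 32 --mem-per-cpu=1840 ./a.out > out.${i} &
done
wait    # wait for all background sruns to finish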

 

Please refer to the System B User Guide for details.

A2.03. For example, if you execute squeue -o "%i %P %T %S %Z", a listing with the header

JOBID  PARTITION  STATE    START_TIME      WORK_DIR 

is shown, allowing you to confirm the details of your job. The "START_TIME" column indicates the (estimated) start time of the job, and the "WORK_DIR" column shows its working directory. Entering a number after "%" (e.g., "%10P") sets the width of that field to the given number of characters. You can obtain details of the squeue command options by running man squeue.
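For instance, the following variant (field widths chosen arbitrarily) pads the job ID, partition, and state columns and restricts the listing to your own jobs:

squeue -u $USER -o "%10i %12P %10T %S %Z"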

A2.04. Because it depends on the program, a general answer cannot be given. As a reference, a comparison of the hybrid-parallel benchmark (runtest) of the first-principles electronic structure calculation code OpenMX run on sekirei, enaga, and ohtaka gives the following results:

System B (sekirei) at ISSP, Univ. of Tokyo (Intel Xeon E5-2680v3, 12 cores, 2.5 GHz)
icc version 18.0.5, compiler options: -O3 -xHOST -ip -no-prec-div -qopenmp -Dkcomp -fp-model precise
6 processes (MPI) x 4 threads (OpenMP)
Total elapsed time (s): 177.21

System C (enaga) at ISSP, Univ. of Tokyo (Intel Xeon 6148, 20 cores, 2.4 GHz)
icc version 18.0.5, compiler options: -O3 -xHOST -ip -no-prec-div -qopenmp -Dkcomp -fp-model precise
5 processes (MPI) x 4 threads (OpenMP)
Total elapsed time (s): 137.06

System B (ohtaka) (AMD EPYC 7702, 64 cores × 2, 2.0 GHz)
icc version 19.1.2.254, compiler options: -O3 -march=core-avx2 -ip -no-prec-div -qopenmp -I${MKLROOT}/include/fftw -parallel -par-schedule-auto -static-intel -qopenmp-link=static -qopt-malloc-options=3 -qopt-report
(module load intel_compiler/2019.5.281 intel_mkl/2019.5.281 openmpi/4.0.4-intel-2019.5.281)
32 processes (MPI) x 4 threads (OpenMP)
Total elapsed time (s): 53.43

System B (ohtaka) (AMD EPYC 7702, 64 cores × 2, 2.0 GHz)
icc version 19.1.2.254, compiler options: -O3 -march=core-avx2 -ip -no-prec-div -qopenmp -I${MKLROOT}/include/fftw -parallel -par-schedule-auto -static-intel -qopenmp-link=static -qopt-malloc-options=3 -qopt-report
(module load intel_compiler/2019.5.281 intel_mkl/2019.5.281 openmpi/4.0.4-intel-2019.5.281)
6 processes (MPI) x 4 threads (OpenMP)
Total elapsed time (s): 83.29

A2.05. If you change the permissions of your home directory, you will no longer be able to log in because of SSH's permission restrictions.
This problem can be resolved by the administrators; please contact us at (center_b_at_issp.u-tokyo.ac.jp).

A2.06. Depending on the load on the shared file system (/home, /work), the quota value might not update immediately.
Please wait a few minutes and run chquota again.
If considerable time has passed and the quota value still has not changed, please contact us at (center_b_at_issp.u-tokyo.ac.jp).

A2.07. The OS may be running out of shared memory space.
The kernel parameters on the compute node may need to be reviewed; please contact us at (center_b_at_issp.u-tokyo.ac.jp).

A2.08. The number of sruns during bulk job execution may have reached the upper limit.
Under current operations, the upper limit on the number of concurrently executing sruns in a bulk job is 2,500.
Check the number of concurrent sruns in your script.
If the number of sruns is less than 2,500, please contact us at (center_b_at_issp.u-tokyo.ac.jp).
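As a quick check while a bulk job is running, you can count its currently existing job steps (one per srun) from a login node; JOBID is a placeholder for the job ID reported by squeue, and the exact option spelling may vary with the Slurm version:

squeue -h --steps --jobs=JOBID | wc -l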

A2.09.

As of September 2022, MPI_Comm_spawn does not work correctly in the default environment.
To make MPI_Comm_spawn work correctly, load the Intel oneAPI compiler and MPI library using

module purge
module load oneapi_compiler/2022.1.2 oneapi_mpi/2022.1.2

and set the following environment variables:

I_MPI_SPAWN=on
SLURM_EXACT=1

and use the following SBATCH option to set the memory usage per core:

#SBATCH --mem-per-cpu 1840

Note that the following option needs to be given to srun when specifying the amount of memory used per core during an interactive job:

srun --mem-per-cpu=1840
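Putting these settings together, a batch job for a program that calls MPI_Comm_spawn might look like the following sketch (the partition name XXX, the task count, and the executable parent.out are placeholders):

#!/bin/bash
#SBATCH -p XXX               # partition (placeholder)
#SBATCH -N 1
#SBATCH -n 64                # adjust to the number of processes you spawn
#SBATCH --mem-per-cpu 1840   # memory per core

module purge
module load oneapi_compiler/2022.1.2 oneapi_mpi/2022.1.2

export I_MPI_SPAWN=on        # enable MPI_Comm_spawn in Intel MPI
export SLURM_EXACT=1

srun ./parent.out            # parent program that spawns child processes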

A2.10.

If you wish to make the contents of your home directory accessible to members of your own group, you can use a feature called ACL (access control list).

 

For example, if you want the group k9999 to be able to read your home directory k999901, execute the following commands; the directory then becomes accessible to members of the k9999 group.

cd /home/k9999
setfacl -m group:k9999:r-x ./k999901

 

It is also possible to grant access to an individual user.

For example, to allow the user k999902 to access your home directory k999901, execute the following command.

setfacl -m user:k999902:r-x ./k999901

To check the configured ACLs, use the following command.

getfacl ./k999901
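If you later want to revoke the permission, the entry can be removed with the -x option of setfacl, for example:

setfacl -x group:k9999 ./k999901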

A2.11.

This may be due to a change in the shared-memory mechanism used for intra-node communication in Intel oneAPI MPI, introduced by the Intel compiler version upgrade of April 2023.

Before executing the program in the job script, set psm3 in the following environment variable to switch the communication mechanism, and check whether the situation improves.

  (environment variable)

   export FI_PROVIDER=psm3
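As a sketch, the variable is exported in the job script before the program is launched (partition, resource counts, and executable are placeholders):

#!/bin/bash
#SBATCH -p XXX      # partition (placeholder)
#SBATCH -N 1
#SBATCH -n 128

export FI_PROVIDER=psm3     # select the psm3 provider before launching

srun ./a.out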

 

A2.12.

If you would like to execute a large number of tasks while keeping the number of nodes in use small, you can limit the number of sruns running simultaneously, as in the following example script.

In this sample, one node is used and a single-core calculation is executed 3000 times.


#!/bin/bash
#SBATCH -p XXX #Specify partition
#SBATCH --nodes=1
#SBATCH --ntasks=128
#SBATCH --cpus-per-task=1
 
TOTAL_TASKS=3000 # Total number of tasks (processes) to be executed repeatedly
EXE="myapp -a -b -c" # Commands to run the application
SRUN_OPTS="-N 1 -n 1 -c 1 --mem-per-cpu=1840 --exclusive"
 
CONCURRENT_TASKS=$SLURM_NTASKS # Number of concurrent executions
WAIT_TIME=10 # Interval (seconds) between checks for finished sruns
 
count=0
task_id=()
while [ $TOTAL_TASKS -gt $count ]; do
        for ((i=0;i<$CONCURRENT_TASKS;i++)); do
                pid=${task_id[i]:=dummy}        # "dummy" marks a slot with no srun assigned yet
                if [ ! -d /proc/$pid ]; then    # the slot is free (process no longer exists)
                        # Launch srun in the background and save its PID
                        srun $SRUN_OPTS $EXE & task_id[$i]=$!
                        count=$((++count))
                        [ $TOTAL_TASKS -eq $count ] && break
                fi
        done
 
        if ps | grep -q srun; then
                sleep $WAIT_TIME
        fi
done
 
wait # Wait for all tasks to be completed
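The script is submitted as an ordinary batch job; adjust TOTAL_TASKS, EXE, and the partition to your own case (the file name bulk_tasks.sh is a placeholder):

sbatch bulk_tasks.sh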

3. System C enaga

4. Data Repository

A4.1. A detailed explanation is available. On the upper right-hand side of the page on the portal site, click “About,” followed by “How to register your dataset.”

A4.2. The review assesses whether the research data are related to materials science and, if the requested capacity exceeds the default value, whether the request is reasonable (e.g., estimation of data volume, balancing needs and services, etc.).

A4.3. The review is conducted at the regular monthly meeting of the Design Department of the Materials Design and Characterization Laboratory. Accordingly, the review process could take up to a month.

A4.4. Please indicate your required Repository Size (GB) and state your rationale (number of data records xxx, about xxx bytes per record, etc.) when you apply. In addition, should you need more capacity after applying, please contact us using the contact information below. Based on a review of the contents of your application, the use of more than 2 GB may be approved.

A4.5. We recommend that the registered data be made public. Even if you wish to keep the data private at first, please make them public within 5 years after your project starts. Should you wish to keep your project information private for a longer period of time, please contact us using the details below before the deadline.

A4.6. To create additional new projects, please re-apply through the portal site.

A4.7. If you wish to make corrections or additions to information submitted through the portal site, please contact us with your corrections at the address listed on the Data Repository page.

A4.8.

For simulated data, providing a complete set of information, such as

  • input file of the software used
  • procedures and tools for generating input files
  • output file obtained by simulation
  • procedures and tools for processing output files

will aid us in assisting you. In addition, the format of the input and output files may differ depending on the version of the software, and the calculation results may differ depending on the computing environment. Therefore, from the viewpoint of their reproducibility, including the following details

  • version of the software used
  • information on how the software was compiled (compiler and options)
  • computational environment at execution

will aid us in assisting you; a simple way to record such information is sketched below. Note that when simulation output files are large, alternative solutions, such as describing the execution procedure of the software so that the same output files can be regenerated, could be considered.
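For example, a minimal sketch of recording the computational environment in a text file (the file name environment.txt is arbitrary, and icc should be replaced by the compiler you actually used) is:

module list > environment.txt 2>&1                   # loaded modules (some module commands print to stderr)
icc --version >> environment.txt                     # compiler version; replace icc with your compiler
echo "run on $(hostname) at $(date)" >> environment.txt   # where and when the run was done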

In addition, the proposed folder structures and check sheets are posted in the manuals distributed during registration. Apart from this, you may want to refer to existing projects: (https://isspns-gitlab.issp.u-tokyo.ac.jp/k-yoshimi/physrevresearch_vol2_page032072r_year2020, etc.).