Q2.11. With the combination of Intel oneAPI Compiler + Intel oneAPI MPI, programs that worked properly before 2023/04 now fail in one of the following ways.
1) The following error messages are output.
mm_xpmem.c:136 UCX ERROR failed to attach xpmem apid 0x500002a674 offset 0x14d95ec12000 length 348160: Cannot allocate memory
ucp_rkey.c:476 UCX ERROR failed to unpack remote key from remote md[6]: Input/output error
2) The program times out and exits without outputting an error message.
A2.11.
This may be due to a change in the shared memory mechanism that Intel oneAPI MPI uses for intra-node communication, introduced with the Intel compiler version upgrade on 2023/04.
In the job script, before the line that executes the program, set the following environment variable to psm3 to change the shared memory mechanism, and check whether the situation improves.
(environment variable)
export FI_PROVIDER=psm3
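(example job script fragment)
The fragment below is a minimal sketch of where the setting could go in a job script. It is not taken from this system's documentation: the scheduler directives are omitted, and the launch command, process count, and program name ./a.out are hypothetical placeholders. I_MPI_DEBUG is an Intel MPI variable that is assumed here to make the startup output report the selected libfabric provider, which can help confirm that the setting took effect.

```shell
# Set the provider BEFORE the line that launches the MPI program.
export FI_PROVIDER=psm3

# Optional (assumption): ask Intel MPI to print startup details,
# which should include the libfabric provider in use.
export I_MPI_DEBUG=5

# Record the setting in the job's standard output for later reference.
echo "FI_PROVIDER=${FI_PROVIDER}"

# Hypothetical launch line; replace with the command your job actually uses.
# mpiexec -n 4 ./a.out
```

If the error messages or the timeout persist with this setting, the cause is probably something other than the shared memory mechanism.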