Hello,
Could you advise on a good environment setup for running the develop branch of PyFR on OLCF Frontier? I've configured everything I needed (see below), but when I launch my test job, a single-node PyFR run with 8 ranks, using srun:
srun -N1 -n8 -c1 --gpus-per-task=1 --gpu-bind=closest \
pyfr run --backend hip mesh.pyfrm conf.ini
I get a "Bad address" error from MPI:
process_vm_readv: Bad address
Assertion failed in file ../src/mpid/ch4/shm/cray_common/cray_common_memops.c at line 461: 0
process_vm_readv: Bad address
Assertion failed in file ../src/mpid/ch4/shm/cray_common/cray_common_memops.c at line 461: 0
/opt/cray/pe/lib64/libmpi_cray.so.12(MPL_backtrace_show+0x26) [0x7fffec1b5bbb]
/opt/cray/pe/lib64/libmpi_cray.so.12(+0x1c48264) [0x7fffeba5a264]
/opt/cray/pe/lib64/libmpi_cray.so.12(+0x21dc6c0) [0x7fffebfee6c0]
/opt/cray/pe/lib64/libmpi_cray.so.12(+0x21f86b7) [0x7fffec00a6b7]
/opt/cray/pe/lib64/libmpi_cray.so.12(+0x21d4e75) [0x7fffebfe6e75]
/opt/cray/pe/lib64/libmpi_cray.so.12(+0xd89094) [0x7fffeab9b094]
/opt/cray/pe/lib64/libmpi_cray.so.12(+0xdd3e35) [0x7fffeabe5e35]
/opt/cray/pe/lib64/libmpi_cray.so.12(PMPI_Waitall+0x3d1) [0x7fffeabe6631]
/lustre/orion/cfd219/scratch/rsawko/venvs/pyfr-with-ascent/lib/python3.13/site-packages/mpi4py/MPI.cpython-313-x86_64-linux-gnu.so(+0x118473) [0x7fffe2e93473]
/sw/frontier/spack-envs/core-25.03/opt/gcc-13.2/python-3.13.0-z6cvwh43maa5kvglquodiadny7ohzirp/lib/libpython3.13.so.1.0(PyObject_Vectorcall+0x4f) [0x7fffed4c0d2f]
/sw/frontier/spack-envs/core-25.03/opt/gcc-13.2/python-3.13.0-z6cvwh43maa5kvglquodiadny7ohzirp/lib/libpython3.13.so.1.0(_PyEval_EvalFrameDefault+0x1980) [0x7fffed45a150]
/sw/frontier/spack-envs/core-25.03/opt/gcc-13.2/python-3.13.0-z6cvwh43maa5kvglquodiadny7ohzirp/lib/libpython3.13.so.1.0(PyEval_EvalCode+0x135) [0x7fffed6115f5]
/sw/frontier/spack-envs/core-25.03/opt/gcc-13.2/python-3.13.0-z6cvwh43maa5kvglquodiadny7ohzirp/lib/libpython3.13.so.1.0(+0x2ad97e) [0x7fffed67697e]
/sw/frontier/spack-envs/core-25.03/opt/gcc-13.2/python-3.13.0-z6cvwh43maa5kvglquodiadny7ohzirp/lib/libpython3.13.so.1.0(+0x2adc27) [0x7fffed676c27]
/sw/frontier/spack-envs/core-25.03/opt/gcc-13.2/python-3.13.0-z6cvwh43maa5kvglquodiadny7ohzirp/lib/libpython3.13.so.1.0(+0x2afb56) [0x7fffed678b56]
/sw/frontier/spack-envs/core-25.03/opt/gcc-13.2/python-3.13.0-z6cvwh43maa5kvglquodiadny7ohzirp/lib/libpython3.13.so.1.0(+0x2b012c) [0x7fffed67912c]
/sw/frontier/spack-envs/core-25.03/opt/gcc-13.2/python-3.13.0-z6cvwh43maa5kvglquodiadny7ohzirp/lib/libpython3.13.so.1.0(Py_RunMain+0x9b9) [0x7fffed6a1ad9]
/sw/frontier/spack-envs/core-25.03/opt/gcc-13.2/python-3.13.0-z6cvwh43maa5kvglquodiadny7ohzirp/lib/libpython3.13.so.1.0(Py_BytesMain+0x45) [0x7fffed6a22b5]
/lib64/libc.so.6(+0x40e6c) [0x7fffed0e1e6c]
/lib64/libc.so.6(__libc_start_main+0x87) [0x7fffed0e1f35]
/lustre/orion/cfd219/scratch/rsawko/venvs/pyfr-with-ascent/bin/python(_start+0x21) [0x400da1]
My current setup
I think the following modules should work:
1) libfabric/1.22.0 14) darshan-runtime/3.4.6-mpi (E4S)
2) craype-network-ofi 15) hsi/default
3) perftools-base/24.11.0 16) lfs-wrapper/0.0.1
4) xpmem/2.11.3-1.3_gdbda01a1eb3d 17) DefApps
5) cray-pmi/6.1.15 18) python/3.13.0
6) cce/18.0.1 19) rocm/6.4.1
7) craype/2.7.33 20) craype-x86-milan
8) cray-dsmml/0.3.0 21) cray-hdf5-parallel/1.12.2.11
9) cray-mpich/8.1.31 22) craype-accel-amd-gfx90a
10) cray-libsci/24.11.0 23) conduit/0.9.5
11) PrgEnv-cray/8.6.0 24) vtkm/2.3.0
12) Core/25.03 25) ascent/0.9.5
13) tmux/3.4
where conduit, vtkm and ascent were compiled manually with the setup above. I then create my own virtual environment and install mpi4py with:
pip install --no-binary=mpi4py mpi4py
I then follow the notes here to install h5py against the Cray HDF5 from the modules, and finally install PyFR with:
pip install git+https://github.com/PyFR/PyFR.git@develop
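For anyone trying to reproduce this, the install steps above can be collected into a rough sketch like the one below. Several things in it are assumptions rather than part of the setup described: the `VENV_DIR` location and the `run` and `install_steps` helpers are hypothetical names, and the h5py line uses the `CC`, `HDF5_MPI` and `HDF5_DIR` build variables from the h5py documentation, which may not match the linked notes exactly. It also assumes `HDF5_DIR` is already set by the cray-hdf5-parallel module. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch of the install sequence described above (not a verified recipe).
# With DRY_RUN=1 (the default) each command is printed instead of executed.
set -euo pipefail

# Hypothetical helper: print the command in dry-run mode, otherwise run it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        printf '+ %s\n' "$*"
    else
        "$@"
    fi
}

install_steps() {
    # Hypothetical venv location; override with VENV_DIR=/path/to/venv.
    local venv="${VENV_DIR:-$PWD/pyfr-venv}"

    run python -m venv "$venv"

    # Build mpi4py from source so it links against the loaded cray-mpich.
    run "$venv/bin/pip" install --no-binary=mpi4py mpi4py

    # Assumption: h5py built from source against the parallel Cray HDF5,
    # with HDF5_DIR taken from the environment set up by the module.
    run env CC=cc HDF5_MPI=ON HDF5_DIR="${HDF5_DIR:-}" \
        "$venv/bin/pip" install --no-binary=h5py h5py

    run "$venv/bin/pip" install git+https://github.com/PyFR/PyFR.git@develop
}

install_steps
```

Running it as is lists the four commands; setting `DRY_RUN=0` executes them with the currently loaded modules.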