Getting PyFR working on multiple processors

Hi PyFR team,

I’ve been trying to install PyFR 1.14.0 as a user on an HPC system, but have run into a bit of trouble that I’m finding difficult to diagnose. Eventually I’d like to get it running with CUDA, but for a start I’m just trying to get it working on CPUs, which I think means using the OpenMP backend.

With the current install, the euler_vortex_2d test case works when run on a single core (pyfr run -b openmp -p euler_vortex_2d.pyfrm euler_vortex_2d.ini). But when I try to run across two ranks, having partitioned the mesh (mpiexec -n 2 pyfr run -b openmp -p euler_vortex_2d.pyfrm euler_vortex_2d.ini), I get a string of errors that seems to originate with the one below, which I can’t find much information about:

File "/work/ec206/ec206/raw54/libs/pytools-2016.2.6/lib/python3.9/site-packages/pytools/prefork.py", line 125, in _fork_server
/work/ec206/ec206/raw54/libs/pytools-2016.2.6/lib/python3.9/site-packages/pytools/prefork.py:93: UserWarning: Prefork client exiting upon apparent death of prefork server
  warn("%s exiting upon apparent death of %s" % (who, partner))
    func_name, args, kwargs = _recv_packet(
  File "/work/ec206/ec206/raw54/libs/pytools-2016.2.6/lib/python3.9/site-packages/pytools/prefork.py", line 104, in _recv_packet
    return loads(packet)
_pickle.UnpicklingError: invalid load key, '9'.
    self.version = call_capture_output([self.cc, '-v'])
  File "/work/ec206/ec206/raw54/libs/pytools-2016.2.6/lib/python3.9/site-packages/pytools/prefork.py", line 223, in call_capture_output
    return forker.call_capture_output(cmdline, cwd, error_on_nonzero)
  File "/work/ec206/ec206/raw54/libs/pytools-2016.2.6/lib/python3.9/site-packages/pytools/prefork.py", line 179, in call_capture_output
    return self._remote_invoke("call_capture_output", cmdline, cwd,
  File "/work/ec206/ec206/raw54/libs/pytools-2016.2.6/lib/python3.9/site-packages/pytools/prefork.py", line 157, in _remote_invoke
    status, result = _recv_packet(
  File "/work/ec206/ec206/raw54/libs/pytools-2016.2.6/lib/python3.9/site-packages/pytools/prefork.py", line 89, in _recv_packet
    size_bytes = sock.recv(size_bytes_size)

MPI Rank 0 then aborts. Given that this error comes from pytools, I’ve tried both the most recent version installed by pip (2022.1.12) and the minimum recommended version (2016.2.6). I’ve done the same with platformdirs, using the most recent and minimum recommended versions, and I’ve also tried Python 3.9 and 3.10. The error stays mostly the same, but the invalid load key that’s reported changes. In case it’s relevant, the mpi4py version is 3.1.3, again installed through pip, with the centrally managed HPE MPT 2.25 package as the MPI implementation. The installed mpi4py seems to pass the simple helloworld test suggested in its install guide. I’m also happy for now to have serial I/O while getting things working, so the h5py module is again simply installed using pip.

I’m not sure what to try next, so I wondered whether you had any advice on next steps?

Thanks very much, and all the best,
Rob
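
For context on the _pickle.UnpicklingError in the traceback above: this is the message Python's pickle module gives when the first byte of the data it is asked to load is not a valid pickle opcode. The minimal sketch below uses only the standard library to reproduce the same kind of message. The payload bytes and the helper name load_key_error are made up for illustration and are not what pytools actually received or uses. The sketch shows what the error means, not why the prefork socket received such bytes.

```python
import pickle


def load_key_error(payload: bytes) -> str:
    """Return the UnpicklingError message for payload, or "" if it loads."""
    try:
        pickle.loads(payload)
    except pickle.UnpicklingError as exc:
        return str(exc)
    return ""


# Hypothetical payload: arbitrary text beginning with '9', not a pickle.
print(load_key_error(b"9 not a pickle"))

# A genuine pickle loads cleanly, so no error message is returned.
print(repr(load_key_error(pickle.dumps(("call_capture_output", [], {})))))
```

A load key that changes between runs, as described above, would be consistent with the reader seeing different stray bytes each time, although that alone does not identify where they come from.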

What happens if you use mpiexec in the single-rank case, i.e. mpiexec -np 1 pyfr ...?

Regards, Freddie.

Hi Freddie,

Thanks for the reply - mpiexec on one core seems to work fine, without any errors, and the case completes.

Best,
Rob

Can you try either OpenMPI or MPICH?

Regards, Freddie.
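
When switching between MPI implementations, it can help to confirm which library mpi4py is actually using at run time. The sketch below is one way to check, assuming mpi4py is importable in the same environment as PyFR. The function name describe_mpi and the script name check_mpi.py are hypothetical, and the vendor string reported for a given implementation (HPE MPT, for instance) is not something this sketch guarantees.

```python
def describe_mpi():
    """Return details of the MPI library mpi4py is using, or None if unavailable."""
    try:
        import mpi4py
        from mpi4py import MPI  # importing this module initialises MPI
    except ImportError:
        return None

    name, version = MPI.Get_vendor()
    return {
        "vendor": name,
        "vendor_version": ".".join(map(str, version)),
        "library": MPI.Get_library_version().strip(),
        "mpi4py": mpi4py.__version__,
        "rank": MPI.COMM_WORLD.Get_rank(),
        "size": MPI.COMM_WORLD.Get_size(),
    }


if __name__ == "__main__":
    print(describe_mpi())
```

Saved as check_mpi.py (a hypothetical name) and launched with mpiexec -n 2 python check_mpi.py, each rank should print its own entry. A reported size of 1 on every rank would suggest that mpiexec and mpi4py belong to different MPI installations.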