Dear all,
I am trying to learn PyFR to run some simulations. I have installed PyFR and successfully run the three examples following the guide. However, the example case fails when I try to run it with OpenMP as the backend.
How I installed PyFR
OS: Ubuntu 20.04
GCC version: 9.4.0
Create a new Python virtual environment: conda create -n pyfr python=3.10
Install the dependencies: sudo apt-get install -y libopenmpi-dev openmpi-bin metis libmetis-dev
Install mpi4py: conda install -c conda-forge mpi4py
Install libxsmm: conda install -c conda-forge libxsmm
21:23 $ mpiexec -n 2 pyfr run -b openmp -p inc_cylinder_2d.pyfrm inc_cylinder_2d.ini
===================================================================================
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
= PID 984860 RUNNING AT tang-Workstation
= EXIT CODE: 139
= CLEANING UP REMAINING PROCESSES
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions
I have run this test both on my host machine and in a Docker container (both Ubuntu 20.04).
On my host machine, the error is just Segmentation fault (core dumped), nothing more.
In the Docker container, the error is:
[b852cfcca23a:80890] *** Process received signal ***
[b852cfcca23a:80890] Signal: Segmentation fault (11)
[b852cfcca23a:80890] Signal code: Address not mapped (1)
[b852cfcca23a:80890] Failing at address: (nil)
Segmentation fault (core dumped)
I am not sure what caused the difference.
On the host machine, ulimit -u returns 515044.
In the Docker container, ulimit -u returns unlimited.
So are there no errors from Python? @fdw might have a better idea about what is going on, but to me it seems like either your docker environment is not configured properly or some library/utility is incorrectly compiled.
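One way to check whether Python has anything to report before the crash is the standard-library faulthandler module, which dumps the Python traceback when the interpreter receives a fatal signal such as SIGSEGV. The sketch below is a hypothetical helper (run_with_faulthandler is not part of PyFR) that launches a command with the PYTHONFAULTHANDLER environment variable set. It assumes the MPI launcher forwards the environment to its ranks, which may not hold on every setup.

```python
import os
import subprocess
import sys


def run_with_faulthandler(cmd):
    """Run cmd with PYTHONFAULTHANDLER=1 and return (returncode, stderr).

    Hypothetical helper for illustration; not part of PyFR.
    """
    # Copy the current environment and switch faulthandler on for children.
    env = dict(os.environ, PYTHONFAULTHANDLER="1")
    proc = subprocess.run(cmd, env=env, capture_output=True, text=True)
    return proc.returncode, proc.stderr


if __name__ == "__main__":
    # Sanity check: confirm a child interpreter really has faulthandler enabled.
    check = (
        "import faulthandler, sys; "
        "sys.exit(0 if faulthandler.is_enabled() else 1)"
    )
    rc, _ = run_with_faulthandler([sys.executable, "-c", check])
    print("faulthandler active in child:", rc == 0)
```

With the command from the original post, this could be called as run_with_faulthandler(["mpiexec", "-n", "2", "pyfr", "run", "-b", "openmp", "-p", "inc_cylinder_2d.pyfrm", "inc_cylinder_2d.ini"]). If the crash happens inside a native library, the traceback in stderr should at least show which Python call led into it.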
Using a clean docker container, I did a simple test using this example.
The test result is shown below. Looks fine.
(base) tang@ea1847c729c7:~/mpi4py$ mpirun -n 20 python mpi4.py
Controller @ MPI Rank 0: Input 3826176844
Worker at MPI Rank 1: Output 3826176845 is OK (from ea1847c729c7)
Worker at MPI Rank 2: Output 3826176846 is OK (from ea1847c729c7)
Worker at MPI Rank 3: Output 3826176847 is OK (from ea1847c729c7)
Worker at MPI Rank 4: Output 3826176848 is OK (from ea1847c729c7)
Worker at MPI Rank 5: Output 3826176849 is OK (from ea1847c729c7)
Worker at MPI Rank 6: Output 3826176850 is OK (from ea1847c729c7)
Worker at MPI Rank 7: Output 3826176851 is OK (from ea1847c729c7)
Worker at MPI Rank 8: Output 3826176852 is OK (from ea1847c729c7)
Worker at MPI Rank 9: Output 3826176853 is OK (from ea1847c729c7)
Worker at MPI Rank 10: Output 3826176854 is OK (from ea1847c729c7)
Worker at MPI Rank 11: Output 3826176855 is OK (from ea1847c729c7)
Worker at MPI Rank 12: Output 3826176856 is OK (from ea1847c729c7)
Worker at MPI Rank 13: Output 3826176857 is OK (from ea1847c729c7)
Worker at MPI Rank 14: Output 3826176858 is OK (from ea1847c729c7)
Worker at MPI Rank 15: Output 3826176859 is OK (from ea1847c729c7)
Worker at MPI Rank 16: Output 3826176860 is OK (from ea1847c729c7)
Worker at MPI Rank 17: Output 3826176861 is OK (from ea1847c729c7)
Worker at MPI Rank 18: Output 3826176862 is OK (from ea1847c729c7)
Worker at MPI Rank 19: Output 3826176863 is OK (from ea1847c729c7)
There is something wrong between mpi4py and Python 3.9+, see this issue, please.
Since PyFR 1.14.0 requires Python 3.9+, the only way I found to install mpi4py is via conda install -c conda-forge mpi4py. Running python -m pip install mpi4py with Python 3.9+ eventually fails with an error like this. However, even though I can install mpi4py successfully, the incompatibility between mpi4py and Python 3.9+ could make PyFR 1.14.0 fail to run when using OpenMP as the backend.
I have to point out that it may be wrong to say that there is an incompatibility issue between mpi4py and Python 3.9+ since this example is OK. For the details, you can see here.
I have done a test by downgrading PyFR to 1.10.0 so that I can use Python 3.8, and the error disappeared. The example cases run without error when I use OpenMP as the backend.
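When comparing a working and a failing environment like this, it can help to record exactly which versions are installed in each. The following is a minimal sketch using only the standard library. The helper name installed_version and the list of packages queried are illustrative choices, not anything prescribed by PyFR.

```python
import platform
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Print the interpreter version followed by the packages of interest.
    print("python", platform.python_version())
    for name in ("pyfr", "mpi4py", "numpy"):
        print(name, installed_version(name))
```

Running it once in the Python 3.8 environment and once in the Python 3.10 environment gives two short listings that can be posted side by side.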
There is something wrong between mpi4py and Python 3.9+, see this issue, please.
That issue was closed over a year ago. mpi4py is definitely compatible with Python 3.9. After all, if it was not, then no one would be able to use PyFR 1.14 with multiple GPUs.
Agreed, but I really do not know what is going wrong.
One thing I can confirm is that the installation guide does not work on Ubuntu 20.04. If you install PyFR with pip install pyfr, you will definitely see the mpi4py error.
According to the install guide, the OpenMP backend requires:
libxsmm >= commit 14b6cea61376653b2712e3eefa72b13c5e76e421 compiled as a shared library (STATIC=0) with BLAS=0 and CODE_BUF_MAXSIZE=262144
Note the specific git revision, along with the specific compilation variables on BLAS and CODE_BUF_MAXSIZE. There is no reason to believe that the version of libxsmm obtained through conda fulfils these requirements.
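As a first step towards checking this, it may help to see which copies of libxsmm are actually present on disk. The sketch below scans a few directories for lib<name>.so* files. The helper find_shared_libs and the choice of directories are assumptions made for illustration; CONDA_PREFIX is the variable conda sets when an environment is activated. Note that this only locates candidate files. It cannot tell which git revision or which build flags (STATIC, BLAS, CODE_BUF_MAXSIZE) were used.

```python
import glob
import os


def find_shared_libs(name, lib_dirs):
    """Return sorted paths of lib<name>.so* files found in the given directories."""
    hits = []
    for d in lib_dirs:
        hits.extend(glob.glob(os.path.join(d, f"lib{name}.so*")))
    return sorted(hits)


if __name__ == "__main__":
    # Directories to scan: the active conda env (if any) plus common system paths.
    dirs = ["/usr/lib", "/usr/local/lib"]
    if os.environ.get("CONDA_PREFIX"):
        dirs.insert(0, os.path.join(os.environ["CONDA_PREFIX"], "lib"))
    for path in find_shared_libs("xsmm", dirs):
        print(path)
```

If the only hit is inside the conda environment, that is the build being questioned here, and a copy compiled from source with the options quoted above would be the thing to compare against.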