Yes, I can confirm. The full commands I used were:
rm inc_cylinder_2d.pyfrm *.pyfrs *.csv
pyfr import inc_cylinder_2d.msh inc_cylinder_2d.pyfrm
pyfr run -b openmp -p inc_cylinder_2d.pyfrm inc_cylinder_2d.ini
What is the full error message/traceback?
Actually, I have done this test both on my host machine and in Docker (both Ubuntu 20.04).
On my host machine, the error is just Segmentation fault (core dumped), nothing more.
On the docker, the error is
[b852cfcca23a:80890] *** Process received signal ***
[b852cfcca23a:80890] Signal: Segmentation fault (11)
[b852cfcca23a:80890] Signal code: Address not mapped (1)
[b852cfcca23a:80890] Failing at address: (nil)
Segmentation fault (core dumped)
I am not sure what caused the difference.
On the host machine, ulimit -u returns 515044.
In the Docker container, ulimit -u returns unlimited.
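In case it helps to narrow this down, a generic way to get an actual backtrace out of a segfault like this (a debugging sketch, not output from my runs) is to launch the PyFR process under gdb:
# run PyFR under gdb; after the crash, bt prints the native backtrace
gdb -ex run -ex bt --args python $(which pyfr) run -b openmp -p inc_cylinder_2d.pyfrm inc_cylinder_2d.ini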
So there are no errors from Python? @fdw might have a better idea about what is going on, but to me it seems like either your Docker environment is not configured properly or some library/utility is incorrectly compiled.
Can you confirm if any of the mpi4py example programs work?
Regards, Freddie.
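A minimal check along these lines, assuming mpi4py is installed in the active environment and mpirun is on the PATH, would be:
# every rank should print its rank and the communicator size
mpirun -n 4 python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.Get_rank(), 'of', c.Get_size())"
If this prints ranks 0 through 3 without a segmentation fault, the MPI stack itself is probably healthy.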
Using a clean Docker container, I did a simple test using this example.
The test result is shown below; it looks fine.
(base) tang@ea1847c729c7:~/mpi4py$ mpirun -n 20 python mpi4.py
Controller @ MPI Rank 0: Input 3826176844
Worker at MPI Rank 1: Output 3826176845 is OK (from ea1847c729c7)
Worker at MPI Rank 2: Output 3826176846 is OK (from ea1847c729c7)
Worker at MPI Rank 3: Output 3826176847 is OK (from ea1847c729c7)
Worker at MPI Rank 4: Output 3826176848 is OK (from ea1847c729c7)
Worker at MPI Rank 5: Output 3826176849 is OK (from ea1847c729c7)
Worker at MPI Rank 6: Output 3826176850 is OK (from ea1847c729c7)
Worker at MPI Rank 7: Output 3826176851 is OK (from ea1847c729c7)
Worker at MPI Rank 8: Output 3826176852 is OK (from ea1847c729c7)
Worker at MPI Rank 9: Output 3826176853 is OK (from ea1847c729c7)
Worker at MPI Rank 10: Output 3826176854 is OK (from ea1847c729c7)
Worker at MPI Rank 11: Output 3826176855 is OK (from ea1847c729c7)
Worker at MPI Rank 12: Output 3826176856 is OK (from ea1847c729c7)
Worker at MPI Rank 13: Output 3826176857 is OK (from ea1847c729c7)
Worker at MPI Rank 14: Output 3826176858 is OK (from ea1847c729c7)
Worker at MPI Rank 15: Output 3826176859 is OK (from ea1847c729c7)
Worker at MPI Rank 16: Output 3826176860 is OK (from ea1847c729c7)
Worker at MPI Rank 17: Output 3826176861 is OK (from ea1847c729c7)
Worker at MPI Rank 18: Output 3826176862 is OK (from ea1847c729c7)
Worker at MPI Rank 19: Output 3826176863 is OK (from ea1847c729c7)
What CPU are you using?
Regards, Freddie.
Well, I think I have found the reason.
There is something wrong between mpi4py and Python 3.9+; see this issue, please.
Since PyFR 1.14.0 requires Python 3.9+, I found the only way to install mpi4py is via conda install -c conda-forge mpi4py. Using python -m pip install mpi4py with Python 3.9+ will eventually run into an error like this. However, even though I can successfully install mpi4py this way, the incompatibility between mpi4py and Python 3.9+ could make PyFR 1.14.0 fail to run when using openmp as the backend.
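For what it is worth, when building mpi4py from source with pip, the mpi4py documentation suggests pointing the build at a specific MPI compiler wrapper; a sketch, assuming the wrapper is mpicc on the PATH:
# force a from-source build against the system MPI
env MPICC=mpicc python -m pip install --no-cache-dir --no-binary=mpi4py mpi4py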
I have to point out that it may be wrong to say there is an incompatibility between mpi4py and Python 3.9+, since this example works. For the details, you can see here.
I have done a test by downgrading PyFR to 1.10.0 so that I can use Python 3.8, and the error disappeared. The example cases run without error when I use openmp as the backend.
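For anyone wanting to reproduce the downgrade test, the rough sequence is (the environment name is arbitrary):
# fresh environment with an older Python, then the older PyFR
conda create -n pyfr110 python=3.8
conda activate pyfr110
python -m pip install pyfr==1.10.0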
There is something wrong between mpi4py and Python 3.9+, see this issue, please.
That issue was closed over a year ago. mpi4py is definitely compatible with Python 3.9. After all, if it was not, then no one would be able to use PyFR 1.14 with multiple GPUs.
Regards, Freddie.
Agreed, but I really do not know where it goes wrong.
One thing that I can confirm is that the installation guide does not work on Ubuntu 20.04. If you use pip install pyfr to install PyFR, you will definitely see the mpi4py error.
As I asked before, would you be able to confirm what CPU you are using?
Regards, Freddie.
Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz
How was libxsmm compiled?
Regards, Freddie.
From the install guide, the OpenMP backend requires:
libxsmm >= commit 14b6cea61376653b2712e3eefa72b13c5e76e421 compiled as a shared library (STATIC=0) with BLAS=0 and CODE_BUF_MAXSIZE=262144
Note the specific git revision, along with the specific compilation variables on BLAS and CODE_BUF_MAXSIZE. There is no reason to believe that the version of libxsmm obtained through conda fulfils these requirements.
Regards, Freddie.
I have previously tried to build PyFR 1.14.0 from source following the installation guide. Unfortunately, I still failed to make it work.
Sorry, I did not save the error information. If I remember correctly, all packages compiled successfully, but the cases failed to run.
Did you try following Freddie’s advice about manually compiling libxsmm with the flags he specified?
Yes. Following the install guide, I have tried to build PyFR 1.14.0 from source and to compile every dependency from source. Every package compiled successfully, but some errors occurred when I ran the cases.
However, I am sorry that I did not save the error output. It should have been something like “undefined symbol”. I am currently using an older version of PyFR (1.12.3); once my simulation is over, I can try to reproduce the error output.
This error is usually a consequence of libxsmm not being compiled with the BLAS=0 flag.
Regards, Freddie.
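A rough way to check an existing build for this (a heuristic: a BLAS-enabled libxsmm references the external gemm symbols) is to look for undefined BLAS symbols in the shared library; the path is illustrative:
# a BLAS=0 build should show no undefined dgemm_/sgemm_ entries
nm -D /path/to/libxsmm.so | grep ' U ' | grep -i gemm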
I am sure that I have added that flag. I did it in this way:
git checkout 14b6cea61376653b2712e3eefa72b13c5e76e421
make STATIC=0 BLAS=0 CODE_BUF_MAXSIZE=262144
make install
I may try again this weekend.
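For the retry, one extra thing that might be worth checking (a sketch; the prefix path is illustrative) is that the freshly built library is actually the one being loaded at runtime:
# install to an explicit prefix and make it visible to the dynamic loader
make install PREFIX=/opt/libxsmm
export LD_LIBRARY_PATH=/opt/libxsmm/lib:$LD_LIBRARY_PATH
If I remember correctly, PyFR can also be pointed directly at a specific build via the PYFR_XSMM_LIBRARY_PATH environment variable, but please check the documentation for your version.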