Failed to run example cases using OpenMP as the backend

Yes, I can confirm. The full commands I used are:

rm inc_cylinder_2d.pyfrm *.pyfrs *.csv

pyfr import inc_cylinder_2d.msh inc_cylinder_2d.pyfrm

pyfr run -b openmp -p inc_cylinder_2d.pyfrm inc_cylinder_2d.ini

What is the full error message/traceback?

I ran this test both on my host machine and in a Docker container (both Ubuntu 20.04).

On my host machine, the only output is Segmentation fault (core dumped), nothing more.

In the Docker container, the error is:

[b852cfcca23a:80890] *** Process received signal ***
[b852cfcca23a:80890] Signal: Segmentation fault (11)
[b852cfcca23a:80890] Signal code: Address not mapped (1)
[b852cfcca23a:80890] Failing at address: (nil)
Segmentation fault (core dumped)

I am not sure what causes the difference. On the host machine, ulimit -u returns 515044; in the Docker container, it returns unlimited.

So are there no errors from Python? @fdw might have a better idea about what is going on, but to me it seems like either your Docker environment is not configured properly or some library/utility has been incorrectly compiled.

Can you confirm if any of the mpi4py example programs work?
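For instance, a minimal check along the lines below (just a sketch; it assumes mpi4py is importable in the same environment that PyFR uses, and the file name check_mpi.py is arbitrary) should run under mpirun without a segmentation fault:

from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

# Every rank contributes its rank number; with n ranks the total is n*(n-1)/2
total = comm.allreduce(rank, op=MPI.SUM)
print(f'Rank {rank} of {comm.Get_size()}: allreduce total = {total}')

Run it as, say, mpirun -n 4 python check_mpi.py; every rank should print the same total.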

Regards, Freddie.

Using a clean Docker container, I ran a simple test with this example.

The test result is shown below; it looks fine.

(base) tang@ea1847c729c7:~/mpi4py$ mpirun -n 20 python mpi4.py
Controller @ MPI Rank   0:  Input 3826176844
   Worker at MPI Rank   1: Output 3826176845 is OK (from ea1847c729c7)
   Worker at MPI Rank   2: Output 3826176846 is OK (from ea1847c729c7)
   Worker at MPI Rank   3: Output 3826176847 is OK (from ea1847c729c7)
   Worker at MPI Rank   4: Output 3826176848 is OK (from ea1847c729c7)
   Worker at MPI Rank   5: Output 3826176849 is OK (from ea1847c729c7)
   Worker at MPI Rank   6: Output 3826176850 is OK (from ea1847c729c7)
   Worker at MPI Rank   7: Output 3826176851 is OK (from ea1847c729c7)
   Worker at MPI Rank   8: Output 3826176852 is OK (from ea1847c729c7)
   Worker at MPI Rank   9: Output 3826176853 is OK (from ea1847c729c7)
   Worker at MPI Rank  10: Output 3826176854 is OK (from ea1847c729c7)
   Worker at MPI Rank  11: Output 3826176855 is OK (from ea1847c729c7)
   Worker at MPI Rank  12: Output 3826176856 is OK (from ea1847c729c7)
   Worker at MPI Rank  13: Output 3826176857 is OK (from ea1847c729c7)
   Worker at MPI Rank  14: Output 3826176858 is OK (from ea1847c729c7)
   Worker at MPI Rank  15: Output 3826176859 is OK (from ea1847c729c7)
   Worker at MPI Rank  16: Output 3826176860 is OK (from ea1847c729c7)
   Worker at MPI Rank  17: Output 3826176861 is OK (from ea1847c729c7)
   Worker at MPI Rank  18: Output 3826176862 is OK (from ea1847c729c7)
   Worker at MPI Rank  19: Output 3826176863 is OK (from ea1847c729c7)

What CPU are you using?
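(If you are not sure of the exact model, something like the following snippet reads the model string from /proc/cpuinfo on Linux:)

# Print the CPU model string (Linux only; reads /proc/cpuinfo)
with open('/proc/cpuinfo') as f:
    for line in f:
        if line.startswith('model name'):
            print(line.split(':', 1)[1].strip())
            break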

Regards, Freddie.

Well, I think I have found the reason.

There seems to be something wrong between mpi4py and Python 3.9+; please see this issue.

Since PyFR 1.14.0 requires Python 3.9+, the only way I found to install mpi4py was via conda install -c conda-forge mpi4py; installing with python -m pip install mpi4py under Python 3.9+ eventually runs into an error like this. However, even though I can install mpi4py successfully that way, an incompatibility between mpi4py and Python 3.9+ could be what makes PyFR 1.14.0 fail to run with the openmp backend.
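As a side note, one thing worth checking when comparing the pip and conda installations is which MPI library mpi4py was actually built against; if the conda-forge package pulls in its own MPI while mpirun comes from the system Open MPI, the two could conflict (this is only a guess on my part). A quick sketch:

import mpi4py
from mpi4py import MPI

# Report which mpi4py release is installed and which MPI library it is linked against
print('mpi4py version:', mpi4py.__version__)
print('MPI library:', MPI.Get_library_version().splitlines()[0])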

I should point out that it may be wrong to say there is an incompatibility between mpi4py and Python 3.9+, since this example works fine. For details, see here.

I ran a test after downgrading PyFR to 1.10.0 so that I could use Python 3.8, and the error disappeared: the example cases run without error with the openmp backend.

There seems to be something wrong between mpi4py and Python 3.9+; please see this issue.

That issue was closed over a year ago. mpi4py is definitely compatible with Python 3.9. After all, if it were not, then no one would be able to use PyFR 1.14 with multiple GPUs.

Regards, Freddie.

Agreed, but I really do not know where it goes wrong.

One thing I can confirm is that the installation guide does not work on Ubuntu 20.04. If you use pip install pyfr to install PyFR, you will definitely see the mpi4py error.

As I asked before, would you be able to confirm what CPU you are using?

Regards, Freddie.

Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz

How was libxsmm compiled?

Regards, Freddie.

I have listed all the steps I followed to install PyFR 1.14.0 here.

From the install guide the OpenMP backend requires:

libxsmm >= commit 14b6cea61376653b2712e3eefa72b13c5e76e421 compiled as a shared library (STATIC=0) with BLAS=0 and CODE_BUF_MAXSIZE=262144

Note the specific git revision, along with the specific compilation variables on BLAS and CODE_BUF_MAXSIZE. There is no reason to believe that the version of libxsmm obtained through conda fulfils these requirements.

Regards, Freddie.

I have also tried building PyFR 1.14.0 from source following the installation guide. Unfortunately, I still could not get it to work.

Sorry, I did not save the error output. If I remember correctly, all the packages compiled successfully, but running the cases still failed.

Did you try following Freddie’s advice about manually compiling libxsmm with the flags he specified?

Yes. Following the install guide, I tried building PyFR 1.14.0 from source and compiling every dependency from source. Every package compiled successfully, but errors occurred when I ran the cases.

However, I am sorry that I did not save the error output; it was something like "undefined symbol". I am currently running an older version of PyFR (1.12.3); once that simulation finishes, I can try to reproduce the error output.

This error is usually a consequence of libxsmm not being compiled with the BLAS=0 flag.
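One way to check this independently of PyFR is to try loading the shared library directly with ctypes; if the build is broken, the load will typically fail with the same "undefined symbol" message. A rough sketch (the path to libxsmm.so is a placeholder; point it at wherever your build installed the library):

import ctypes

# Attempt to dlopen the libxsmm shared library; an "undefined symbol" error
# here points at a build problem (e.g. a missing BLAS=0) rather than at PyFR
ctypes.CDLL('/path/to/libxsmm.so')
print('libxsmm loaded successfully')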

Regards, Freddie.

I am sure that I added that flag. I did it this way:

git checkout 14b6cea61376653b2712e3eefa72b13c5e76e421
make STATIC=0 BLAS=0 CODE_BUF_MAXSIZE=262144
make install

I may try again this weekend.