Failed to run example cases using the OpenMP backend

Dear all,
I am trying to learn PyFR to do some simulations. I have installed PyFR and successfully run the three examples following the guide. However, I failed when trying to run the example case using the OpenMP backend.

How I installed PyFR

OS: Ubuntu 20.04
GCC version: 9.4.0

  1. Create a new conda environment with conda create -n pyfr python=3.10
  2. Install the system dependencies with sudo apt-get install -y libopenmpi-dev openmpi-bin metis libmetis-dev
  3. Install mpi4py with conda install -c conda-forge mpi4py
  4. Install libxsmm with conda install -c conda-forge libxsmm
  5. Install PyFR with pip install pyfr

I tried to run the inc_cylinder_2d example case using the OpenMP backend. The commands I used were:

1. pyfr import inc_cylinder_2d.msh inc_cylinder_2d.pyfrm
2. pyfr partition 2 inc_cylinder_2d.pyfrm .
3. mpiexec -n 2 pyfr run -b openmp -p inc_cylinder_2d.pyfrm inc_cylinder_2d.ini

The error is

21:23 $ mpiexec -n 2 pyfr run -b openmp -p inc_cylinder_2d.pyfrm inc_cylinder_2d.ini

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 984860 RUNNING AT tang-Workstation
=   EXIT CODE: 139
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault (signal 11)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

I also tried adding the following options to the original configuration file, but it did not help.

[backend-openmp]
cc = gcc
gimmik-max-nnz = 100000

Could you give me some suggestions, please?

Best regards,
Hongwei

Can you confirm if it works with a single partition?
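
For example, something along these lines (just a sketch, assuming the mesh has already been imported) would repartition the case back to a single partition and run it without mpiexec:

# repartition to a single partition and run serially with the OpenMP backend
pyfr partition 1 inc_cylinder_2d.pyfrm .
pyfr run -b openmp -p inc_cylinder_2d.pyfrm inc_cylinder_2d.ini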

Regards, Freddie.

It still fails. Does this mean something is wrong with the OpenMPI installation?

$ pyfr run -b openmp -p inc_cylinder_2d.pyfrm inc_cylinder_2d.ini
Segmentation fault (core dumped)

Did you make sure to repartition the case with a single partition before trying this?

Yes, I am sure. The full commands I used were:

rm inc_cylinder_2d.pyfrm *.pyfrs *.csv

pyfr import inc_cylinder_2d.msh inc_cylinder_2d.pyfrm

pyfr run -b openmp -p inc_cylinder_2d.pyfrm inc_cylinder_2d.ini

What is the full error message/traceback?

Actually, I have run this test both on my host machine and in Docker (both Ubuntu 20.04).

On my host machine, the error is just Segmentation fault (core dumped), nothing more.

In Docker, the error is:

[b852cfcca23a:80890] *** Process received signal ***
[b852cfcca23a:80890] Signal: Segmentation fault (11)
[b852cfcca23a:80890] Signal code: Address not mapped (1)
[b852cfcca23a:80890] Failing at address: (nil)
Segmentation fault (core dumped)

I am not sure what causes the difference.
On the host machine, ulimit -u returns 515044.
In the Docker container, ulimit -u returns unlimited.

So are there no errors from Python? @fdw might have a better idea about what is going on, but to me it seems like either your Docker environment is not configured properly or some library/utility is incorrectly compiled.

Can you confirm if any of the mpi4py example programs work?
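
Even a minimal check along these lines would be informative (just a sketch; it only verifies that mpi4py can initialise MPI and that the two ranks can see each other):

# each rank should print its rank and the communicator size, i.e. "0 2" and "1 2"
mpiexec -n 2 python -c "from mpi4py import MPI; c = MPI.COMM_WORLD; print(c.rank, c.size)"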

Regards, Freddie.

Using a clean Docker container, I ran a simple test using this example.

The test output is shown below; it looks fine.

(base) tang@ea1847c729c7:~/mpi4py$ mpirun -n 20 python mpi4.py
Controller @ MPI Rank   0:  Input 3826176844
   Worker at MPI Rank   1: Output 3826176845 is OK (from ea1847c729c7)
   Worker at MPI Rank   2: Output 3826176846 is OK (from ea1847c729c7)
   Worker at MPI Rank   3: Output 3826176847 is OK (from ea1847c729c7)
   Worker at MPI Rank   4: Output 3826176848 is OK (from ea1847c729c7)
   Worker at MPI Rank   5: Output 3826176849 is OK (from ea1847c729c7)
   Worker at MPI Rank   6: Output 3826176850 is OK (from ea1847c729c7)
   Worker at MPI Rank   7: Output 3826176851 is OK (from ea1847c729c7)
   Worker at MPI Rank   8: Output 3826176852 is OK (from ea1847c729c7)
   Worker at MPI Rank   9: Output 3826176853 is OK (from ea1847c729c7)
   Worker at MPI Rank  10: Output 3826176854 is OK (from ea1847c729c7)
   Worker at MPI Rank  11: Output 3826176855 is OK (from ea1847c729c7)
   Worker at MPI Rank  12: Output 3826176856 is OK (from ea1847c729c7)
   Worker at MPI Rank  13: Output 3826176857 is OK (from ea1847c729c7)
   Worker at MPI Rank  14: Output 3826176858 is OK (from ea1847c729c7)
   Worker at MPI Rank  15: Output 3826176859 is OK (from ea1847c729c7)
   Worker at MPI Rank  16: Output 3826176860 is OK (from ea1847c729c7)
   Worker at MPI Rank  17: Output 3826176861 is OK (from ea1847c729c7)
   Worker at MPI Rank  18: Output 3826176862 is OK (from ea1847c729c7)
   Worker at MPI Rank  19: Output 3826176863 is OK (from ea1847c729c7)

What CPU are you using?

Regards, Freddie.

Well, I think I have found the reason.

There seems to be a problem between mpi4py and Python 3.9+; please see this issue.

Since PyFR 1.14.0 requires Python 3.9+, the only way I found to install mpi4py was via conda install -c conda-forge mpi4py. Using python -m pip install mpi4py with Python 3.9+ eventually runs into an error like this. However, even though I can install mpi4py successfully that way, the incompatibility between mpi4py and Python 3.9+ could be what makes PyFR 1.14.0 fail when using the OpenMP backend.
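
For reference, the standard way to build mpi4py from source against the system OpenMPI (roughly following the mpi4py documentation) would be something like the following; this is only a sketch of the usual procedure, not the exact command I ran:

# build mpi4py from source against the system OpenMPI (mpicc from the apt packages)
env MPICC=mpicc python -m pip install --no-binary=mpi4py mpi4py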

I have to point out that it may be wrong to say there is an incompatibility between mpi4py and Python 3.9+, since this example works fine. For the details, see here.

I did a test by downgrading PyFR to 1.10.0 so that I could use Python 3.8, and the error disappeared. The example cases run without error when I use the OpenMP backend.

There seems to be a problem between mpi4py and Python 3.9+; please see this issue.

That issue was closed over a year ago. mpi4py is definitely compatible with Python 3.9. After all, if it was not, then no one would be able to use PyFR 1.14 with multiple GPUs.

Regards, Freddie.

Agreed, but I really do not know where it is going wrong.

One thing I can confirm is that the installation guide does not work on Ubuntu 20.04. If you use pip install pyfr to install PyFR, you will definitely see the mpi4py error.

As I asked before, would you be able to confirm what CPU you are using?

Regards, Freddie.

Intel(R) Xeon(R) CPU E5-2650 v4 @ 2.20GHz

How was libxsmm compiled?

Regards, Freddie.

I listed all the steps I followed to install PyFR 1.14.0 here.

From the install guide, the OpenMP backend requires:

libxsmm >= commit 14b6cea61376653b2712e3eefa72b13c5e76e421 compiled as a shared library (STATIC=0) with BLAS=0 and CODE_BUF_MAXSIZE=262144

Note the specific git revision, along with the specific compilation variables on BLAS and CODE_BUF_MAXSIZE. There is no reason to believe that the version of libxsmm obtained through conda fulfils these requirements.
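
Building it by hand is not difficult; roughly along these lines (a sketch: the repository URL and output path are from memory, so adjust as needed):

# build libxsmm as a shared library with the flags PyFR expects
git clone https://github.com/libxsmm/libxsmm.git
cd libxsmm
git checkout 14b6cea61376653b2712e3eefa72b13c5e76e421
make -j4 STATIC=0 BLAS=0 CODE_BUF_MAXSIZE=262144
# ensure the resulting lib/libxsmm.so can be found at run time, e.g.
export LD_LIBRARY_PATH=$PWD/lib:$LD_LIBRARY_PATH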

Regards, Freddie.

I have previously tried to build PyFR 1.14.0 from source following the installation guide. Unfortunately, I still could not get it to work.

Sorry, I did not save the error output. If I remember correctly, all the packages compiled successfully, but the cases still failed to run.