How to set up OpenMPI

Hello,

The first computer has one socket and 12 cores. Running “pyfr run -b openmp -p *.pyfrm *.ini” on it gives a calculation time of approximately 130000.

The second computer has two sockets and 32 cores. The performance tuning guide suggests that on a two-socket system PyFR should be run with two MPI ranks, with each process bound to a single socket. Running “mpiexec -n 2 pyfr run -b openmp -p *.pyfrm *.ini” gives a calculation time of approximately 500000. Running “pyfr run -b openmp -p *.pyfrm *.ini” without MPI gives approximately 200000, which is still slower than the first computer.

The backend-openmp section of my configuration file is attached as an image.

Is there anything else I need to optimize?

You need to tell OpenMPI to assign a certain number of CPU cores to each rank. Further, you will also want to explicitly set the number of OpenMP threads. This is all described in the OpenMPI manual. Finally, you will also want to make sure each socket is a single NUMA zone. The lscpu command can let you know if this is the case.
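As a rough sketch of what this could look like on the second machine, assuming 2 sockets with 16 cores each, Open MPI 4.x option syntax, and a mesh that has already been partitioned into two pieces. The file names mesh.pyfrm and config.ini are placeholders, and the script only prints the launch command rather than running it. Newer Open MPI releases may expect "package" in place of "socket" in the mapping option, so check the manual for the installed version.

```shell
#!/bin/sh
# Assumption: 2 sockets x 16 cores, as on the second machine in the question.
SOCKETS=2
CORES_PER_SOCKET=16

# Inspect the topology first; ideally "NUMA node(s)" equals "Socket(s)".
# lscpu is part of util-linux; the check is skipped if it is unavailable.
if command -v lscpu >/dev/null 2>&1; then
    lscpu | grep -E '^(Socket\(s\)|NUMA node\(s\))' || true
fi

# One OpenMP thread per core of the socket a rank is bound to.
export OMP_NUM_THREADS="$CORES_PER_SOCKET"

# Open MPI 4.x syntax (assumed): one rank per socket, each rank given
# CORES_PER_SOCKET cores; -x forwards OMP_NUM_THREADS to the ranks and
# --report-bindings prints where each rank ended up.
LAUNCH="mpiexec -n $SOCKETS --map-by ppr:1:socket:PE=$CORES_PER_SOCKET --bind-to core --report-bindings -x OMP_NUM_THREADS pyfr run -b openmp -p mesh.pyfrm config.ini"

# Dry run: print the command. Replace 'echo' with 'eval' to launch it.
echo "$LAUNCH"
```

The output of --report-bindings is worth checking on the first run: each of the two ranks should be listed against the 16 cores of a different socket.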

Failing all of that, you can partition into 32 pieces and then just run with 32 ranks and OMP_NUM_THREADS=1. The performance will not be as good, but it is easier to set up.
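A sketch of this simpler fallback, under the same assumptions (Open MPI option syntax, placeholder file names mesh.pyfrm and config.ini). The partition command follows the PyFR 1.x style that matches the run command used in the question; later PyFR versions may use a different partition syntax. As before, the script only prints the commands.

```shell
#!/bin/sh
# Assumption: 32 cores in total, one single-threaded rank per core.
RANKS=32
export OMP_NUM_THREADS=1

# PyFR 1.x style (assumed): repartition the mesh into 32 pieces, writing
# the result to the current directory.
PARTITION="pyfr partition $RANKS mesh.pyfrm ."

# One rank per core; -x forwards OMP_NUM_THREADS to every rank.
LAUNCH="mpiexec -n $RANKS --bind-to core -x OMP_NUM_THREADS pyfr run -b openmp -p mesh.pyfrm config.ini"

# Dry run: print both commands. Replace 'echo' with 'eval' to run them.
echo "$PARTITION"
echo "$LAUNCH"
```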

Regards, Freddie.