Hello,
First computer: one socket with 12 cores. I ran "pyfr run -b openmp -p *.pyfrm *.ini", and the calculation time was approximately 130000.
Second computer: two sockets with 32 cores in total. The performance tuning guide says that on a two-socket system it is suggested to run PyFR with two MPI ranks, with each process bound to a single socket. So I ran "mpiexec -n 2 pyfr run -b openmp -p *.pyfrm *.ini", and the calculation time was approximately 500000. When I instead ran "pyfr run -b openmp -p *.pyfrm *.ini" on this machine, the calculation time was approximately 200000, which is still longer than on the first computer.
My backend-openmp configuration is attached.
Is there anything else I need to optimize?