When I run the OpenMP backend with “mpiexec -n 2 pyfr run -b openmp -p circle61.pyfrm circle.ini”, it takes about 10 hours. But when I use “mpiexec -n 8 pyfr run -b openmp -p circle61.pyfrm circle.ini” or “mpiexec -n 16 pyfr run -b openmp -p circle61.pyfrm circle.ini”, it takes about 403 hours or more.
The platform is a virtual machine with 8*2 threads.
The issue is that when you run with 8 or 16 ranks, each of those ranks will try to launch 16 threads, which creates massive over-subscription. Please consult the documentation for your MPI library for how to properly run hybrid applications.
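To make the over-subscription concrete, here is a rough sketch of the arithmetic. It assumes, as described above, that each rank starts 16 threads and that the virtual machine exposes 16 hardware threads. The helper name total_threads is made up for illustration.

```shell
#!/bin/sh
# Assumption: the VM exposes 8*2 = 16 hardware threads, and each MPI rank
# starts 16 OpenMP threads by default (as described in the reply above).
hw_threads=16
threads_per_rank=16

# Hypothetical helper: total threads = ranks * threads per rank.
total_threads() {
    echo $(( $1 * $2 ))
}

for ranks in 8 16; do
    total=$(total_threads "$ranks" "$threads_per_rank")
    echo "ranks=$ranks -> $total threads on $hw_threads hardware threads"
done
```

Under these assumptions, 8 ranks means 128 threads and 16 ranks means 256 threads, all competing for the same 16 hardware threads.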
I also note that this is clearly described in the performance tuning section of the PyFR documentation.
Yes, the performance tuning section of the PyFR documentation says “on a two socket system it is suggested to run PyFR with two MPI ranks, with each process being bound to a single socket”. But my computer only has one socket. Could you give me more details about solving this problem?
If your system does indeed have only a single socket, then you should not be using mpiexec to launch PyFR. Just run pyfr ... bare. You can use the OMP_NUM_THREADS environment variable to change how many cores are used.
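As a minimal sketch of that suggestion, the script below reuses the command from the question without the mpiexec prefix. The value 16 is an assumption based on the 8*2 threads mentioned earlier; a smaller value may work better on your machine. The guard around the actual run is only there so the script does nothing harmful where PyFR or the input files are missing.

```shell
#!/bin/sh
# Assumption: 16 matches the 8*2 threads of the VM; adjust as needed.
OMP_NUM_THREADS=16
export OMP_NUM_THREADS

# Same command as in the question, minus "mpiexec -n N".
cmd="pyfr run -b openmp -p circle61.pyfrm circle.ini"

# Only launch if pyfr and both input files are actually present;
# otherwise just print what would be run.
if command -v pyfr >/dev/null 2>&1 && [ -f circle61.pyfrm ] && [ -f circle.ini ]; then
    $cmd
else
    echo "would run: OMP_NUM_THREADS=$OMP_NUM_THREADS $cmd"
fi
```

For a one-off run, the same thing can be typed directly as OMP_NUM_THREADS=16 followed by the pyfr command on a single line.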
The .ini file options you showed are not part of the current version of PyFR; please see the documentation for the OpenMP backend.