Low CPU utilisation when running multiple PyFR simulations with OpenMP backend on macOS

Hi PyFR team,

I’m currently running PyFR with the OpenMP backend on macOS (Apple Silicon), and I’ve noticed that CPU utilisation remains very low, even when running a single simulation. I’m monitoring with both htop and Activity Monitor, and usage rarely exceeds 1–2 logical cores’ worth, regardless of problem size or simulation duration.

I’ve tried the following:

  • Explicitly setting OMP_NUM_THREADS=4 (and higher)

  • Verifying that libxsmm.dylib is correctly linked via DYLD_LIBRARY_PATH

  • Using both small and moderately sized meshes

  • Running simulations in isolation (only one PyFR instance active)

Despite this, CPU usage remains far below what I would expect from a multi-threaded workload. When running multiple simulations, total usage scales somewhat, but each individual run still uses very little CPU and wall-clock times are long.
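One way to turn "1–2 logical cores’ worth" into a single number is to divide the CPU time a run consumed (user + system) by its wall-clock time; a fully busy 4-thread run should approach 4. The sketch below is a hypothetical helper (not part of PyFR) that does this with Python's standard `resource` module, which is available on macOS and Linux. It assumes the PyFR command line is passed in as a list and reruns it once per thread count, so output files will be overwritten between runs.

```python
import os
import resource
import subprocess
import sys
import time


def effective_cores(cmd, omp_threads=None):
    """Run `cmd` to completion; return (wall seconds, average busy cores).

    Average busy cores = (child user + system CPU time) / wall-clock time,
    roughly what htop would show averaged over the whole run.
    """
    env = dict(os.environ)
    if omp_threads is not None:
        env["OMP_NUM_THREADS"] = str(omp_threads)

    # RUSAGE_CHILDREN covers children (and their descendants) that have
    # terminated and been waited for, so take a before/after difference.
    before = resource.getrusage(resource.RUSAGE_CHILDREN)
    start = time.perf_counter()
    subprocess.run(cmd, env=env, check=True)
    wall = time.perf_counter() - start
    after = resource.getrusage(resource.RUSAGE_CHILDREN)

    cpu = (after.ru_utime - before.ru_utime) + (after.ru_stime - before.ru_stime)
    return wall, (cpu / wall if wall > 0 else 0.0)


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical usage, passing the full PyFR command line, e.g.:
    #   python measure.py pyfr run -b openmp mesh.pyfrm config.ini
    for n in (1, 2, 4):
        wall, cores = effective_cores(sys.argv[1:], omp_threads=n)
        print(f"OMP_NUM_THREADS={n}: wall={wall:.1f}s, avg cores busy={cores:.2f}")
```

If the busy-core figure stays near 1 as `OMP_NUM_THREADS` rises, the threads are either not being spawned or spend most of their time idle (for example, waiting at synchronisation points).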

Questions:

  1. Is this behaviour expected on macOS with the OpenMP backend?

  2. Are there specific environment variables or backend limitations (e.g. with LIBXSMM, thread affinity, etc.) that may limit CPU usage on macOS or Apple Silicon?

  3. Would moving to Linux (e.g. on an x86_64 machine) improve OpenMP parallelism and performance?

  4. Are there any backend-specific optimisations (e.g. compiling LIBXSMM with special flags) you would recommend?

Any advice would be greatly appreciated. Thanks again for developing such a powerful tool.

Best regards,

Cassian

Performance should be reasonable with the OpenMP backend for simulations of modest size (such as the two 3D test cases that come with PyFR). Thread overhead is somewhat higher on macOS and Apple M-series chips than on Linux (ARM or x86), so there can be a benefit to setting OMP_NUM_THREADS=1, partitioning the mesh (say, into 4 parts), and then running with mpirun -np 4 …. I would only expect this to make a difference for small simulations, however.
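The MPI-plus-single-thread approach described above can be sketched as a small launcher. This is a hypothetical helper, not part of PyFR: it assumes the mesh (here the placeholder `mesh.pyfrm`) has already been partitioned into `nranks` parts, and since the partitioning syntax differs between PyFR releases, check `pyfr partition --help` for your installed version. Only `pyfr run -b openmp <mesh> <config>` and `mpirun -np` are used on the command line.

```python
import os
import subprocess


def mpi_openmp_run(mesh, config, nranks=4, omp_threads=1, dry_run=True):
    """Build (and optionally launch) an MPI + OpenMP PyFR run.

    Assumes `mesh` is already partitioned into `nranks` parts.
    With dry_run=True nothing is executed; the command and the
    environment are simply returned for inspection.
    """
    # Each MPI rank runs the OpenMP backend with `omp_threads` threads.
    env = dict(os.environ, OMP_NUM_THREADS=str(omp_threads))
    cmd = ["mpirun", "-np", str(nranks),
           "pyfr", "run", "-b", "openmp", mesh, config]
    if not dry_run:
        subprocess.run(cmd, env=env, check=True)
    return cmd, env
```

As a rule of thumb, keep `nranks * omp_threads` at or below the number of physical cores, so that ranks and threads are not competing for the same core.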

Regards, Freddie.