Hi PyFR team,
I’m currently running PyFR with the OpenMP backend on macOS (Apple Silicon), and I’ve noticed that CPU utilisation remains very low, even when running a single simulation. I’m monitoring with both htop and Activity Monitor, and usage rarely exceeds 1–2 logical cores’ worth, regardless of problem size or simulation duration.
I’ve tried the following:
- Explicitly setting OMP_NUM_THREADS=4 (and higher)
- Verifying that libxsmm.dylib is correctly linked via DYLD_LIBRARY_PATH
- Using both small and moderately sized meshes
- Running simulations in isolation (only one PyFR instance active)
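For reference, the first two checks can be reproduced with a short script along these lines. This is only a sketch: the function name `threading_env_report` is my own, and it only inspects environment variables and file paths. It does not confirm which library PyFR actually loads at runtime, or how many threads the backend ends up using.

```python
import os
from pathlib import Path


def threading_env_report(env=None):
    """Collect settings relevant to OpenMP threading from an environment mapping.

    `env` defaults to the current process environment; passing a plain dict
    makes the check easy to run against a hypothetical configuration.
    """
    env = os.environ if env is None else env

    # DYLD_LIBRARY_PATH is a list of directories joined by os.pathsep.
    lib_dirs = [d for d in env.get("DYLD_LIBRARY_PATH", "").split(os.pathsep) if d]

    return {
        # Logical cores visible to Python (may be None if undetermined).
        "logical_cores": os.cpu_count(),
        # None means the variable is unset in this environment.
        "OMP_NUM_THREADS": env.get("OMP_NUM_THREADS"),
        # Directories on the search path that actually contain libxsmm.dylib.
        "libxsmm_found_in": [
            d for d in lib_dirs if (Path(d) / "libxsmm.dylib").is_file()
        ],
    }


if __name__ == "__main__":
    for key, value in threading_env_report().items():
        print(f"{key}: {value}")
```

Running it from the same shell used to launch PyFR shows whether the variables are visible to child processes there.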
Despite this, CPU usage remains far below expected levels for a supposedly multi-threaded workload. When running multiple simulations, usage scales somewhat, but each individual run still uses very little CPU, and wall-clock time is long.
Questions:
- Is this behaviour expected on macOS with the OpenMP backend?
- Are there specific environment variables or backend limitations (e.g. with LIBXSMM, thread affinity, etc.) that may limit CPU usage on macOS or Apple Silicon?
- Would moving to Linux (e.g. on an x86_64 machine) improve OpenMP parallelism and performance?
- Are there any backend-specific optimisations (e.g. compiling LIBXSMM with special flags) you would recommend?
Any advice would be greatly appreciated. Thanks again for developing such a powerful tool.
Best regards,
Cassian