Indefinite initialization with large mesh case

Hello,

I have been using PyFR on TACC Stampede3 and have been having issues running a model with a mesh of approximately 2e8 elements. Namely, the model does not make any stepping progress and appears stuck in the initialization phase indefinitely (48+ hrs). I have tried varying decomposition sizes, up to 200 compute nodes, without success. This is a first-order case running on the OpenMP backend.

All other, smaller PyFR cases I have run in this environment/setup work fine, so it seems to be a size-induced issue. I would appreciate any suggestions on diagnosing whether the start-up is merely slow or is stalling somewhere, and on addressing the issue. Would the start-up optimizations on the performance tuning page help here?

Thank you.

PyFR has recently been impacted by several bugs in NumPy (well, actually OpenBLAS) which can cause deadlocks on start-up. Can you confirm your NumPy version? Additionally, would you be able to attach a debugger to a stalled rank so we can get a backtrace?
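For reference, a quick way to check both the version and the BLAS library NumPy was built against (the latter matters here, since the bug is in OpenBLAS rather than NumPy proper) is:

```python
# Run this under the same Python environment that PyFR uses
import numpy as np

# The installed NumPy version
print(np.__version__)

# Build information, including which BLAS/LAPACK NumPy links against
np.show_config()
```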

A known workaround (assuming you cannot switch to a more recent version of NumPy) is to set OMP_NUM_THREADS=1. This isn’t ideal if you’re running with the OpenMP backend, as it means you will now need to switch to one rank per core.
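As a rough sketch, a job script for this workaround might look like the following; the mesh/config filenames and the figure of 48 cores per node are placeholders for illustration, not values from your case:

```shell
# Restrict OpenBLAS/NumPy to a single thread to avoid the start-up deadlock
export OMP_NUM_THREADS=1

# With the OpenMP backend this also limits PyFR to one thread per rank,
# so launch one MPI rank per core rather than one rank per node
# (48 cores per node assumed here purely as an example)
mpiexec -n 48 pyfr run -b openmp mesh.pyfrm config.ini
```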

Regards, Freddie.

Freddie,

My NumPy version is 2.3.2. Could you confirm if this version is problematic and which version to switch to? I can work towards getting a debugger set up; in the meantime, I will submit a run with your workaround to see if it helps.

Thank you for your help.

From the bug report:

This is a known bad version.

Regards, Freddie.


Just following up here to see if this resolved the issue?

Regards, Freddie.

Freddie,

I apologize for the delay; I’ve been waiting for the job to run. Yes, I can confirm that moving off the bad NumPy version (from 2.3.2 to 2.4.0) has resolved this issue. Additionally, in the past I have noticed occasional freezes, either at start-up or at an intermediate solve time, when running several smaller jobs simultaneously. Could I attribute these freezes to this NumPy bug, or is there anything else I should take into account when running multiple PyFR cases simultaneously?

Thank you.

I would say it is highly likely. The only other candidate would be a deadlock due to async file writing (we’ve had some reports of issues, although these seem to be down to misconfigured systems).

Note that there is a slight chance that another NumPy BLAS bug is lurking. When I get a reproducer I’ll report it to the NumPy team.

Regards, Freddie.

Thank you for the help! I will keep an eye out for further issues/updates regarding NumPy.

Strange, for me it’s the other way around (NumPy 2.4.0 hangs, but 2.3.2 from the Intel distribution works).

Edit: I noticed the GitHub issue linked above. If it is OpenBLAS related, then that might explain why the NumPy from Intel worked (I assume it is built against MKL).

Can you attach a debugger for a case where 2.4.0 hangs and then get a backtrace? This will let you open up an issue on the NumPy repository.
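For anyone unfamiliar with the process, attaching to a stalled rank from a shell on the compute node might look like this (`<PID>` is a placeholder for the process ID of the stalled rank, e.g. found via ps or top):

```shell
# Attach gdb to the stalled rank, dump a backtrace of every thread,
# then detach without otherwise disturbing the process
gdb -p <PID> -batch -ex "thread apply all bt"
```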

Yes, I do expect Intel to build against MKL which does not have the same threading issues as OpenBLAS.

Regards, Freddie.

I have also been experiencing issues again (on 2.4.0) and am currently trying to rule out system-related stability issues. I will update this thread if I find more definitive evidence that this is NumPy related.

Did you have any luck catching NumPy hanging?

Regards, Freddie.

So far, no. I have not observed any hanging in my moderately sized cases; the largest case I have, which experienced the most hanging, has yet to leave the system job queue. The system’s stability issues appear to have been resolved, so the next hang I get will likely indicate a NumPy issue (on 2.4.0), and I will follow up with a debugger.

Thank you, Jay