PyFR hanging while running cases

Hello,

recently I’m having issues with PyFR hanging when running on many GPUs across nodes (240 GPUs and 30 nodes) whilst such a thing doesn’t happen with a lower node count. Cannot really understand where the hanging happens.

The plugins enabled are: turbulence, fluid force, nancheck, ascent

Any idea to troubleshoot the issue?

The PyFR version is 2.0.2 and the configuration file with the backend parameters is:

[backend]
precision = double
rank-allocator = linear

[backend-hip]
device-id = local-rank
mpi-type = hip-aware

Best regards,

Does it hang if you disable HIP-aware MPI?

Regards, Freddie.

I’ll let you know.

Best regards