Hello,
recently I’m having issues with PyFR hanging when running on many GPUs across nodes (240 GPUs and 30 nodes) whilst such a thing doesn’t happen with a lower node count. Cannot really understand where the hanging happens.
The plugins enabled are: turbulence, fluid force, nancheck, ascent
Any idea to troubleshoot the issue?
The PyFR version is 2.0.2 and the configuration file with the backend parameters is:
[backend]
precision = double
rank-allocator = linear
[backend-hip]
device-id = local-rank
mpi-type = hip-aware
Best regards,