How to run multi-GPU per node with PyFR

Dear All,

I just installed PyFR on our GPU cluster - what a breeze. I can run your examples and tutorials with the CUDA backend, and I can see them being offloaded to the GPU. Metis also appears to cooperate.

My question is how to configure multiple GPUs on a single node. Is there a way to make sure that, say, four processes will run on four different GPUs exclusively?

Also, would it be possible to share a larger case as an example, to actually flood the GPUs with work? I am more than willing to give PyFR a try, but it will take a while for me to develop case-setup skills.

Many thanks,
Robert

Hi Robert,

The easiest way is to put the GPUs into compute exclusive mode. This can be accomplished using the nvidia-smi tool and ensures that only one process can use a GPU at any given time.
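
For example (just a sketch; the exact mode name and device indices may vary with your driver version), on each node you would run something like

sudo nvidia-smi -i 0 -c EXCLUSIVE_PROCESS
sudo nvidia-smi -i 1 -c EXCLUSIVE_PROCESS

with one line per GPU in the node.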

Alternatively, you can set

[backend-cuda]
device-id = local-rank

which will assign the first MPI rank on the node to the first CUDA device, and so on. This approach does not require that the GPUs be in compute exclusive mode.
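
For instance, to run four ranks on a node with four GPUs you would launch something along the lines of (the mesh and configuration file names here are just placeholders)

mpiexec -n 4 pyfr run -b cuda mesh.pyfrm config.ini

and each rank will then pick up a different device according to its local rank.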

Regards, Freddie.

Hi,

Running separate ranks on different GPUs can be done in two ways:

  1. Adding a section to the configuration file:

     [backend-cuda]
     device-id = local-rank

     See http://www.pyfr.org/user_guide.php for more details.

  2. Running nvidia-smi -c 1 as root, which will enforce one compute process per card.

To flood all GPUs and increase the load, you can simply increase the order of the solution polynomial and/or add suitable anti-aliasing in any test case; a sketch of the relevant settings follows below.
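
For example (a rough sketch only; the exact order is case dependent, and anti-aliasing also requires suitably increased quadrature degrees in the element/interface sections):

[solver]
order = 4
anti-alias = flux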

Regards
Arvind

Thanks Freddie and Arvind,

Somebody just made it too easy… Thanks. With the backend option I can run across nodes and with multiple GPUs per node.

My last question was a little cheeky, I admit. I was curious whether the cases from the Vermeire, Witherden and Vincent (JCP 2017) paper are available. I can see the zip file among the supplementary material, but I was unable to download it from the Elsevier page even though the journal is open access. I'd like to use these cases mainly to test and benchmark two GPU clusters.

Robert

Hi Robert,

Thanks for your interest in PyFR!

> My last question was a little cheeky, I admit. I was curious whether the cases from the Vermeire, Witherden and Vincent (JCP 2017) paper are available. I can see the zip file among the supplementary material, but I was unable to download it from the Elsevier page even though the journal is open access. I'd like to use these cases mainly to test and benchmark two GPU clusters.

I was just going to suggest this. But, as you say, it seems like their site is down. I'll email Elsevier now; if it's not back up tomorrow I'll send over the files directly.

Cheers

Peter

Hi Robert,

I just heard back from the publisher: they were doing site maintenance earlier, which is why the files were unavailable. You should be able to download the supplementary material now, but let us know if you still have any problems accessing it.

Cheers,

Thanks for letting me know. I downloaded them already and got them to produce
output on our cluster. I am focused on the isentropic vortex for now. There
were a few minor changes I had to make in the .ini file, such as fixing "tend"
and restructuring the output section, roughly along the lines sketched below.
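
For reference, the sections I touched ended up looking something like this for
1.5.0 (the numbers here are just placeholders; the writer plugin replaces the
old output section):

[solver-time-integrator]
tstart = 0.0
tend = 100.0

[soln-plugin-writer]
dt-out = 1.0
basedir = .
basename = vortex-{t:.2f}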

Quick question: I am considering creating a GitHub repository with up-to-date
versions of these examples for PyFR 1.5.0. Is that OK with you?

Best wishes,
Robert

Peter and Brian,

I am just forwarding this message as I thought it should have gone to
the list.

Thanks. I think you have answered, for now, all my initial questions about PyFR.
Also, I finally sat down and compiled the SD7003 results from the two GPU
clusters to which I have access, and I thought I'd share them. Let me summarise
a few things.

  • It seems to me that binding to sockets matters on IBM processors. Initially,
    I was binding everything to socket 1 and was getting either freezing
    behaviour or no scaling.
  • With binding to cores and two ranks per socket I seem to get stable
    behaviour and scaling up to 16 nodes (see the launch sketch after this list).
  • I wasn't able to run with the CUDA-aware switch, despite compiling Open MPI
    with CUDA support. This is something I am still working on.
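
For reference, the kind of Open MPI binding that works for me looks something
like this (the node and rank counts, and the file names, are illustrative only):

mpirun -n 32 --map-by ppr:2:socket --bind-to core pyfr run -b cuda sd7003.pyfrm sd7003.ini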

There may still be issues with the clusters or the software stack, as both are
in fairly early days of operation, but the preliminary results look good. Let me
know what you think.

This is just a dry run of your code to prove that it can work in principle. I'm
interested in looking into some details and pushing the development along.

This is the repository I created:

Best wishes,
Robert

sd7003-absolute.pdf (50.3 KB)

sd7003-efficiency.pdf (84.2 KB)

sd7003-speedup.pdf (50.2 KB)

Hi Robert - thanks for setting up the SD7003 example. I just tried a short MPI run (METIS partitioning, Windows 7) on my four Kepler Titans. It runs fine, keeping all four GPUs steady at over 95% utilization.

To the PyFR team: could you post the Gmsh geometry file used for extruding the SD7003 profile? I'd like to try running a 2D case over that profile.

Many thanks for making all this available,
Nigel

Hi Nigel,

No problem. I've really just redone things that the PyFR developers put together
with their paper. It's really great when people publish reproducible workflows.
After this Wednesday I will be back to HPC work and will extend the benchmark
with the Taylor-Green case and any updates for the new version of PyFR.

Also, I am interested in doing some visualisation for the SD7003 case. Is there
any way I could get hold of the original geometry so that I can show, for
instance, the Q-criterion together with the wing shape? The default .vtu output
only shows the internal fluid field, and I can't select components such as
boundaries or surfaces.

Best wishes,
Robert

Hi Robert,

Please find attached a .vtu file of the SD7003 wall. You may need to rotate it in ParaView depending on what angle of attack you are running.

Hi Nigel,

Please find attached a Gmsh .geo file for the SD7003 airfoil. Please note that this will not generate an identical mesh to the one used in the paper. Even an identical .geo file will generate a different mesh depending on your version of Gmsh, settings, etc. If you want to reproduce the paper results just use the .msh file provided in the supplementary material. However, if you want to run 2D simulations you should be able to get a case working from this .geo file.
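
As a rough sketch of that workflow (the flags assume a reasonably recent Gmsh, and you may need to pick an .msh format version that your PyFR release understands), a 2D mesh can be generated and imported along these lines:

gmsh -2 -order 2 sd7003.geo -o sd7003-2d.msh
pyfr import sd7003-2d.msh sd7003-2d.pyfrm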

Cheers,

sd7003.geo (10.1 KB)

wall.vtu (431 KB)

Hi Brian - thanks for sending the .geo file. What a great example for driving Gmsh!
Those 2D nodes defining the wall profile are exactly what I needed.
With thanks - Nigel