I have read through similar discussions on this, and I have seen suggestions to set device-id to local-rank under [backend-cuda] and to put the GPUs into “compute exclusive mode”. However, I am running PyFR on a supercomputer (HPC) cluster and do not have any sort of sudo access to switch to compute-exclusive mode. If someone could give me a general idea of how to run PyFR on multiple GPUs on a supercomputer cluster, it would be greatly appreciated. Maybe the only way is to submit a batch job rather than use an interactive desktop. Perhaps modifying the beginning of my job script for 1 GPU would do the trick?
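For reference, the single-GPU header is along these lines (the job name, partition, module, and file names below are generic placeholders rather than my actual values):

```bash
#!/bin/bash
#SBATCH --job-name=pyfr_run          # placeholder job name
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=48
#SBATCH --gres=gpu:1                 # single GPU
#SBATCH --time=24:00:00
#SBATCH --partition=gpu              # placeholder partition name

module load cuda openmpi             # placeholder module names

pyfr run -b cuda mesh.pyfrm config.ini   # placeholder mesh/config names
```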
Thanks for the quick reply! I am in need of more GPUs, as I am running a large grid with only 1 GPU and getting a CUDA out-of-memory error. When I run it as a batch job and request multiple GPUs, do I set device-id = local-rank under [backend-cuda]?
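That is, something like this in the config file:

```ini
[backend-cuda]
device-id = local-rank
```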
Also, when partitioning, should I be splitting the grid into the number of GPUs that I have or the number of cores? i.e., if I have 4 GPUs and 48 cores, would I partition my mesh into 48 or 4 pieces? I had always assumed it should be the number of cores, which is why I have ntasks-per-node set to 48.
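For concreteness, with a mesh named mesh.pyfrm (just an example name), the two options I am weighing would be:

```bash
# one partition per CPU core (what I have been assuming)
pyfr partition 48 mesh.pyfrm .

# one partition per GPU
pyfr partition 4 mesh.pyfrm .
```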
Thank you! Also, one quick unrelated question. When I was running some of PyFR’s test cases, I would sometimes get a nice progress bar, appearing below GPUFreq = control_disabled, that updated the simulation’s progress and ETA. However, most of the time I do not see it. Is there a way to have it always show up?
Thanks, that worked. Also, when running with multiple GPUs as we discussed earlier, I am seeing that only one of my GPUs is being used with local-rank. Here is what nvidia-smi shows during one of the runs:
Yes, that is what I have it set to, as I mentioned. Any other advice on how I can get it to run on all 4 GPUs? I am on a supercomputer cluster, so I do not have any sort of sudo access. Thank you!
What system are you trying to run on? I would try setting --gpus-per-task=1. This should give each rank a GPU, which it sees as device 0. And then set device-id=0 in the .ini file.
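A rough sketch of what I mean for a four-GPU job (the Slurm directive values, module names, and file names are placeholders and will need adjusting for your system):

```bash
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks-per-node=4        # one MPI rank per GPU
#SBATCH --gpus-per-task=1          # each rank is bound to its own GPU, which it sees as device 0
#SBATCH --time=24:00:00

module load cuda openmpi           # placeholder module names

srun pyfr run -b cuda mesh.pyfrm config.ini   # placeholder mesh/config names
```

with the backend section of the config set to:

```ini
[backend-cuda]
device-id = 0
```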