GPU parallelization error: number of ranks not equal to number of GPUs


When I run the 2D Euler vortex case, I get this error message.

Last month I asked the same question. At the time I thought the message appeared because I only had one GPU. I have now installed a second GPU (2 × RTX 3090 Ti), but it still shows the same error message.


I checked that the two GPUs are installed correctly.
Case 1:
[backend-cuda]
device-id = 0

Case 2:
[backend-cuda]
device-id = 1

With each of these settings, I confirmed that the case runs correctly on the corresponding GPU.

However, a parallel calculation does not work when I use device-id = local-rank. What is this error message associated with, and how can it be resolved?

Have a look at this in the documentation: User Guide — Documentation

I think you want to be using device-id = local-rank
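
That is, something along these lines in the ini file (just a sketch based on your case; the rest of the configuration stays as it is):

[backend-cuda]
device-id = local-rank

so that MPI rank 0 uses GPU 0 and MPI rank 1 uses GPU 1.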

I’ve already used local-rank (for the 2D Euler vortex and my other example files).
All of them gave the same error as in the first picture.

Did you try the solutions from this post when you had a similar problem?

Yes.
I used the mpiexec -n 2 pyfr … command and I reinstalled mpi4py.
The mpi4py version is 3.1.3. Is the mpi4py version also related to the problem?

So when you run:

$ pyfr partition 2 euler_vortex_2d.pyfrm .
$ mpiexec -n 2 pyfr run -b cuda -p euler_vortex_2d.pyfrm euler_vortex_2d.ini

with the only backend configuration in the ini file being:

[backend]
precision = single
rank-allocator = linear

[backend-cuda]
device-id = local-rank

Do you see this problem?

I tried other things as well as what you showed me:

adding mpi-type = cuda-aware or mpi-type = standard,

or having [backend] contain only precision (omitting rank-allocator).

The error seems to be related to your MPI setup. When you do mpiexec -n 2 it starts two instances of PyFR with one rank each, rather than one instance with two ranks.
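
A quick way to confirm this, using mpi4py directly (just a sketch; nothing PyFR-specific), is:

$ mpiexec -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.rank, MPI.COMM_WORLD.size)"

If it prints 0 2 and 1 2 then the launcher and mpi4py agree on two ranks; if it prints 0 1 twice, you have two separate one-rank runs, which is exactly what PyFR is reporting.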

When you reinstalled mpi4py, did you do pip install --force-reinstall mpi4py h5py?

The error message is very clear; you are only running with a single MPI rank whereas you need to be running with two.

While you may think you are running with two ranks because of mpiexec -n 2 pyfr ..., the likely cause is that mpiexec belongs to a different MPI library than the one mpi4py was compiled against; for example, mpiexec comes from MPICH but mpi4py was compiled against Open MPI. You cannot mix and match MPI runtimes: pick one and stick with it. Before running PyFR it is also worth trying some of the mpi4py example programs.
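
For example (a sketch; the exact output depends on your installation), compare which MPI mpiexec comes from with the one mpi4py was built against, and run the bundled hello-world:

$ mpiexec --version
$ python -c "import mpi4py; print(mpi4py.get_config())"
$ mpiexec -n 2 python -m mpi4py.bench helloworld

If the first two point at different MPI libraries, or if the hello-world reports two separate single-process runs rather than ranks 0 and 1 of 2, that is the mismatch to fix.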

Regards, Freddie.