GPU parallelization error: number of ranks not equal to number of GPUs

When I run the 2D Euler vortex case, I get this error message.

Last month I asked the same question. At the time I thought the message appeared because I only had one GPU. I have since installed a second GPU (2 × RTX 3090 Ti), but it still shows the same error message.

I checked that both GPUs are installed.
case 1:
device-id = 0
case 2:
device-id = 1
With each of the settings above, I confirmed that the two GPUs can each run the case individually.

However, parallel calculations are not performed when I use the local-rank setting. What is this error message associated with, and how can it be resolved?

Have a look at this in the documentation: User Guide — Documentation

I think you want to be using device-id = local-rank
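For reference, a minimal backend configuration along these lines (section names as per the PyFR user guide; exact values here are illustrative) would look like:

```ini
[backend]
precision = single
rank-allocator = linear

[backend-cuda]
device-id = local-rank
```

With `device-id = local-rank`, each MPI rank on a node selects the GPU whose index matches its node-local rank, so two ranks on one node use GPU 0 and GPU 1 respectively.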

I’ve already used local-rank (in the 2D Euler vortex case and in my other example file).
All of them gave the same error as in the first picture.

Did you try the solutions from this post when you had a similar problem?

I used the mpiexec -n 2 pyfr … command and reinstalled mpi4py.
The mpi4py version is 3.1.3. Could the mpi4py version also be related to the problem?

So when you run:

$ pyfr partition 2 euler_vortex_2d.pyfrm .
$ mpiexec -n 2 pyfr run -b cuda -p euler_vortex_2d.pyfrm euler_vortex_2d.ini

with the only backend configuration in the ini file being:

precision = single
rank-allocator = linear

device-id = local-rank

Do you see this problem?

I tried other things in addition to what you showed me:
adding “mpi-type = cuda-aware” or “mpi-type = standard”

or having [backend] include only precision (omitting rank-allocator)

The error seems to be related to your MPI setup. When you do mpiexec -n 2 it starts two instances of PyFR with one rank each, rather than one instance with two ranks.

When you reinstalled mpi4py, did you do pip install --force-reinstall mpi4py h5py?

The error message is very clear; you are only running with a single MPI rank whereas you need to be running with two.

While you may think you are running with two ranks due to mpiexec -n 2 pyfr ..., it is likely that mpiexec belongs to a different MPI library from the one mpi4py was compiled against; for example, mpiexec comes from MPICH but mpi4py was compiled against OpenMPI. You cannot mix and match MPI runtimes: pick one and stick with it. Before running PyFR, it is also suggested that you try some of the mpi4py example programs.
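One quick way to check for this kind of runtime mismatch (a sketch, assuming a typical Linux setup with python on the PATH) is to compare what mpiexec reports against what mpi4py was built with:

```shell
# Which MPI runtime does mpiexec come from (OpenMPI, MPICH, ...)?
mpiexec --version

# Which MPI library was mpi4py compiled against?
python -c "import mpi4py; print(mpi4py.get_config())"

# If the two agree, both ranks should report a communicator size of 2:
mpiexec -n 2 python -c "from mpi4py import MPI; print(MPI.COMM_WORLD.size)"
```

If the last command prints 1 twice, mpiexec and mpi4py are not using the same MPI library, which is exactly the situation that produces the "running with 1 MPI rank" error in PyFR.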

Regards, Freddie.

Actually, I think this is a common issue for those who are new to PyFR; I faced this annoying problem too. Unfortunately, I didn’t quite catch your explanation.
Specifically, when I run mpiexec -n 2, it seems to launch two separate instances of PyFR with one rank each, rather than one instance with two ranks.

Initially, I tried to configure OpenMP but I’m not sure if that worked or even mattered, as my primary interest lies in utilizing the CUDA backend to run on multiple GPUs. It appears that OpenMP settings aren’t critical in this case?

I’m also uncertain whether the mpiexec -n 2 command targets two GPUs or two CPUs, as the official website doesn’t offer much clarification on this.

As for reinstalling mpi4py, I’m in the dark there too. I originally installed mpi4py using conda install, whereas PyFR was installed via pip. My setup works fine when running on a single partition with one GPU. I attempted pip install --force-reinstall mpi4py h5py but ended up with an error message saying “Cannot link MPI programs. Check your configuration.”

I’m rather frustrated as I’ve been essentially flying blind without enough guidance on the underlying principles. It feels like I’m taking shots in the dark and, unfortunately, my luck hasn’t been great.

Regardless, I appreciate all the contributions everyone here has made to the community.
Best Regards.

I think you may have misunderstood what Freddie is saying.

He means that mpi4py was compiled against one MPI library, but at run time is linked against another. For example, this can happen if MPI was installed using a package manager which later updated the MPI version, or if you changed your environment without tying the mpi4py package to the specific MPI library in that environment, e.g. using Lmod.

This is very much an mpi4py problem. Please verify your mpi4py installation first before attempting to run PyFR. There are several examples which come with mpi4py that you can use.

Regards, Freddie.

Thanks for your quick response.

Could you guide me on where to find example cases if I install mpi4py through conda? I appreciate any advice you can offer.

Also, I’m a bit confused because it seems like MPI is geared more towards managing CPU parallelism than GPU parallelism, but I’d like all my computations to be carried out on the CUDA backend.
When I run PyFR on a computing cluster, should the number of grid partitions be equal to the number of GPUs being utilized? I’m confused about how my GPU parallel computing is related to CPU parallelism. Could you help me understand this?
I’m also unsure about how to properly configure CPU resources on the cluster.

Thanks again for your guidance!
Best regards.

Have a look through the tutorials on the mpi4py docs; there are lots you can try out: Tutorial — MPI for Python 3.1.4 documentation.

In PyFR, we suggest one MPI rank per GPU for good performance.

Ultimately you can’t use only the GPU; GPUs are designed to execute lots of simultaneous compute tasks. They couldn’t, for example, be used to read in the mesh; that has to be handled by the CPU. Put simply, in PyFR the CPU runs the Python side, and the GPU handles tasks which are offloaded to it (mainly matmuls in PyFR).

Each MPI rank handles a partition, and at runtime the rough steps are: read that part of the mesh, allocate memory on the GPU, and then launch the required kernels. The plugins, though, are generally executed on the CPU.
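The mapping just described, one rank per partition and one GPU per rank, can be sketched in plain Python (the function name here is illustrative, not PyFR's actual API):

```python
# Illustrative sketch only: how one MPI rank per GPU maps onto mesh
# partitions. The function name is hypothetical, not part of PyFR.
def assign_work(rank, nranks, ngpus_per_node):
    assert 0 <= rank < nranks
    # Each rank owns exactly one mesh partition...
    partition = rank
    # ...and selects the GPU matching its node-local rank, which is
    # what `device-id = local-rank` achieves in the PyFR config.
    local_rank = rank % ngpus_per_node
    return partition, local_rank

# Two ranks on a node with two GPUs: rank 0 -> GPU 0, rank 1 -> GPU 1.
for rank in range(2):
    print(assign_work(rank, nranks=2, ngpus_per_node=2))
```

This is why the number of mesh partitions, MPI ranks, and GPUs should all be equal for a single-node run.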

If you partitioned your mesh with two partitions and are running on the cuda backend, then to launch PyFR this command should be enough:

$ mpirun -n 2 pyfr run -b cuda ...

However, there are many additional configurations you can give to MPI for nuanced control on more complicated hardware, for example on Frontier. Their documentation does an excellent job of explaining this, so I won’t try to repeat it: Frontier User Guide — OLCF User Documentation

But as an example, this is a snippet from a submission script I use on a system similar to Summit (i.e. 2× POWER9 and 6× V100 per node):

options="restart -b cuda mesh.pyfrm soln.pyfrs cfg.ini"
ucx_config="-mca pml ucx --mca osc ucx -x UCX_MAX_RNDV_RAILS=1"
mpirun -map-by ppr:3:socket -rank-by core -bind-to core -N 6 -n $SLURM_NTASKS ${ucx_config} ${application} ${options}

Your explanation inspired me.
But it seems that my command mpirun -n 2 pyfr run -b cuda … calls the OpenMPI program instead of mpi4py. Any suggestions?

Yep, it should be calling your MPI library. Within Python the mpi4py package then handles the communication between the processes launched this way.

Maybe take a look at the mpi4py docs and this repo of examples for more information: GitHub - jbornschein/mpi4py-examples: mpi4py examples

OK, I will take the time to study this properly.
Let’s see: I want to use the mpiexec of mpi4py instead of OpenMPI, so I deleted the OpenMPI environment variables before executing mpirun -n 2 pyfr run -b cuda …
I did this because I want to use the MPI of mpi4py, but the result is an error saying the mpiexec command cannot be found. Did I make some basic mistake?

In fact, the root cause of the problem is that when I use mpiexec -n 2, I get the error “Mesh has two partitions but running with 1 MPI rank”.
I didn’t find a solution in the content of this post.
Most likely there is something wrong with my mpi4py installation? I installed it through conda install mpi4py.
I’m sorry that I really don’t know enough about OpenMPI and mpi4py.
Best regards.

mpi4py is a Python wrapper for the functionality implemented in an MPI library, i.e. OpenMPI, MPICH, Spectrum MPI, …

In order to run PyFR across multiple ranks, you have to launch the application with mpiexec (some libraries also provide mpirun, but this is not part of the standard); this is the same for any application you wish to run with MPI.

This will launch n instances of PyFR and give each a rank ID number. Then, inside PyFR, we use the mpi4py package to initialise MPI, detect the rank ID, and handle the communication required to run the calculation, for example sending and receiving packets of data between ranks.
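The rank-detection and send/receive pattern just described follows the standard mpi4py tutorial shape; a minimal sketch (requires a working MPI install, launched with e.g. `mpiexec -n 2 python hello.py`) looks like:

```python
# Minimal mpi4py send/receive sketch, per the mpi4py tutorial.
# Launch with: mpiexec -n 2 python hello.py
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

if rank == 0:
    # Rank 0 sends a Python object to rank 1.
    comm.send({'msg': 'hello from rank 0'}, dest=1, tag=11)
elif rank == 1:
    # Rank 1 receives it and prints the payload.
    data = comm.recv(source=0, tag=11)
    print(data['msg'])
```

If this script hangs or every process reports rank 0, the mpiexec being used does not match the MPI library mpi4py was built against, which is the same mismatch that breaks the PyFR run.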

mpi4py is not independent of the MPI library.