Error: CUDA out of memory on A30

Hello, thank you very much for your reply.
I have tried mpiexec -n 6 pyfr run -b cuda -p x.pyfrm x.ini after partitioning the mesh first, but it raises an error:

RuntimeError(f'Mesh has {nparts} partitions but running '
RuntimeError: Mesh has 6 partitions but running with 1 MPI ranks

I tried mpirun -n 6 and got the same error. I think it is an mpi4py problem: I initially installed mpi4py via conda, and I have no clue how to install it via pip (cf. this article). Finally, I have a couple of questions I would like to ask you; thank you for your advice.
Question 1: I think that for GPU computing one CPU core per GPU is enough, and that more CPU cores only add extra communication. mpiexec -n 6 or mpirun -n 6 specifies 6 MPI processes; does that mean that using 6 GPU cards requires at least 6 CPU cores?
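For what it's worth, the usual pattern in MPI+GPU codes (a hedged sketch of the general convention, not necessarily PyFR's actual internals) is one MPI rank per GPU, with each rank selecting its device from its rank number. Since each rank is an ordinary CPU process, 6 ranks do occupy 6 cores, one per card:

```python
# Illustrative sketch of the common "one MPI rank per GPU" mapping.
# select_device is a hypothetical helper, not part of PyFR's API.

def select_device(rank: int, gpus_per_node: int) -> int:
    """Map an MPI rank to a GPU index on its node."""
    return rank % gpus_per_node

# With mpiexec -n 6 on a node with 6 GPUs, each rank gets its own card:
mapping = [select_device(r, 6) for r in range(6)]
print(mapping)  # [0, 1, 2, 3, 4, 5]
```

Under this convention, running on 6 GPUs does mean launching 6 processes, so 6 CPU cores is the natural minimum.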
Question 2: I tested mpi4py with this code and did not get the expected results, so I concluded it was an installation problem with mpi4py.
from mpi4py import MPI

if __name__ == "__main__":
    comm = MPI.COMM_WORLD
    print(f"rank {comm.rank} of {comm.size}")
    comm.barrier()

rank 0 of 1
rank 0 of 1
rank 0 of 1
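Every process printing rank 0 of 1 usually means each process got its own private MPI_COMM_WORLD, i.e. mpi4py is linked against a different MPI library than the mpiexec/mpirun used to launch it (the conda package ships its own MPI). A minimal sketch of how to read that symptom, in plain Python with no mpi4py dependency (the function is illustrative only):

```python
# Hedged sketch: interpret the world size each process reports.
# 'expected' is the value passed to mpiexec -n; 'size' is comm.size.

def diagnose_world_size(size: int, expected: int) -> str:
    if expected > 1 and size == 1:
        # All ranks reporting "0 of 1" means the launcher and the MPI
        # library mpi4py links against are different installations.
        return "mismatch: mpi4py's MPI differs from your mpiexec"
    if size == expected:
        return "ok: launcher and mpi4py agree"
    return "partial: check hostfile/slot limits"

print(diagnose_world_size(1, 3))  # the symptom shown above
print(diagnose_world_size(3, 3))
```

If the mismatch case applies, rebuilding mpi4py against the same MPI that provides mpiexec should make comm.size match -n.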

I tried disabling the MPI environment in the NVIDIA HPC SDK and installing mpi4py directly against the default MPI, but the build failed:

  nvc-Error-Unknown switch: -Wno-unused-result
  nvc-Error-Unknown switch: -fwrapv
  warning: build_ext: command '/opt/nvidia/hpc_sdk/Linux_x86_64/22.11/compilers/bin/nvc' failed with exit code 1
  warning: build_ext: building optional extension "mpi4py.dl" failed
  checking for MPI compile and link ...

Unfortunately, I could not find the mpi4py.dl file:

find / -name mpi4py.dl
find: ‘/run/user/1001/doc’: Permission denied
find: ‘/run/user/1001/gvfs’: Permission denied

It is worth noting that computing the small-memory example on a single A30 GPU works with no problem, just as it did on the A100.
Regards, wgbb