Multi-GPU calculation

Dear All,

I have a question about multi-GPU calculation.
My understanding is that PyFR can run a CFD calculation on two Nvidia GPUs when we run "mpiexec -n 2 pyfr run -b cuda -p" in Bash. However, my task manager shows that only "GPU 0" is working.
I have attached a figure.

Can you tell me why this is happening?

Best,
Yuji

Can you check if the mesh is correctly pre-partitioned for two procs (as described in the User-Guide)?

Yes, I can.
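
For reference, one way to check the partition count is to look at the dataset names inside the mesh file. The sketch below is only an illustration and assumes the older PyFR mesh layout, in which element arrays are stored as HDF5 datasets named 'spt_<etype>_p<N>'. That naming is an assumption here, and newer mesh formats may differ. The helper name partition_ids and the path mesh.pyfrm are placeholders.

```python
import re


def partition_ids(dataset_names):
    """Return the sorted partition numbers found in element dataset names.

    Assumes names of the form 'spt_<etype>_p<N>' (older PyFR mesh layout;
    this is an assumption, not a guaranteed format).
    """
    pattern = re.compile(r'^spt_[a-z]+_p(\d+)$')
    ids = set()
    for name in dataset_names:
        match = pattern.match(name)
        if match:
            ids.add(int(match.group(1)))
    return sorted(ids)


# Possible usage with h5py (third-party, not run here):
#   import h5py
#   with h5py.File('mesh.pyfrm', 'r') as f:
#       print(partition_ids(f.keys()))
```

For a run with "mpiexec -n 2", one would expect this to report two partitions, for example [0, 1].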

Can you share your configuration file?

Regards, Freddie.

Thank you for replying.
My configuration file is as follows:

[backend]
precision = double

[backend-cuda]
device-id = round-robin
mpi-type = standard

[constants]
nu = 15e-6
Uin = 10.0
Vin = 0.0
Win = 0.0
Pc = 101.325e3
ac-zeta = 2.5

[solver]
system = ac-navier-stokes
order = 3

[solver-time-integrator]
formulation = dual
scheme = sdirk33
pseudo-scheme = rk45
controller = none
pseudo-controller = local-pi
tstart = 1.0
tend = 1.01
dt = 0.0001
pseudo-dt = 0.00001
pseudo-niters-min = 25
pseudo-niters-max = 50
pseudo-resid-norm = l2
pseudo-resid-tol = 5e-4
pseudo-resid-tol-p = 2.5e-2
atol = 1e-1
pseudo-dt-max-mult = 1.0


[solver-dual-time-integrator-multip]
pseudo-dt-fact = 1.75
cycle = [(3, 1), (2, 2), (1, 4), (0, 8), (1, 4), (2, 2), (3, 1)]

[solver-interfaces]
riemann-solver = rusanov
ldg-beta = 0.5
ldg-tau = 0.1

[solver-interfaces-line]
flux-pts = gauss-legendre

[solver-elements-tri]
soln-pts = williams-shunn



[solver-elements-tet]
soln-pts = shunn-ham

[solver-elements-quad]
soln-pts = gauss-legendre

[solver-interfaces-tri]
flux-pts = williams-shunn
quad-deg = 10
quad-pts = williams-shunn

[soln-plugin-nancheck]
nsteps = 11

[soln-plugin-pseudostats]
flushsteps = 1
file = ./restart_result/csv/residual_merge2.csv
header = true

[soln-plugin-writer]
dt-out = 0.001
basedir = ./restart_result
basename = target_with_fluid_merge2-{t:.2f}

[soln-plugin-integrate]
nsteps = 1
file = ./restart_result/csv/integral_merge2.csv
header = true
quad-deg = 9
div1 = grad_u_x
div2 = grad_v_y
div3 = grad_w_z
int-dive = abs(%(div1)s + %(div2)s + %(div3)s)



[soln-bcs-wall]
type = no-slp-wall

[soln-bcs-outlet]
type = ac-out-fp
p = Pc

[soln-bcs-inlet]
type = ac-in-fv
u = Uin
v = Vin
w = Win


[soln-ics]
u = Uin
v = Vin
w = Win
p = Pc

Moreover, I checked nvidia-smi, and it also shows that GPU 1 is not working.
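
A check like this can also be scripted. The following is a minimal sketch, assuming nvidia-smi is on the PATH and that its CSV query output contains plain integers (some devices report values such as "[N/A]", which this sketch does not handle). The function names and the 5% threshold are illustrative choices.

```python
import subprocess

# Query per-GPU index, utilisation (%) and used memory (MiB) as bare CSV.
QUERY = [
    'nvidia-smi',
    '--query-gpu=index,utilization.gpu,memory.used',
    '--format=csv,noheader,nounits',
]


def parse_gpu_csv(text):
    """Parse 'index, util, mem' lines into a list of dicts."""
    rows = []
    for line in text.strip().splitlines():
        idx, util, mem = (field.strip() for field in line.split(','))
        rows.append({'index': int(idx), 'util_pct': int(util), 'mem_mib': int(mem)})
    return rows


def idle_gpus(rows, util_threshold=5):
    """Return the indices of GPUs whose utilisation is below the threshold."""
    return [row['index'] for row in rows if row['util_pct'] < util_threshold]


def query_gpus():
    """Run nvidia-smi once and return the parsed rows."""
    result = subprocess.run(QUERY, capture_output=True, text=True, check=True)
    return parse_gpu_csv(result.stdout)
```

Calling idle_gpus(query_gpus()) while the simulation is running lists the devices that are doing no work.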

I replaced the [backend] and [backend-cuda] sections as shown below.
Now nvidia-smi looks fine.

[backend]
precision = double
rank-allocator = linear

[backend-cuda]
device-id = local-rank
mpi-type = standard
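
As I understand it, device-id = local-rank means each MPI rank uses its rank within the node as its CUDA device index. The sketch below illustrates that idea only and is not PyFR's actual code. It reads the per-node rank from environment variables that common MPI launchers set; the function names are hypothetical.

```python
import os

# Environment variables that common launchers set to the per-node rank.
LOCAL_RANK_VARS = (
    'OMPI_COMM_WORLD_LOCAL_RANK',  # Open MPI
    'MV2_COMM_WORLD_LOCAL_RANK',   # MVAPICH2
    'MPI_LOCALRANKID',             # Intel MPI / MPICH (Hydra)
    'SLURM_LOCALID',               # Slurm srun
)


def local_rank(env=None):
    """Return this process's rank within its node, read from the environment."""
    env = os.environ if env is None else env
    for var in LOCAL_RANK_VARS:
        if var in env:
            return int(env[var])
    raise RuntimeError('no local-rank variable found; was this started by an MPI launcher?')


def device_for_local_rank(rank, n_devices):
    """Map a per-node rank to a device index: rank 0 -> GPU 0, rank 1 -> GPU 1."""
    if not 0 <= rank < n_devices:
        raise ValueError(f'local rank {rank} has no matching device (found {n_devices})')
    return rank
```

With two ranks on one node, this mapping places rank 0 on GPU 0 and rank 1 on GPU 1.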