Dear all,
I got an error when trying to use 2 different GPUs.
The command I used is:
mpiexec -n 2 pyfr restart -b cuda -p solutions_01/sd7003.pyfrm solutions_01/sd7003_145.00.pyfrs sd7003_01.ini
Traceback (most recent call last):
File "/home/tang/anaconda3/envs/pyfr1.15.0/bin/pyfr", line 33, in <module>
sys.exit(load_entry_point('pyfr==1.15.0', 'console_scripts', 'pyfr')())
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/__main__.py", line 118, in main
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/__main__.py", line 270, in process_restart
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/__main__.py", line 247, in _process_common
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/integrators/base.py", line 115, in run
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/integrators/std/controllers.py", line 181, in advance_to
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/integrators/std/steppers.py", line 190, in step
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/solvers/base/system.py", line 265, in rhs
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/util.py", line 40, in newmeth
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/solvers/baseadvecdiff/system.py", line 82, in _rhs_graphs
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/base/types.py", line 332, in add
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/cuda/provider.py", line 35, in add_to_graph
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/cuda/provider.py", line 35, in <listcomp>
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/cuda/cublas.py", line 103, in add_to_graph
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/cuda/cublas.py", line 111, in run
File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/ctypesutil.py", line 33, in _errcheck
pyfr.backends.cuda.cublas.CUBLASInternalError
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1
Here are the details of my devices:
The configuration file I used:
[backend]
precision = single
[backend-cuda]
device-id = local-rank
gimmik-max-nnz = 1024
mpi-type = standard
[constants]
gamma = 1.4
mu = 3.94405318873308E-6
Pr = 0.72
M = 0.2
[solver-time-integrator]
scheme = rk45
controller = pi
tstart = 0.0
dt = 0.00001
atol = 0.000001
rtol = 0.000001
; safety-fact = 0.5
; min-fact = 0.3
; max-fact = 1.2
tend = 185.0
[soln-plugin-nancheck]
nsteps = 50
[soln-plugin-writer]
dt-out = 5.0
basedir = ./solutions_01/
basename = sd7003_{t:.2f}
[soln-plugin-fluidforce-wall]
nsteps = 5
file = sd7003_01-wall-forces.csv
header = true
[solver]
system = navier-stokes
order = 4
anti-alias = flux
[solver-interfaces]
riemann-solver = rusanov
ldg-beta = 0.5
ldg-tau = 0.1
[solver-interfaces-line]
flux-pts = gauss-legendre
quad-deg = 11
quad-pts = gauss-legendre
[solver-interfaces-quad]
flux-pts = gauss-legendre
quad-deg = 11
quad-pts = gauss-legendre
[solver-elements-hex]
soln-pts = gauss-legendre
quad-deg = 11
quad-pts = gauss-legendre
[soln-bcs-outlet]
type = char-riem-inv
rho = 1.0
u = 0.2366431913
v = 0.0
w = 0.0
p = 1.0
[soln-bcs-inlet]
type = char-riem-inv
rho = 1.0
u = 0.2366431913
v = 0.0
w = 0.0
p = 1.0
[soln-bcs-wall]
type = no-slp-adia-wall
cpTw = 3.5
[soln-ics]
rho = 1.0
u = 0.2366431913
v = 0.001
w = 0.001*cos(x)*cos(y)
p = 1.0
I have tested the 2D Euler vortex example with 2 GPUs, and it runs fine. A single GPU also works with the same configuration file, so I cannot figure out what I did wrong. Could you give me some suggestions, please?
OS: ubuntu 20.04
PyFR version: 1.15.0
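For context on the `device-id = local-rank` setting in the configuration above: as I understand it, this option is meant to give each MPI rank on a node its own GPU, so with `mpiexec -n 2` on one machine rank 0 should use device 0 and rank 1 device 1. The sketch below only illustrates that idea in plain Python. The function name `local_ranks` and the host names are hypothetical, and this is not taken from PyFR's source.

```python
from collections import Counter


def local_ranks(hostnames):
    """Return the node-local rank for each global rank.

    hostnames[i] is the host that global rank i runs on. A rank's local
    rank is the number of lower-numbered ranks sharing the same host.
    """
    seen = Counter()
    ranks = []
    for host in hostnames:
        ranks.append(seen[host])
        seen[host] += 1
    return ranks


# Hypothetical host names, for illustration only.
# Two ranks on one machine: the ranks map to devices 0 and 1.
print(local_ranks(['node0', 'node0']))

# Three ranks spread over two machines.
print(local_ranks(['node0', 'node1', 'node0']))
```

If this mapping holds, rank 1 in my run would have been placed on the second GPU, which would match the `process 1` in the MPI_Abort message.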