CUBLASInternalError when using multi GPU

hwtang · 7 November 2022 15:36

Dear all,

I got an error when trying to use 2 different GPUs.

The command I used is:

mpiexec -n 2 pyfr restart -b cuda -p solutions_01/sd7003.pyfrm solutions_01/sd7003_145.00.pyfrs sd7003_01.ini

It fails with the following traceback:

Traceback (most recent call last):
  File "/home/tang/anaconda3/envs/pyfr1.15.0/bin/pyfr", line 33, in <module>
    sys.exit(load_entry_point('pyfr==1.15.0', 'console_scripts', 'pyfr')())
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/__main__.py", line 118, in main
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/__main__.py", line 270, in process_restart
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/__main__.py", line 247, in _process_common
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/integrators/base.py", line 115, in run
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/integrators/std/controllers.py", line 181, in advance_to
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/integrators/std/steppers.py", line 190, in step
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/solvers/base/system.py", line 265, in rhs
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/util.py", line 40, in newmeth
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/solvers/baseadvecdiff/system.py", line 82, in _rhs_graphs
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/base/types.py", line 332, in add
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/cuda/provider.py", line 35, in add_to_graph
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/cuda/provider.py", line 35, in <listcomp>
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/cuda/cublas.py", line 103, in add_to_graph
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/backends/cuda/cublas.py", line 111, in run
  File "/home/tang/anaconda3/envs/pyfr1.15.0/lib/python3.10/site-packages/pyfr-1.15.0-py3.10.egg/pyfr/ctypesutil.py", line 33, in _errcheck
pyfr.backends.cuda.cublas.CUBLASInternalError
application called MPI_Abort(MPI_COMM_WORLD, 1) - process 1

Here are the details of my devices:


The configuration file I used:


[backend]
precision = single

[backend-cuda]
device-id = local-rank
gimmik-max-nnz = 1024
mpi-type = standard

[constants]
gamma = 1.4
mu = 3.94405318873308E-6
Pr = 0.72
M = 0.2

[solver-time-integrator]
scheme = rk45
controller = pi
tstart = 0.0
dt = 0.00001
atol = 0.000001
rtol = 0.000001
; safety-fact = 0.5
; min-fact = 0.3
; max-fact = 1.2
tend = 185.0

[soln-plugin-nancheck]
nsteps = 50

[soln-plugin-writer]
dt-out = 5.0
basedir = ./solutions_01/
basename = sd7003_{t:.2f}

[soln-plugin-fluidforce-wall]
nsteps = 5
file = sd7003_01-wall-forces.csv
header = true

[solver]
system = navier-stokes
order = 4
anti-alias = flux

[solver-interfaces]
riemann-solver = rusanov
ldg-beta = 0.5
ldg-tau = 0.1

[solver-interfaces-line]
flux-pts = gauss-legendre
quad-deg = 11
quad-pts = gauss-legendre

[solver-interfaces-quad]
flux-pts = gauss-legendre
quad-deg = 11
quad-pts = gauss-legendre

[solver-elements-hex]
soln-pts = gauss-legendre
quad-deg = 11
quad-pts = gauss-legendre

[soln-bcs-outlet]
type = char-riem-inv
rho = 1.0
u = 0.2366431913
v = 0.0
w = 0.0
p = 1.0

[soln-bcs-inlet]
type = char-riem-inv
rho = 1.0
u = 0.2366431913
v = 0.0
w = 0.0
p = 1.0

[soln-bcs-wall]
type = no-slp-adia-wall
cpTw = 3.5

[soln-ics]
rho = 1.0
u = 0.2366431913
v = 0.001
w = 0.001*cos(x)*cos(y)
p = 1.0

I have tested the 2D Euler vortex example with 2 GPUs, and it runs fine. A single GPU also works with the same configuration file, so I cannot figure out what I did wrong. Could you give me some suggestions, please?

OS: ubuntu 20.04
PyFR version: 1.15.0

fdw · 7 November 2022 18:07

If PyFR works on both GPUs individually, the first thing I would check is whether you might be running out of memory on one of the GPUs. This should not raise an internal error, but it is something we want to rule out.

Regards, Freddie.
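
For anyone wanting to rule out memory exhaustion as suggested above, here is a minimal sketch that reads per-GPU memory usage from nvidia-smi while the simulation is running. It assumes nvidia-smi is on the PATH; the function names parse_gpu_memory and query_gpu_memory are hypothetical helpers for illustration and are not part of PyFR.

```python
import subprocess

# Fields requested from nvidia-smi; values come back in MiB with "nounits"
QUERY = "index,name,memory.used,memory.total"


def parse_gpu_memory(csv_text):
    """Parse `nvidia-smi --format=csv,noheader,nounits` output into dicts."""
    gpus = []
    for line in csv_text.strip().splitlines():
        index, name, used, total = [field.strip() for field in line.split(",")]
        gpus.append({
            "index": int(index),
            "name": name,
            "used_mib": int(used),
            "total_mib": int(total),
            "free_mib": int(total) - int(used),
        })
    return gpus


def query_gpu_memory():
    """Run nvidia-smi once and return one dict per visible GPU."""
    result = subprocess.run(
        ["nvidia-smi", f"--query-gpu={QUERY}", "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    )
    return parse_gpu_memory(result.stdout)
```

Calling query_gpu_memory() from a second terminal during the run and comparing used_mib against total_mib for each device shows whether either GPU is close to its limit.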

hwtang · 8 November 2022 01:46

Hi Freddie,

Thank you for your reply. I ran the simulation on each GPU individually and found that the error is caused by GPU 0 (Tesla K40c). The simulation runs successfully on the other GPU (GPU 1), where it takes 9093 MB of memory. So neither of the GPUs is running out of memory.

I also ran a test using PyFR 1.12.3. Surprisingly, the simulation runs successfully on two GPUs, and it also works on either GPU alone.

The simulation is the sd7003 case obtained from this paper.

BTW, I have a novice question: if a single GPU has enough memory to run a simulation, can the simulation be accelerated with multiple GPUs? The answer seems to be yes, if I have not misunderstood these discussions. Could you also recommend some GPU programming materials for beginners, please?

Best regards.

WillT · 8 November 2022 09:21

I imagine the issue is that the graph API functionality that has been added is not supported on K40 GPUs, which are now quite old. I know, for example, that the newer NVIDIA profilers don’t support them.

In answer to your second question, strong scaling will generally speed up your calculation, unless you strong scale it much too far. Each GPU needs enough work to occupy the warps sufficiently, to keep kernel launch overhead from dominating the run time, and to fill the time while communication is happening.

NVIDIA’s blogs are a good source of material, as is this paper: https://arxiv.org/abs/1804.06826. However, as Volta is a couple of generations old now, it is less relevant.
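
The strong-scaling point above can be illustrated with a toy cost model. This is only a sketch under simplifying assumptions, not how PyFR measures or predicts performance: compute is assumed to divide evenly across GPUs, kernel launch overhead is a fixed per-step cost, and communication is fully hidden whenever the per-GPU compute time is at least as long. The function names and all parameters are hypothetical, and the times are in arbitrary units.

```python
def step_time(n_gpus, compute_time, launch_overhead, comm_time):
    """Toy estimate of the wall time for one time step on n_gpus.

    compute_time    -- time a single GPU needs for the whole problem
    launch_overhead -- fixed per-step cost that does not shrink with more GPUs
    comm_time       -- per-step communication cost when n_gpus > 1
    """
    compute = compute_time / n_gpus
    if n_gpus == 1:
        # No inter-GPU communication on a single device
        return compute + launch_overhead
    # Communication overlaps with compute, so only the longer of the two counts
    return max(compute, comm_time) + launch_overhead


def speedup(n_gpus, compute_time, launch_overhead, comm_time):
    """Speedup relative to a single GPU under the same toy model."""
    t1 = step_time(1, compute_time, launch_overhead, comm_time)
    tn = step_time(n_gpus, compute_time, launch_overhead, comm_time)
    return t1 / tn


if __name__ == "__main__":
    # Illustrative, made-up costs in arbitrary units
    for n in (1, 2, 4, 16, 64):
        s = speedup(n, compute_time=100.0, launch_overhead=1.0, comm_time=5.0)
        print(f"{n:3d} GPUs: speedup {s:6.2f}, efficiency {s / n:5.2f}")
```

In this model the speedup is close to ideal while each GPU still has plenty of work, and it flattens once the per-GPU compute time drops below the communication and launch costs, which is the "strong scaled too far" regime.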

hwtang · 8 November 2022 09:30

OK, I get it. Thank you very much.
