I’m trying to run the Couette flow example using the CUDA backend via the following command:
$ pyfr run -b cuda -p couette_flow_2d.pyfrm couette_flow_2d.ini
But I get the following error:
  File "/home/cades/miniconda3/lib/python3.8/site-packages/pyfr/backends/cuda/cublas.py", line 78, in _errcheck
    raise self._statuses[status]
pyfr.backends.cuda.cublas.CUBLASNotInitialized
Any suggestions on how to fix this?
I ran some of the PyCuda examples at https://github.com/inducer/pycuda/tree/master/examples without any problems.
I used Numba to get some system info (see below). It looks like the CUBLAS library is working fine.
System info:
fdw
26 September 2020 12:16
Hi Gavin,
Looking at the output below, it appears as if Numba is opening and loading the CUBLAS shared library, although it does not state whether it is actually calling any of its functions. Would you be able to run a test case in Numba that calls down to CUBLAS?
Regards, Freddie.
I have never used Numba (other than the above), so I don’t have a readily available example to use.
I created the following example using CuPy, which uses CUDA to calculate the norm of a matrix. I think this uses CUBLAS. This example works fine.
# Compare CuPy and NumPy
import time

import cupy as cp
import numpy as np

n = 100_000_000
ns = 10_000

# Using NumPy
x_cpu = (np.arange(n) - 4).reshape((ns, ns))  # reshape returns a new array, so assign it
ti = time.perf_counter()
norm_cpu = np.linalg.norm(x_cpu)
tf = time.perf_counter()
print(f'NumPy result: \t{norm_cpu}')
print(f'NumPy time: \t{tf - ti:.4g}')

# Using CuPy
x_gpu = (cp.arange(n) - 4).reshape((ns, ns))
ti = time.perf_counter()
norm_gpu = cp.linalg.norm(x_gpu)
cp.cuda.Device().synchronize()  # wait for the GPU to finish before stopping the clock
tf = time.perf_counter()
print(f'CuPy result: \t{norm_gpu}')
print(f'CuPy time: \t{tf - ti:.4g}')
fdw
26 September 2020 16:10
Hi Gavin,
So it looks as if the cupy.linalg.norm function does not call out to CUBLAS. See:
https://github.com/cupy/cupy/blob/master/cupy/linalg/norms.py#L54
Regards, Freddie.
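One way to force a call into cuBLAS from CuPy is a dense float32 matrix product, which, to my understanding, CuPy dispatches to a cuBLAS GEMM routine. The sketch below is an assumption-laden test, not something from PyFR or CuPy: the helper name `check_matmul` is hypothetical, and it compares the product computed by an array module `xp` against a float64 NumPy reference. If cuBLAS cannot be initialised, I would expect the CuPy call to raise an error instead of printing a small number.

```python
import numpy as np


def check_matmul(xp, n=512, seed=0):
    """Hypothetical helper: multiply two random float32 matrices with the
    array module `xp` and return the max abs error vs. a NumPy reference."""
    rng = np.random.default_rng(seed)
    a_host = rng.standard_normal((n, n), dtype=np.float32)
    b_host = rng.standard_normal((n, n), dtype=np.float32)
    # For CuPy this copies the data to the GPU; for NumPy it is a no-op
    a = xp.asarray(a_host)
    b = xp.asarray(b_host)
    # Assumption: for float32 inputs CuPy routes this product through cuBLAS GEMM
    c = a @ b
    # cupy.asnumpy copies back to the host; NumPy has no asnumpy, so fall back
    to_host = getattr(xp, 'asnumpy', np.asarray)
    c_host = to_host(c)
    ref = a_host.astype(np.float64) @ b_host.astype(np.float64)
    return float(np.max(np.abs(c_host - ref)))


if __name__ == '__main__':
    print('NumPy max error:', check_matmul(np))
    try:
        import cupy as cp
    except ImportError:
        cp = None
    if cp is not None:
        try:
            print('CuPy max error: ', check_matmul(cp))
        except Exception as exc:  # e.g. a CUDA or cuBLAS error raised by CuPy
            print('CuPy matmul failed:', repr(exc))
```

Running the NumPy path first gives a baseline, so a failure in the CuPy path points at the GPU stack rather than at the test itself.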
Here’s another example that uses the CuPy solve function, which does use CUBLAS, at least as far as I can tell from the CuPy source code. This example works fine too.
# Compare CuPy and NumPy
import time

import cupy as cp
import numpy as np

# Using NumPy
a_cpu = np.array([[4, 3, 2], [-2, 2, 3], [3, -5, 2]])
b_cpu = np.array([25, -10, -4])
ti = time.perf_counter()
x_cpu = np.linalg.solve(a_cpu, b_cpu)
tf = time.perf_counter()
print(f'NumPy result: \t{x_cpu}')
print(f'NumPy time: \t{tf - ti:.4g}')

# Using CuPy
a_gpu = cp.array([[4, 3, 2], [-2, 2, 3], [3, -5, 2]])
b_gpu = cp.array([25, -10, -4])
ti = time.perf_counter()
x_gpu = cp.linalg.solve(a_gpu, b_gpu)
cp.cuda.Device().synchronize()  # wait for the GPU to finish before stopping the clock
tf = time.perf_counter()
print(f'CuPy result: \t{x_gpu}')
print(f'CuPy time: \t{tf - ti:.4g}')
fdw
26 September 2020 17:50
Hi Gavin,
So I believe that here CuPy is using cuSOLVER, as opposed to CUBLAS, to solve the system.
Would you be able to run the following snippet for me?
from ctypes import CDLL, POINTER, c_void_p
lib = CDLL('libcublas.so')
create = lib.cublasCreate_v2
create.argtypes = [POINTER(c_void_p)]
handle = c_void_p()
print(create(handle))
This should print 0.
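The number printed by the snippet above is a cublasStatus_t code: 0 is CUBLAS_STATUS_SUCCESS and 1 is CUBLAS_STATUS_NOT_INITIALIZED, which is the condition PyFR surfaces as CUBLASNotInitialized. A slightly extended sketch is below; the helper names `status_name` and `try_create_handle` are hypothetical, the status table is copied from the enum in cublas_api.h, and the library name `libcublas.so` is the same assumption as in the snippet above. It also releases the handle with cublasDestroy_v2 when creation succeeds.

```python
from ctypes import CDLL, POINTER, byref, c_void_p

# cublasStatus_t values as defined in cublas_api.h
CUBLAS_STATUS = {
    0: 'CUBLAS_STATUS_SUCCESS',
    1: 'CUBLAS_STATUS_NOT_INITIALIZED',
    3: 'CUBLAS_STATUS_ALLOC_FAILED',
    7: 'CUBLAS_STATUS_INVALID_VALUE',
    8: 'CUBLAS_STATUS_ARCH_MISMATCH',
    11: 'CUBLAS_STATUS_MAPPING_ERROR',
    13: 'CUBLAS_STATUS_EXECUTION_FAILED',
    14: 'CUBLAS_STATUS_INTERNAL_ERROR',
    15: 'CUBLAS_STATUS_NOT_SUPPORTED',
    16: 'CUBLAS_STATUS_LICENSE_ERROR',
}


def status_name(code):
    """Hypothetical helper: map a cublasStatus_t integer to its enum name."""
    return CUBLAS_STATUS.get(code, f'unknown cublasStatus_t {code}')


def try_create_handle(libname='libcublas.so'):
    """Hypothetical helper: try cublasCreate_v2 and return (status, name)."""
    try:
        lib = CDLL(libname)
    except OSError as exc:
        return None, f'could not load {libname}: {exc}'
    create = lib.cublasCreate_v2
    create.argtypes = [POINTER(c_void_p)]
    destroy = lib.cublasDestroy_v2
    destroy.argtypes = [c_void_p]
    handle = c_void_p()
    status = create(byref(handle))
    if status == 0:
        destroy(handle)  # only a successfully created handle needs releasing
    return status, status_name(status)


if __name__ == '__main__':
    print(try_create_handle())
```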
Regards, Freddie.
Freddie, I ran your script and it prints 1.
In my .bashrc file, I’m pointing to the CUDA environment as follows:
export PATH="$PATH:/usr/local/cuda-9.2/bin"
export LD_LIBRARY_PATH="/usr/local/cuda-9.2/lib64"
libcublas.so is located in /usr/local/cuda-9.2/lib64/:
lrwxrwxrwx 1 root root 16 Sep 25 19:45 libcublas.so -> libcublas.so.9.2*
lrwxrwxrwx 1 root root 19 Sep 25 19:45 libcublas.so.9.2 -> libcublas.so.9.2.88*
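The symlinks show what is on disk, but not necessarily which copy of libcublas the dynamic loader actually picks up inside the Python process. A Linux-only sketch for checking that is below: it loads `libcublas.so` the same way as the earlier snippet and then lists every mapped file whose path contains "libcublas" by reading /proc/self/maps. The helper name `mapped_paths` is hypothetical; ctypes.util.find_library is printed only for comparison, since its search rules are not identical to those of dlopen.

```python
import ctypes
import ctypes.util


def mapped_paths(substring):
    """Hypothetical helper (Linux only): return the distinct file paths in
    /proc/self/maps that contain `substring`."""
    paths = set()
    with open('/proc/self/maps') as f:
        for line in f:
            # Columns: address perms offset dev inode [pathname]
            fields = line.rstrip('\n').split(maxsplit=5)
            if len(fields) == 6 and substring in fields[5]:
                paths.add(fields[5])
    return sorted(paths)


if __name__ == '__main__':
    print('find_library:', ctypes.util.find_library('cublas'))
    try:
        ctypes.CDLL('libcublas.so')
    except OSError as exc:
        print('dlopen failed:', exc)
    # After a successful load, this shows the real file behind the symlinks
    for path in mapped_paths('libcublas'):
        print('mapped:', path)
```

If the mapped path is not the one under /usr/local/cuda-9.2/lib64, something earlier in the search path is shadowing it.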
fdw
26 September 2020 19:04
Hi Gavin,
It appears as if either your CUDA install or environment is broken. It could be that the version of CUBLAS the loader is finding is incompatible with your main CUDA version.
Regards, Freddie.
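The version-mismatch hypothesis can be checked by asking the driver, the CUDA runtime and cuBLAS for their versions from the same process. The sketch below assumes a typical Linux install where the libraries are reachable as `libcuda.so.1`, `libcudart.so` and `libcublas.so`; the helper names `decode_version`, `int_query` and `cuda_versions` are hypothetical. It relies on cuDriverGetVersion, cudaRuntimeGetVersion and cublasGetProperty, each of which writes an int and returns 0 on success, and on CUDA's 1000 * major + 10 * minor version encoding.

```python
from ctypes import CDLL, byref, c_int


def decode_version(v):
    # CUDA packs versions as 1000 * major + 10 * minor, e.g. 9020 -> (9, 2)
    return v // 1000, (v % 1000) // 10


def int_query(libname, funcname, *leading_args):
    """Hypothetical helper: call funcname(*leading_args, int *out) from
    libname and return out, or None if loading or the call fails."""
    try:
        func = getattr(CDLL(libname), funcname)
    except (OSError, AttributeError):
        return None
    out = c_int()
    status = func(*leading_args, byref(out))
    return out.value if status == 0 else None


def cuda_versions():
    # Highest CUDA version the installed driver supports
    driver = int_query('libcuda.so.1', 'cuDriverGetVersion')
    # Version of the CUDA runtime library the loader finds
    runtime = int_query('libcudart.so', 'cudaRuntimeGetVersion')
    # cublasGetProperty(libraryPropertyType, int *): 0 = MAJOR_VERSION, 1 = MINOR_VERSION
    cublas = tuple(int_query('libcublas.so', 'cublasGetProperty', c_int(k))
                   for k in (0, 1))
    return {
        'driver': decode_version(driver) if driver is not None else None,
        'runtime': decode_version(runtime) if runtime is not None else None,
        'cublas': cublas if None not in cublas else None,
    }


if __name__ == '__main__':
    for name, version in cuda_versions().items():
        print(f'{name:8s} {version}')
```

Roughly speaking, the driver entry should be at least as new as the runtime entry, and the cuBLAS major/minor version should match the toolkit the runtime comes from (9.2 in this setup); a `None` means that library could not be loaded or queried.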
So I wiped the system I was working on and did a fresh install of CUDA. Everything seems to be working fine now.