CUBLASNotInitialized error

I’m trying to run the Couette flow example using the CUDA backend via the following command:

$ pyfr run -b cuda -p couette_flow_2d.pyfrm couette_flow_2d.ini

But I get the following error:

File "/home/cades/miniconda3/lib/python3.8/site-packages/pyfr/backends/cuda/cublas.py", line 78, in _errcheck
    raise self._statuses[status]
pyfr.backends.cuda.cublas.CUBLASNotInitialized

Any suggestions on how to fix this?

I ran some of the PyCUDA examples at https://github.com/inducer/pycuda/tree/master/examples without any problems.

I used Numba to get some system info (see below). It looks like the CUBLAS library is working fine.

System info:

Hi Gavin,

Looking at the output below, it appears as if Numba is opening and loading the CUBLAS shared library, although it does not state whether it is actually calling any of its methods. Would you be able to run a test case in Numba which calls down to CUBLAS?

Regards, Freddie.

I have never used Numba (other than the above) so I don’t have a readily available example to use.

I created the following example using CuPy, which uses CUDA to calculate the norm of a matrix. I think this uses CUBLAS. This example works fine.

# Compare CuPy and NumPy

import cupy as cp
import numpy as np
import time

n = 100_000_000
ns = 10_000

# Using NumPy
x_cpu = np.arange(n) - 4
x_cpu = x_cpu.reshape((ns, ns))

ti = time.perf_counter()
norm_cpu = np.linalg.norm(x_cpu)
tf = time.perf_counter()

print(f'NumPy result: \t{norm_cpu}')
print(f'NumPy time: \t{tf - ti:.4g}')

# Using CuPy
x_gpu = cp.arange(n) - 4
x_gpu = x_gpu.reshape((ns, ns))

ti = time.perf_counter()
norm_gpu = cp.linalg.norm(x_gpu)
tf = time.perf_counter()

print(f'CuPy result: \t{norm_gpu}')
print(f'CuPy time: \t{tf - ti:.4g}')

Hi Gavin,

So it looks as if the cupy.linalg.norm function does not call out to CUBLAS. See:

https://github.com/cupy/cupy/blob/master/cupy/linalg/norms.py#L54

Regards, Freddie.

Here’s another example that uses the CuPy solve function, which does use CUBLAS, at least from what I can tell from the CuPy source code. This example works fine too.

# Compare CuPy and NumPy

import cupy as cp
import numpy as np
import time

# Using NumPy
a_cpu = np.array([[4, 3, 2], [-2, 2, 3], [3, -5, 2]])
b_cpu = np.array([25, -10, -4])

ti = time.perf_counter()
x_cpu = np.linalg.solve(a_cpu, b_cpu)
tf = time.perf_counter()

print(f'NumPy result: \t{x_cpu}')
print(f'NumPy time: \t{tf - ti:.4g}')

# Using CuPy
a_gpu = cp.array([[4, 3, 2], [-2, 2, 3], [3, -5, 2]])
b_gpu = cp.array([25, -10, -4])

ti = time.perf_counter()
x_gpu = cp.linalg.solve(a_gpu, b_gpu)
tf = time.perf_counter()

print(f'CuPy result: \t{x_gpu}')
print(f'CuPy time: \t{tf - ti:.4g}')

Hi Gavin,

So I believe that here CuPy is using cusolver as opposed to CUBLAS to solve the system.

Would you be able to run the following snippet for me?

from ctypes import CDLL, POINTER, c_void_p

# Load CUBLAS and create a handle; cublasCreate_v2 returns a
# cublasStatus_t, where 0 is CUBLAS_STATUS_SUCCESS
lib = CDLL('libcublas.so')
create = lib.cublasCreate_v2
create.argtypes = [POINTER(c_void_p)]

handle = c_void_p()
print(create(handle))

This should print 0.

Regards, Freddie.
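For reference, a nonzero return value from cublasCreate_v2 can be decoded against the cublasStatus_t codes. Below is a minimal lookup table; the numeric values are taken from the cublas_api.h header, so it is worth verifying them against the header shipped with your CUDA version:

```python
# cublasStatus_t values as defined in cublas_api.h (check against the
# header for your CUDA version before relying on these)
CUBLAS_STATUSES = {
    0: 'CUBLAS_STATUS_SUCCESS',
    1: 'CUBLAS_STATUS_NOT_INITIALIZED',
    3: 'CUBLAS_STATUS_ALLOC_FAILED',
    7: 'CUBLAS_STATUS_INVALID_VALUE',
    8: 'CUBLAS_STATUS_ARCH_MISMATCH',
    11: 'CUBLAS_STATUS_MAPPING_ERROR',
    13: 'CUBLAS_STATUS_EXECUTION_FAILED',
    14: 'CUBLAS_STATUS_INTERNAL_ERROR',
}

# Decode the status returned by cublasCreate_v2
print(CUBLAS_STATUSES.get(1, 'unknown status'))  # → CUBLAS_STATUS_NOT_INITIALIZED
```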

Freddie, I ran your script and it prints 1, not 0.

In my bashrc file, I’m pointing to the CUDA environment as follows:

export PATH="$PATH:/usr/local/cuda-9.2/bin"
export LD_LIBRARY_PATH="/usr/local/cuda-9.2/lib64"

The libcublas.so is located in /usr/local/cuda-9.2/lib64/ as:

lrwxrwxrwx 1 root root 16 Sep 25 19:45 libcublas.so -> libcublas.so.9.2*
lrwxrwxrwx 1 root root 19 Sep 25 19:45 libcublas.so.9.2 -> libcublas.so.9.2.88*

Hi Gavin,

It appears as if either your CUDA install or environment is broken. It could be that the version of CUBLAS the loader is finding is incompatible with your main CUDA version.

Regards, Freddie.
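One quick way to see which CUBLAS the dynamic loader would resolve is the standard-library find_library helper; this is only a sketch, since find_library consults the ldconfig cache and related mechanisms and returns None when nothing is found:

```python
from ctypes.util import find_library

# Ask the loader machinery which libcublas it would pick up; the answer
# depends on LD_LIBRARY_PATH and the ldconfig cache
lib = find_library('cublas')
print(lib if lib is not None else 'libcublas not found on the loader path')
```

Comparing the result against the library shipped with the main CUDA toolkit should show whether a stale or mismatched copy is being picked up.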

So I wiped the system I was working on and did a fresh install of CUDA. Everything seems to be working fine now.