Another problem with cuda backend: cuda driver?

ENVIRONMENT

OS:

$ cat /etc/os-release
  NAME="Red Hat Enterprise Linux"
  VERSION="8.2 (Ootpa)"
  ID="rhel"
  ID_LIKE="fedora"
  VERSION_ID="8.2"
  PLATFORM_ID="platform:el8"
  PRETTY_NAME="Red Hat Enterprise Linux 8.2 (Ootpa)"
  ANSI_COLOR="0;31"
  CPE_NAME="cpe:/o:redhat:enterprise_linux:8.2:GA"
  HOME_URL="https://www.redhat.com/"
  BUG_REPORT_URL="https://bugzilla.redhat.com/"

  REDHAT_BUGZILLA_PRODUCT="Red Hat Enterprise Linux 8"
  REDHAT_BUGZILLA_PRODUCT_VERSION=8.2
  REDHAT_SUPPORT_PRODUCT="Red Hat Enterprise Linux"
  REDHAT_SUPPORT_PRODUCT_VERSION="8.2"

PyFR:
Version 1.12.0, installed in a python virtual environment, along with all the required python packages

OpenMPI:

$ mpiexec --version
  mpiexec (OpenRTE) 4.1.0

  Report bugs to http://www.open-mpi.org/community/help/

gcc:

$ gcc --version
  gcc (GCC) 10.2.0
  Copyright (C) 2020 Free Software Foundation, Inc.
  This is free software; see the source for copying conditions.  There is NO
  warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR       
  PURPOSE.

Python:

$ python --version
  Python 3.7.9

CUDA:

$ nvcc --version
  nvcc: NVIDIA (R) Cuda compiler driver
  Copyright (c) 2005-2020 NVIDIA Corporation
  Built on Tue_Sep_15_19:10:02_PDT_2020
  Cuda compilation tools, release 11.1, V11.1.74
  Build cuda_11.1.TC455_06.29069683_0

ucx:
version 1.8.1 (used for openmpi installation)

THE PROBLEM:
If I run the command given in the PyFR documentation for the examples (https://pyfr.readthedocs.io/en/latest/examples.html) with the cuda backend, I get the error messages below. The output is from the Euler vortex example, run with only 1 MPI task to keep the error message short; with n MPI tasks the same error is repeated n times.

Traceback (most recent call last):
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/util.py", line 33, in __call__
KeyError: (<function CUDAKernelProvider._build_kernel at 0x15554aa035f0>, b'\x80\x03X\t\x00\x00\x00gimmik_mmq\x00Xl\x0c\x00\x00\n__global__ void\ngimmik_mm(int n,\n         const double* __restrict__ b, int ldb,\n         double* __restrict__ c, int ldc)\n{\n    int i = blockDim.x*blockIdx.x + threadIdx.x;\n    double dotp;\n\n    if (i < n)\n    {\n        dotp = 1.5267881254572668*b[i + 0*ldb] + -0.8136324494869274*b[i + 4*ldb] + 0.40076152031165047*b[i + 8*ldb] + -0.11391719628199004*b[i + 12*ldb];\n        c[i + 0*ldc] = dotp;\n        dotp = 1.5267881254572668*b[i + 1*ldb] + -0.8136324494869274*b[i + 5*ldb] + 0.40076152031165047*b[i + 9*ldb] + -0.11391719628199004*b[i + 13*ldb];\n        c[i + 1*ldc] = dotp;\n        dotp = 1.5267881254572668*b[i + 2*ldb] + -0.8136324494869274*b[i + 6*ldb] + 0.40076152031165047*b[i + 10*ldb] + -0.11391719628199004*b[i + 14*ldb];\n        c[i + 2*ldc] = dotp;\n        dotp = 1.5267881254572668*b[i + 3*ldb] + -0.8136324494869274*b[i + 7*ldb] + 0.40076152031165047*b[i + 11*ldb] + -0.11391719628199004*b[i + 15*ldb];\n        c[i + 3*ldc] = dotp;\n        dotp = -0.11391719628199004*b[i + 0*ldb] + 0.40076152031165047*b[i + 1*ldb] + -0.8136324494869274*b[i + 2*ldb] + 1.5267881254572668*b[i + 3*ldb];\n        c[i + 4*ldc] = dotp;\n        dotp = -0.11391719628199004*b[i + 4*ldb] + 0.40076152031165047*b[i + 5*ldb] + -0.8136324494869274*b[i + 6*ldb] + 1.5267881254572668*b[i + 7*ldb];\n        c[i + 5*ldc] = dotp;\n        dotp = -0.11391719628199004*b[i + 8*ldb] + 0.40076152031165047*b[i + 9*ldb] + -0.8136324494869274*b[i + 10*ldb] + 1.5267881254572668*b[i + 11*ldb];\n        c[i + 6*ldc] = dotp;\n        dotp = -0.11391719628199004*b[i + 12*ldb] + 0.40076152031165047*b[i + 13*ldb] + -0.8136324494869274*b[i + 14*ldb] + 1.5267881254572668*b[i + 15*ldb];\n        c[i + 7*ldc] = dotp;\n        dotp = -0.11391719628199004*b[i + 0*ldb] + 0.40076152031165047*b[i + 4*ldb] + -0.8136324494869274*b[i + 8*ldb] + 1.5267881254572668*b[i + 
12*ldb];\n        c[i + 8*ldc] = dotp;\n        dotp = -0.11391719628199004*b[i + 1*ldb] + 0.40076152031165047*b[i + 5*ldb] + -0.8136324494869274*b[i + 9*ldb] + 1.5267881254572668*b[i + 13*ldb];\n        c[i + 9*ldc] = dotp;\n        dotp = -0.11391719628199004*b[i + 2*ldb] + 0.40076152031165047*b[i + 6*ldb] + -0.8136324494869274*b[i + 10*ldb] + 1.5267881254572668*b[i + 14*ldb];\n        c[i + 10*ldc] = dotp;\n        dotp = -0.11391719628199004*b[i + 3*ldb] + 0.40076152031165047*b[i + 7*ldb] + -0.8136324494869274*b[i + 11*ldb] + 1.5267881254572668*b[i + 15*ldb];\n        c[i + 11*ldc] = dotp;\n        dotp = 1.5267881254572668*b[i + 0*ldb] + -0.8136324494869274*b[i + 1*ldb] + 0.40076152031165047*b[i + 2*ldb] + -0.11391719628199004*b[i + 3*ldb];\n        c[i + 12*ldc] = dotp;\n        dotp = 1.5267881254572668*b[i + 4*ldb] + -0.8136324494869274*b[i + 5*ldb] + 0.40076152031165047*b[i + 6*ldb] + -0.11391719628199004*b[i + 7*ldb];\n        c[i + 13*ldc] = dotp;\n        dotp = 1.5267881254572668*b[i + 8*ldb] + -0.8136324494869274*b[i + 9*ldb] + 0.40076152031165047*b[i + 10*ldb] + -0.11391719628199004*b[i + 11*ldb];\n        c[i + 14*ldc] = dotp;\n        dotp = 1.5267881254572668*b[i + 12*ldb] + -0.8136324494869274*b[i + 13*ldb] + 0.40076152031165047*b[i + 14*ldb] + -0.11391719628199004*b[i + 15*ldb];\n        c[i + 15*ldc] = dotp;\n    }\n}\nq\x01]q\x02(cnumpy\nint32\nq\x03cnumpy\nint64\nq\x04h\x03h\x04h\x03e\x87q\x05.', b'\x80\x03}q\x00.')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/ctypesutil.py", line 33, in _errcheck
KeyError: 222

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/davinci-1/home/cipollettaf/pyfr/bin/pyfr", line 33, in <module>
    sys.exit(load_entry_point('pyfr==1.12.0', 'console_scripts', 'pyfr')())
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/__main__.py", line 117, in main
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/__main__.py", line 246, in process_run
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/__main__.py", line 227, in _process_common
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/solvers/__init__.py", line 16, in get_solver
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/integrators/__init__.py", line 36, in get_integrator
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/integrators/std/controllers.py", line 14, in __init__
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/integrators/std/base.py", line 28, in __init__
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/solvers/base/system.py", line 67, in __init__
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/solvers/base/system.py", line 188, in _gen_kernels
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/solvers/baseadvec/elements.py", line 53, in <lambda>
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/backends/base/backend.py", line 163, in kernel
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/backends/cuda/gimmik.py", line 38, in mul
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/util.py", line 35, in __call__
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/backends/cuda/provider.py", line 19, in _build_kernel
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/backends/cuda/compiler.py", line 114, in __init__
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/backends/cuda/driver.py", line 291, in load_module
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/backends/cuda/driver.py", line 166, in __init__
  File "/davinci-1/home/cipollettaf/pyfr/lib/python3.7/site-packages/pyfr-1.12.0-py3.7.egg/pyfr/ctypesutil.py", line 35, in _errcheck
pyfr.backends.cuda.driver.CUDAError

Looking at the last line in the traceback, it seems that the issue is related to the cuda driver selection, but I could not find any further details about that in the PyFR documentation. I wonder if it could be related to the CUDA toolkit version that I am trying to use (11.1). I say this because my research group has already tried installing PyFR in a container, and after several trials we found the following:

  1. the container with cudatoolkit version 11.1 or 11.2 gave the same error messages;
  2. the container with cudatoolkit 11.0 worked just fine.

I am not able to understand whether the problem is related to the cudatoolkit I am using or to the way in which the driver is initialized within the PyFR code.

Can someone comment on that or give some hints about possible solutions or workarounds?

FURTHER REFERENCES:
I noticed that some problems with the cuda backend were already discussed and resolved in the forum at https://pyfr.discourse.group/t/problem-with-cuda-backend/306, but I am not able to view the answer given by Freddie (not sure of the reason).

LAST NOTE:
The OpenMP backend works just fine.

Regards,
Federico Cipolletta

The error number returned by CUDA is 222. Looking this up in the CUDA documentation we see this corresponds to CUDA_ERROR_UNSUPPORTED_PTX_VERSION. This is caused by your driver (which compiles the PTX code to assembly code) being too old to understand the PTX being generated by the CUDA toolkit you’re using.
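As a rough illustration of the lookup described above, here is a minimal sketch of a code-to-name table. The numeric values are those listed for the driver API in NVIDIA's cuda.h; the table is deliberately partial, and the names `CUDA_ERRORS` and `describe_cuda_error` are hypothetical, not part of PyFR.

```python
# Partial table of CUDA driver API result codes (values as in NVIDIA's cuda.h).
# CUDA_ERRORS and describe_cuda_error are hypothetical names for illustration.
CUDA_ERRORS = {
    0: 'CUDA_SUCCESS',
    1: 'CUDA_ERROR_INVALID_VALUE',
    2: 'CUDA_ERROR_OUT_OF_MEMORY',
    3: 'CUDA_ERROR_NOT_INITIALIZED',
    100: 'CUDA_ERROR_NO_DEVICE',
    200: 'CUDA_ERROR_INVALID_IMAGE',
    209: 'CUDA_ERROR_NO_BINARY_FOR_GPU',
    218: 'CUDA_ERROR_INVALID_PTX',
    222: 'CUDA_ERROR_UNSUPPORTED_PTX_VERSION',
}


def describe_cuda_error(code):
    """Return the symbolic name for a driver API result code, if known."""
    return CUDA_ERRORS.get(code, f'unknown CUDA error {code}')


print(describe_cuda_error(222))
```

Falling back to a generic string for unknown codes avoids a bare `KeyError` like the one in the traceback when a code is missing from the table.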

The NVIDIA driver is a separate package to the CUDA toolkit. Further details can be found here:

https://docs.nvidia.com/deploy/cuda-compatibility/index.html

Observe how CUDA 11.0 is happy with 450.36.06 but 11.1, 11.2, and 11.3 need 450.80.02 or later.
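The comparison can be sketched in a few lines of Python. The minimum versions below are only the ones quoted in this post; the NVIDIA compatibility page remains the authoritative source, and `MIN_DRIVER`, `parse_version` and `driver_ok` are hypothetical names.

```python
# Minimum driver versions as quoted in this thread; check NVIDIA's
# compatibility page for authoritative values. All names are hypothetical.
MIN_DRIVER = {
    '11.0': '450.36.06',
    '11.1': '450.80.02',
    '11.2': '450.80.02',
    '11.3': '450.80.02',
}


def parse_version(text):
    """Turn a dotted version such as '450.80.02' into a comparable tuple."""
    return tuple(int(part) for part in text.split('.'))


def driver_ok(toolkit, driver):
    """True if the driver version meets the minimum for the given toolkit."""
    return parse_version(driver) >= parse_version(MIN_DRIVER[toolkit])


print(driver_ok('11.1', '450.36.06'))
print(driver_ok('11.1', '450.80.02'))
```

Comparing integer tuples rather than raw strings keeps the ordering numeric, so '450.100.00' would correctly sort after '450.80.02'.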

Regards, Freddie.

Hello Freddie,

Thank you very much for your kind answer. Indeed you are correct. However, I do have the correct driver on our machine:

$ nvidia-smi
Mon Jun 14 17:07:37 2021
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.80.02    Driver Version: 450.80.02    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+

I don’t really know what is going on in there…do you have any other guess?

Best,
Federico Cipolletta.

If you are running inside of a container see what version of libcuda.so is being provided in the container. (My understanding is that libcuda.so normally comes with the driver and not the toolkit. Hence, inside of a container versions can get out of sync unless care is taken.)

Regards, Freddie.

Thank you again for your kind answer. No, I am not running from a container because I would like to take advantage of this beautiful code to do some research (and I would like to have the possibility of modifying the source code).

However, I am running PyFR from a python virtual environment, where I use the versions of the software that I previously listed. I spent some time trying to install the python dependencies in the home/.local/ directory but in the end I decided to use the python virtual environment because I thought that it could be easier.

I wonder if the problem could be related to how the driver was compiled: I mean, from nvidia-smi it seems that the driver version is correct but is “looking” to CUDA 11.0, while nvcc --version returns 11.1. In other words, I wonder if the driver should be installed from scratch (although I think that it is quite unlikely and I could easily be wrong).

Regards,
Federico Cipolletta.

This will be your problem. If we’re picking up the run-time compiler (which is slightly different to nvcc) from a newer CUDA toolkit (say 11.1 or 11.2) then it may emit instructions which the driver/libcuda (which are at version 11.0) do not understand.
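One way to check for this kind of mismatch is to ask both libraries for their versions. The sketch below uses ctypes to call `cuDriverGetVersion` (driver API) and `nvrtcVersion` (NVRTC); the library file names are assumptions that may differ between systems, and all Python function names are hypothetical. It is not how PyFR itself performs any check.

```python
import ctypes


def decode_cuda_version(value):
    """Split CUDA's integer version (1000*major + 10*minor) into a tuple."""
    return value // 1000, (value % 1000) // 10


def driver_cuda_version(libname='libcuda.so.1'):
    """Highest CUDA version supported by the loaded driver library.

    The default library name is an assumption; adjust it for your system.
    """
    lib = ctypes.CDLL(libname)
    version = ctypes.c_int()
    status = lib.cuDriverGetVersion(ctypes.byref(version))
    if status != 0:
        raise RuntimeError(f'cuDriverGetVersion failed with code {status}')
    return decode_cuda_version(version.value)


def nvrtc_version(libname='libnvrtc.so'):
    """(major, minor) version of the run-time compiler library."""
    lib = ctypes.CDLL(libname)
    major, minor = ctypes.c_int(), ctypes.c_int()
    status = lib.nvrtcVersion(ctypes.byref(major), ctypes.byref(minor))
    if status != 0:
        raise RuntimeError(f'nvrtcVersion failed with code {status}')
    return major.value, minor.value


def ptx_may_be_too_new(driver, nvrtc):
    """True if the run-time compiler is newer than what the driver reports."""
    return nvrtc > driver


# Offline example using the versions discussed in this thread.
print(ptx_may_be_too_new(decode_cuda_version(11000), (11, 1)))
```

On a machine with both libraries installed, `ptx_may_be_too_new(driver_cuda_version(), nvrtc_version())` performs the live comparison.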

Given this I would try and understand where libcuda.so is coming from.
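On Linux, one way to see which file actually gets loaded is to load the library and then inspect /proc/self/maps. The following is a minimal sketch under that assumption; `mapped_libraries` and `find_loaded_libcuda` are hypothetical helper names, and the library name `libcuda.so.1` may differ on your system.

```python
import ctypes


def mapped_libraries(maps_text, needle='libcuda.so'):
    """Return unique mapped file paths containing needle, in order of appearance.

    maps_text is expected to be in the format of /proc/<pid>/maps, where the
    sixth whitespace-separated field (if present) is the backing file path.
    """
    paths = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) == 6:
            path = fields[5].strip()
            if needle in path and path not in paths:
                paths.append(path)
    return paths


def find_loaded_libcuda(libname='libcuda.so.1'):
    """Load libcuda via the dynamic linker and report where it came from."""
    ctypes.CDLL(libname)
    with open('/proc/self/maps') as f:
        return mapped_libraries(f.read())


# Offline example with made-up mapping lines.
sample = (
    '7f00-7f01 r-xp 00000000 08:01 123 /usr/lib64/libcuda.so.450.80.02\n'
    '7f02-7f03 r-xp 00000000 08:01 456 /usr/lib64/libc.so.6\n'
)
print(mapped_libraries(sample))
```

Calling `find_loaded_libcuda()` from the same environment used to launch PyFR (same modules loaded, same `LD_LIBRARY_PATH`) should show the path the dynamic linker resolves.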

Regards, Freddie.

Hello Freddie. I finally managed to get PyFR running correctly with the cuda backend on our machine. The issue was indeed caused by the run-time compiler nvvm (we confirmed this by adding a simple print statement for the variable ptx at line 165 of the file backends/cuda/driver.py): it was using the correct cudatoolkit (11.1) and driver (450.80.02), but it was automatically picking up the libcuda.so library from CUDA 11.0 (probably due to an error in the compilation of the driver).

Luckily, on our machine we also had the NVIDIA SDK version 11.0, containing its corresponding driver, mpiexec and run-time compiler. Using this software with PyFR solved the problem.

The difficulty I faced arose because the environment described in my first message had already been used successfully by our group with a Fortran code, which misled me. We eventually understood why this environment works in the Fortran case: libcudafor did not change from cudatoolkit 11.0 to 11.1, while the corresponding libcuda.so did.

The insight that you gave in your last message was crucial to finding an answer. Thank you very much for the help!
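For anyone repeating this diagnosis: printed PTX normally begins with a `.version` directive, which shows the PTX ISA version the run-time compiler emitted. The sketch below extracts it and maps it to the toolkit release that introduced it (PTX 7.0 with CUDA 11.0, 7.1 with 11.1, and so on). The helper name `ptx_version` and the sample PTX text are hypothetical; they are not taken from the actual output on our machine.

```python
import re

# PTX ISA versions and the CUDA 11.x toolkit release that introduced each.
PTX_TO_CUDA = {'7.0': '11.0', '7.1': '11.1', '7.2': '11.2', '7.3': '11.3'}


def ptx_version(ptx):
    """Extract the '.version X.Y' directive from PTX given as str or bytes."""
    if isinstance(ptx, bytes):
        ptx = ptx.decode()
    match = re.search(r'^\s*\.version\s+(\d+\.\d+)', ptx, re.MULTILINE)
    return match.group(1) if match else None


# Made-up PTX header for illustration only.
sample = (
    '//\n'
    '// Generated by NVIDIA NVVM Compiler\n'
    '//\n'
    '\n'
    '.version 7.1\n'
    '.target sm_70\n'
    '.address_size 64\n'
)
version = ptx_version(sample)
print(version, PTX_TO_CUDA.get(version))
```

If the reported PTX version corresponds to a newer toolkit than the CUDA version shown by nvidia-smi, the mismatch discussed in this thread is a likely culprit.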
