CUDA run issues - NVRTC run error with conda

Hello,

Apologies for the issues. I now have PyFR compiled, but I am encountering problems when running with the CUDA backend. The log is attached below; it appears that a symbol is undefined in the libnvrtc.so library. I am looking for some diagnostic guidance, and wondering whether this might be related to an incorrectly compiled or mismatched CUDA version?

Thanks again!

nvcc version:

(pyfr1.15-venv) [roya3@gra-login3 2d-inc-cylinder]$ nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2023 NVIDIA Corporation
Built on Tue_Aug_15_22:02:13_PDT_2023
Cuda compilation tools, release 12.2, V12.2.140
Build cuda_12.2.r12.2/compiler.33191640_0

Error output along with nvidia-smi:

Wed Nov  8 01:44:48 2023       
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.105.17   Driver Version: 525.105.17   CUDA Version: 12.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla P100-PCIE...  On   | 00000000:83:00.0 Off |                    0 |
| N/A   38C    P0    25W / 250W |      0MiB / 12288MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
                                                                               
+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|  No running processes found                                                 |
+-----------------------------------------------------------------------------+
--------------------------------------------------------------------------
WARNING: There is at least non-excluded one OpenFabrics device found,
but there are no active ports detected (or Open MPI was unable to use
them).  This is most certainly not what you wanted.  Check your
cables, subnet manager configuration, etc.  The openib BTL will be
ignored for this job.

  Local host: gra951
--------------------------------------------------------------------------
Traceback (most recent call last):
  File "/home/roya3/.local/bin/pyfr", line 8, in <module>
    sys.exit(main())
  File "/home/roya3/.local/lib/python3.10/site-packages/pyfr/__main__.py", line 118, in main
    args.process(args)
  File "/home/roya3/.local/lib/python3.10/site-packages/pyfr/__main__.py", line 251, in process_run
    _process_common(
  File "/home/roya3/.local/lib/python3.10/site-packages/pyfr/__main__.py", line 230, in _process_common
    backend = get_backend(args.backend, cfg)
  File "/home/roya3/.local/lib/python3.10/site-packages/pyfr/backends/__init__.py", line 12, in get_backend
    return subclass_where(BaseBackend, name=name.lower())(cfg)
  File "/home/roya3/.local/lib/python3.10/site-packages/pyfr/backends/cuda/base.py", line 21, in __init__
    self.nvrtc = NVRTC()
  File "/home/roya3/.local/lib/python3.10/site-packages/pyfr/backends/cuda/compiler.py", line 54, in __init__
    self.lib = NVRTCWrappers()
  File "/home/roya3/.local/lib/python3.10/site-packages/pyfr/ctypesutil.py", line 18, in __init__
    fn = getattr(lib, fname)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/ctypes/__init__.py", line 387, in __getattr__
    func = self.__getitem__(name)
  File "/cvmfs/soft.computecanada.ca/easybuild/software/2020/avx2/Core/python/3.10.2/lib/python3.10/ctypes/__init__.py", line 392, in __getitem__
    func = self._FuncPtr((name_or_ordinal, self))
AttributeError: /cvmfs/soft.computecanada.ca/easybuild/software/2020/Core/cudacore/11.0.2/lib64/libnvrtc.so: undefined symbol: nvrtcGetCUBINSize. Did you mean: 'nvrtcGetPTXSize'?

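It looks like Conda is trying to use CUDA 11.0.2: the traceback shows libnvrtc.so being loaded from the cudacore/11.0.2 directory, whereas your nvcc reports release 12.2. You should try updating the path/dependencies used by Conda.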

You’re absolutely right. It looks like the system was picking up the correct PATH entries for the /bin locations, but incorrect LD_LIBRARY_PATH entries for the /lib locations, left over from a previous module load.

Works well now after resetting the env variables, thank you!
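
For anyone hitting the same error, here is a minimal Python sketch of the kind of check that would have caught the mismatch. It lists, in search order, the LD_LIBRARY_PATH directories that contain a libnvrtc file, and then asks ctypes whether the library it resolves exports nvrtcGetCUBINSize. The helper names dirs_providing and has_symbol are made up for this example and are not part of PyFR, and the sketch does not reproduce how PyFR itself locates the library. The dynamic loader also consults other sources (rpath, the ld.so cache), so treat the first directory listed as a strong hint rather than proof.

```python
import ctypes
import os
from pathlib import Path


def dirs_providing(libname, search_path):
    """Return the directories in search_path (in order) that contain
    a file whose name starts with libname."""
    hits = []
    for entry in search_path.split(os.pathsep):
        if not entry:
            continue
        d = Path(entry)
        try:
            found = d.is_dir() and any(
                p.name.startswith(libname) for p in d.iterdir()
            )
        except OSError:
            # Unreadable directory: skip it rather than abort the scan.
            found = False
        if found:
            hits.append(str(d))
    return hits


def has_symbol(libname, symbol):
    """Return True/False if the library loads, or None if it cannot be loaded."""
    try:
        lib = ctypes.CDLL(libname)
    except OSError:
        return None
    # ctypes raises AttributeError for a missing symbol, as in the
    # traceback above, so hasattr() reports it as False.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    ld_path = os.environ.get("LD_LIBRARY_PATH", "")
    for d in dirs_providing("libnvrtc.so", ld_path):
        print("libnvrtc found in:", d)
    print("nvrtcGetCUBINSize present:",
          has_symbol("libnvrtc.so", "nvrtcGetCUBINSize"))
```

If the first directory printed belongs to an older CUDA module than the one nvcc reports, the environment is in the state described above: PATH points at the new toolkit while LD_LIBRARY_PATH still points at the old one.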