Problem with GiMMiK kernel

mbareford · 18 May 2021 15:50

Hello,

I have a particular test case (involving a triangular aerofoil) that runs successfully with PyFR 1.10 but fails with PyFR 1.11/1.12. The problem occurs during an attempt to compile a gimmik_mm kernel.

CUDA_ERROR_INVALID_PTX = 218
This indicates that a PTX JIT compilation failed.

The simulation is running on two NVIDIA GPUs (Tesla V100-SXM2-16GB Volta) using OpenMPI 4.1.0.
The CUDA version is 10.2 (or 10.2.89 to be precise).

Do I need to have specific NVIDIA CUDA software in place before I can run the latest versions of PyFR?

Thanks in advance,
Michael

WillT · 18 May 2021 16:23

I was thinking about your issue. I think I have used PyFR 1.11 with CUDA 10.2 before, so I'm not sure what the problem might be.

Any chance you could send me the code produced by GiMMiK just before it fails? My email is xxx (see DM). You should be able to get this by adding a print statement just here.

fdw · 18 May 2021 17:59

This is most likely driver related. To this end, I suggest upgrading both CUDA (to version 11) and the driver to the latest version available on NVIDIA's website.

Regards, Freddie.

WillT · 18 May 2021 19:56

The code that is being produced looks fine, it was a bit of a long shot that there would be something wrong with the source produced by GiMMiK.

I agree with Freddie: try updating to the latest CUDA version. Something worth noting is that in PyFR 1.11 we removed the dependency on PyCUDA and instead call the runtime compiler (NVRTC) directly from the CUDA library. So, depending on how you have things set up, this may be causing some issues.
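Since the kernels are now compiled through NVRTC rather than PyCUDA, one way to check "how you have things set up" is to compare the CUDA version of the NVRTC library that gets picked up with the CUDA version the driver supports. The sketch below is a minimal, Linux-oriented illustration using `ctypes`; it assumes both libraries are discoverable via `ctypes.util.find_library` under the names `nvrtc` and `cuda`, which is not necessarily how PyFR itself locates them. The helper names `nvrtc_version` and `driver_cuda_version` are hypothetical.

```python
import ctypes
import ctypes.util


def _load(name):
    """Load a shared library by short name, or return None if unavailable."""
    path = ctypes.util.find_library(name)
    if path is None:
        return None
    try:
        return ctypes.CDLL(path)
    except OSError:
        return None


def nvrtc_version():
    """Return (major, minor) of the NVRTC library found, or None (hypothetical helper)."""
    lib = _load('nvrtc')
    if lib is None:
        return None
    major, minor = ctypes.c_int(), ctypes.c_int()
    # nvrtcVersion(int *major, int *minor) returns NVRTC_SUCCESS (0) on success
    if lib.nvrtcVersion(ctypes.byref(major), ctypes.byref(minor)) != 0:
        return None
    return major.value, minor.value


def driver_cuda_version():
    """Return (major, minor) of the CUDA version the driver supports, or None."""
    lib = _load('cuda')
    if lib is None:
        return None
    ver = ctypes.c_int()
    # cuDriverGetVersion encodes the version as 1000*major + 10*minor
    if lib.cuDriverGetVersion(ctypes.byref(ver)) != 0:
        return None
    return ver.value // 1000, (ver.value % 1000) // 10


if __name__ == '__main__':
    print('NVRTC:', nvrtc_version())
    print('Driver supports CUDA:', driver_cuda_version())
```

If the NVRTC version reported is newer than the CUDA version the driver supports, the PTX that NVRTC generates may be too new for the driver's JIT compiler, which would plausibly produce a PTX JIT failure like the one reported above.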

mbareford · 24 May 2021 08:04

Thank you Will and Freddie for your advice.

I am currently looking into getting a suitable version of the GPU kernel driver (>= 450.80.02) installed on the Cirrus Tier-2 machine, i.e., one that is compatible with CUDA 11.0.

I’ll retest PyFR and let you know the result once the driver has been updated.
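Checking whether an installed driver meets a minimum such as 450.80.02 requires comparing the dotted fields numerically rather than as strings. Below is a minimal sketch; the helper names are hypothetical, and the threshold is simply the one quoted in this thread for CUDA 11.0. On Linux, the installed version string can be obtained with `nvidia-smi --query-gpu=driver_version --format=csv,noheader`.

```python
# Minimum driver quoted in this thread for CUDA 11.0 (assumption: taken as-is from the post)
MIN_DRIVER_CUDA_11_0 = '450.80.02'


def parse_driver_version(version):
    """Split a dotted driver version such as '460.73.01' into a tuple of ints."""
    return tuple(int(part) for part in version.strip().split('.'))


def driver_supports(installed, minimum=MIN_DRIVER_CUDA_11_0):
    """Return True if the installed driver is at least the minimum version.

    Tuples compare element by element, so (440, 64) < (450, 80, 2).
    """
    return parse_driver_version(installed) >= parse_driver_version(minimum)


if __name__ == '__main__':
    for v in ('440.64', '460.73.01'):
        print(v, driver_supports(v))
```

With the two driver versions mentioned later in this thread, `driver_supports('440.64')` is `False` and `driver_supports('460.73.01')` is `True`.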

mbareford · 22 July 2021 14:18

Just to confirm, PyFR 1.12 is now working on Cirrus following a recent GPU kernel driver upgrade.
The driver for the NVIDIA Tesla V100-SXM2-16GB (Volta) GPU was upgraded from 440.64 to 460.73.01.
The new driver is compatible with CUDA 11.2.

Thanks again!
