Hyperbolic diffusion and kernel fusion

WillT · 30 November 2021 10:31

A paper of ours was recently accepted to CPC on the topic of hyperbolic diffusion and the optimisation opportunities it presents. See here.

Hyperbolic diffusion is a method where you transform the second-order diffusion terms into first-order terms by adding additional equations. The idea was first proposed by Hiro Nishikawa. If you haven’t come across his work before, I really recommend checking out his website, http://www.cfdbooks.com/, as well as his papers.
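
As a rough 1D illustration of the idea (a sketch in the style of Nishikawa's formulation, not the ACM system from the paper), take steady scalar diffusion with diffusivity \nu. The extra variable p, the relaxation time T_r and the pseudo-time \tau are notation assumed here for illustration:

```latex
% Target second-order (steady diffusion) problem
\nu\, \partial_{xx} u = 0

% First-order reformulation marched in pseudo-time \tau:
% p is an extra unknown that relaxes towards \partial_x u
\partial_\tau u = \nu\, \partial_x p, \qquad
\partial_\tau p = \frac{1}{T_r}\left(\partial_x u - p\right)
```

At pseudo-steady state p = \partial_x u, so the first equation recovers \nu\,\partial_{xx} u = 0, while the system itself is hyperbolic with real wave speeds \pm\sqrt{\nu/T_r}.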

A main benefit of hyperbolic diffusion is that the stiffness of the resulting system scales with 1/h rather than 1/h^2, so the explicit time step limit shrinks in proportion to h rather than h^2. This is of course very helpful for high Reynolds number flows. The additional benefit from an FR point of view is that you no longer have to do all the steps for a diffusion system, as it’s now purely advective. The downside is that you end up with quite a few equations: we focused on ACM, for which you get 13 equations in 3D (rather than 4).
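
To make the scaling concrete, here is a minimal sketch of the usual order-of-magnitude estimates for explicit time step limits. The function names, the unit wave speed and diffusivity, and the CFL-like constant are all assumptions for illustration, not values from the paper:

```python
def explicit_dt_limits(h, wave_speed=1.0, nu=1.0, cfl=1.0):
    """Return (advective, diffusive) explicit time step estimates for mesh size h."""
    dt_adv = cfl * h / wave_speed  # first-order (hyperbolic) terms: dt ~ h
    dt_diff = cfl * h * h / nu     # second-order (diffusive) terms: dt ~ h^2
    return dt_adv, dt_diff


def refinement_ratios(h, factor=2.0):
    """Factor by which each limit shrinks when h is divided by `factor`."""
    adv_coarse, diff_coarse = explicit_dt_limits(h)
    adv_fine, diff_fine = explicit_dt_limits(h / factor)
    return adv_coarse / adv_fine, diff_coarse / diff_fine


if __name__ == "__main__":
    adv_ratio, diff_ratio = refinement_ratios(0.1)
    print(f"halving h shrinks the advective limit by {adv_ratio:g}x")
    print(f"halving h shrinks the diffusive limit by {diff_ratio:g}x")
```

Halving h costs a factor of two in the hyperbolic limit but a factor of four in the diffusive one, which is why the gap grows as the mesh is refined.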

Once you have a purely advective system, however, there are some optimisations you can make to PyFR. The largest comes from the fact that normally you calculate the flux divergence by doing the following steps:

U \rightarrow F \rightarrow \nabla\cdot F

where the intermediate F is calculated, written to global memory, and then read back in for the final \nabla\cdot F part. For diffusion systems you have to do this, but for advection it is wasted bandwidth as we don’t use F again. So what we did was fuse these kernels together and, given the matrix used for the divergence, it made most sense to do this for tensor product elements only.
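
A toy pure-Python sketch of the difference between the two approaches is below. This is not the PyFR or GiMMiK code: the function names, the Burgers-like flux and the 1D per-element layout are all hypothetical, chosen only to show that the fused version never stores F for the whole mesh.

```python
def flux(u):
    """Illustrative pointwise flux (Burgers-like); a stand-in for the real flux."""
    return 0.5 * u * u


def matvec(D, f):
    """Dense matrix-vector product using plain lists."""
    return [sum(dij * fj for dij, fj in zip(row, f)) for row in D]


def div_unfused(D, U):
    """Two passes: store F for every element, then apply D to the stored F."""
    F = [[flux(u) for u in elem] for elem in U]  # full array, as if in global memory
    return [matvec(D, f_elem) for f_elem in F]   # read back in for the derivative


def div_fused(D, U):
    """One pass: F is formed on the fly and never stored for the whole mesh."""
    out = []
    for elem in U:
        res = [0.0] * len(D)
        for j, u in enumerate(elem):
            f = flux(u)  # lives only in a local, as if in a register
            for i in range(len(D)):
                res[i] += D[i][j] * f
        out.append(res)
    return out


if __name__ == "__main__":
    # Three-point differentiation matrix on the nodes [-1, 0, 1]
    D = [[-1.5, 2.0, -0.5], [-0.5, 0.0, 0.5], [0.5, -2.0, 1.5]]
    U = [[0.0, 0.5, 1.0], [1.0, 2.0, 3.0]]  # two elements, three points each
    print(div_unfused(D, U))
    print(div_fused(D, U))
```

Both functions produce the same result. The fused one simply trades the global store and reload of F for a little extra work per element, which is the bandwidth saving described above.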

The majority of the hard work was done by adding functionality to GiMMiK. This included an interesting memory manager to automatically handle the use of shared memory, as well as to allow automatic use of the new asynchronous shared memory copy commands on NVIDIA hardware. As a side note, the GiMMiK framework is a really powerful tool: you can do things like this kernel fusion, but also generate PTX, the CUDA intermediate assembly, rather than CUDA C. See here for some features that might get mainlined one day.

The top line result was that, after a bit of work, optimised hyperbolic diffusion was 2.3\times-2.6\times faster for the TGV than regular ACM.

P.S. Keep your eyes out for a recent work comparing ACM, ACM-HD and the alternative EDAC method.

luli · 23 January 2024 02:24

Has this method been applied to PyFR? If so, from which version?

WillT · 23 January 2024 11:35

No, this is not a mainline feature of PyFR. I implemented it on my fork of PyFR, which can be found here: GitHub - WillTrojak/PyFR at feature/tensor

and it would need a custom version of GiMMiK: GitHub - WillTrojak/GiMMiK at sparse

However, this is now three years old, so it probably won’t work anymore, nor was it intended to be used by anyone but me. This is all my way of saying that I can’t really support you if you want to use it. But you can have a look to see how things were implemented.

luli · 24 January 2024 06:33

Too bad, why not add it as a feature to PyFR? I think it would be a great job!

fdw · 24 January 2024 14:53

Maintaining features requires a significant amount of work on our part. Thus we do not normally add things unless there are very compelling advantages.

Regards, Freddie.

WillT · 25 January 2024 09:21

Given the complexity of the implementation and operation of this particular feature, the win wasn’t compelling enough to mainline it. However, if that were to change, we would definitely consider it.
