Different values on cuda vs. hip backend

luli · 6 February 2023 10:26

I ran the simple euler_vortex_2D example using PyFR from the official PyFR 1.14.0 repository and added the following section to the config file to output residuals:

[soln-plugin-residual]
nsteps = 1
file = residual.csv
header = true

I ran the case on both the CUDA and HIP backends and got different residuals. From the very first step, the last digit of the results differs. Why?
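To see how large the difference actually is, a small script can compare the two residual files cell by cell. This is a minimal sketch, assuming each run's output was renamed to the hypothetical names residual_cuda.csv and residual_hip.csv, that both files have a single header row (header = true), the same number of rows and columns, and only numeric cells. The function name max_rel_diff is also hypothetical.

```python
import csv


def max_rel_diff(path_a, path_b):
    """Largest relative difference between matching cells of two CSV files.

    Assumes one header row and purely numeric data rows of equal shape.
    """
    with open(path_a, newline="") as fa, open(path_b, newline="") as fb:
        rows_a = list(csv.reader(fa))[1:]  # skip the header row
        rows_b = list(csv.reader(fb))[1:]
    if len(rows_a) != len(rows_b):
        raise ValueError("files have a different number of rows")

    worst = 0.0
    for ra, rb in zip(rows_a, rows_b):
        for sa, sb in zip(ra, rb):
            x, y = float(sa), float(sb)
            scale = max(abs(x), abs(y))
            if scale > 0.0:  # two exact zeros count as identical
                worst = max(worst, abs(x - y) / scale)
    return worst
```

For example, print(max_rel_diff("residual_cuda.csv", "residual_hip.csv")). A difference confined to the last few significant digits is what rounding differences typically look like, as opposed to an outright bug.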

fdw · 6 February 2023 12:46

Floating point math is not associative.

Regards, Freddie.
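Freddie's point can be seen in a few lines of Python: adding the same three doubles with a different grouping gives a different answer, because every intermediate result is rounded to a 53-bit significand. The values are chosen purely for illustration and are not taken from PyFR.

```python
# Three doubles chosen so that one grouping loses the 1.0 to rounding.
a, b, c = 1e16, -1e16, 1.0

left = (a + b) + c   # a + b is exactly 0.0, so the result is 1.0
right = a + (b + c)  # -1e16 + 1.0 rounds back to -1e16, so the result is 0.0

print(left, right)   # 1.0 0.0
```

Each individual addition is correctly rounded; the results differ only because the additions happen in a different order.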

WillT · 6 February 2023 12:46

@fdw may have some additional views on this, but there are a huge number of reasons why results can differ between GPUs. Reproducibility on GPUs is a known issue, and most of the time it is due to the use of atomics, which make the order of accumulation nondeterministic.

However, in this case it could be due to different rounding in floating point operations. It could also be because GiMMiK has several kernel options that lead to different blockings, so the accumulations happen in different orders.
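The effect of different blockings can be sketched by summing the same list once sequentially and once in fixed-size blocks, which loosely mimics a kernel that reduces tiles independently before combining the partial sums. This is an illustration on made-up data, not GiMMiK's actual kernels, and the function names are hypothetical; math.fsum is included as the correctly rounded reference.

```python
import math


def sequential_sum(xs):
    # Left-to-right accumulation, rounding after every addition.
    total = 0.0
    for x in xs:
        total += x
    return total


def blocked_sum(xs, block):
    # Reduce each block separately, then reduce the partial sums.
    partials = [sequential_sum(xs[i:i + block]) for i in range(0, len(xs), block)]
    return sequential_sum(partials)


xs = [1e16, 1.0, -1e16, 1.0]
print(sequential_sum(xs))  # 1.0
print(blocked_sum(xs, 2))  # 0.0
print(math.fsum(xs))       # 2.0
```

All three orderings use identical inputs, yet each gives a different answer; on real data the discrepancies are usually confined to the last few digits, which matches what was observed here.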

But also, as Freddie says, floating point maths isn't associative, so a different compiler will reorder operations differently, which will affect the errors.

luli · 6 June 2023 03:48

Does PyFR use atomics? If so, where?

WillT · 6 June 2023 07:47

In a few places: atomics in pyfr

fdw · 6 June 2023 12:37

Note that this specific use does not have any impact on reproducibility as it is a minimum operation as opposed to a floating point accumulation.

Regards, Freddie.
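The distinction Freddie draws can be checked directly: taking the minimum of the same values in every possible order always yields the same result, because a comparison returns one of its inputs unchanged and never rounds, whereas the floating point sum depends on the order. The data are illustrative only, and the sketch ignores NaNs and signed zeros, for which a minimum needs more care.

```python
from itertools import permutations

# Illustrative data: large values of opposite sign plus small ones.
xs = [1e16, 1.0, -1e16, 1.0]


def ordered_sum(values):
    total = 0.0
    for v in values:
        total += v  # each addition is rounded, so the order matters
    return total


sums = {ordered_sum(p) for p in permutations(xs)}
mins = {min(p) for p in permutations(xs)}

print(sorted(sums))  # several different totals
print(mins)          # {-1e+16}: a single value for every order
```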

luli · 8 June 2023 09:22

I found that the “fmad” option affects the results differently on the two platforms. With “fmad=false”, the CUDA and HIP results are identical; with “fmad=true”, they differ.
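This observation has a simple explanation: with fused multiply-add enabled, the compiler may turn a*b + c into a single operation that rounds once instead of twice, and the two toolchains need not fuse the same expressions. Disabling it makes every multiply and add round separately, which is consistent with the results then agreeing. The sketch below emulates a fused multiply-add by computing a*b + c exactly with fractions.Fraction and rounding once; the helper name fma_emulated is hypothetical (Python 3.13 and later also provide math.fma). The inputs are contrived so that rounding twice loses the whole answer.

```python
from fractions import Fraction


def fma_emulated(a, b, c):
    # Exact rational product and sum, then a single rounding to a double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))


a = 1.0 + 2.0**-30
b = a
c = -(a * b)  # the negated, already-rounded product

unfused = a * b + c              # product rounded, then sum rounded
fused = fma_emulated(a, b, c)    # exact a*b + c rounded once

print(unfused)                   # 0.0
print(fused == 2.0**-60)         # True
```

The exact product is 1 + 2^-29 + 2^-60; rounding it to a double drops the 2^-60 term, so the unfused result cancels to zero while the fused one keeps it.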
