I ran the simple euler_vortex_2D example using PyFR from the official PyFR 1.14.0 repository and added the following fields to the config file to output residuals:
nsteps = 1
file = residual.csv
header = true
I ran the application on both the CUDA and HIP platforms and got different residuals. From the very first step, the last digits of the two results differ. Why?
Floating point math is not associative.
@fdw may have some additional views on this, but there are many possible reasons for a difference between GPUs. It is generally known that reproducibility on GPUs is an issue; most of the time that is due to the use of atomics.
However, in this case it could be due to different rounding being used in floating point operations. It could also be because GiMMiK has several kernel options that lead to different blockings, so the accumulations happen in different orders.
But also, as Freddie says, floating point maths isn’t associative, so a different compiler will reorder operations differently, which will affect the rounding errors.
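To make the point about accumulation order concrete, here is a minimal Python sketch. It is not PyFR or GiMMiK code; `blocked_sum` is a hypothetical helper that only mimics summing in blocks of different sizes, and the input values are deliberately extreme so the effect is visible (in a real run the difference would normally show up only in the last digits).

```python
# Floating point addition is not associative: grouping changes the result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c
right = a + (b + c)
print(left == right)  # False

def blocked_sum(values, block):
    """Sum `values` by first accumulating partial sums over blocks of
    `block` entries, then adding the partial sums together.
    (Hypothetical helper, only meant to mimic different kernel blockings.)"""
    total = 0.0
    for start in range(0, len(values), block):
        partial = 0.0
        for v in values[start:start + block]:
            partial += v
        total += partial
    return total

# Exaggerated input: the small terms are lost or kept depending on grouping.
values = [1e16, 1.0, -1e16, 1.0]
print(blocked_sum(values, 4))  # 1.0
print(blocked_sum(values, 2))  # 0.0
```

Both blockings are mathematically the same sum, yet they round at different points and so give different answers.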
Does PyFR use atomics? If so, where?
In a few places: atomics in pyfr
Note that this specific use does not have any impact on reproducibility as it is a minimum operation as opposed to a floating point accumulation.
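A small sketch of why a minimum is safe while an accumulation is not. This is a CPU model only, assuming that atomic updates can arrive in any order; it does not use any PyFR code, and the values are made up for illustration.

```python
from functools import reduce
from itertools import permutations

values = [0.1, 0.2, 0.3]

# Apply the same reduction for every possible arrival order of the updates.
mins = {reduce(min, order) for order in permutations(values)}
sums = {reduce(lambda x, y: x + y, order) for order in permutations(values)}

print(len(mins))  # 1: the minimum is the same whatever the order
print(len(sums))  # more than 1: the sum depends on the order
```

The minimum only ever selects one of the inputs and never rounds, so the order of updates cannot change it.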
I found that the “fmad” setting affects the results differently on the two platforms. With “fmad=false” the results are the same on CUDA and HIP; with “fmad=true” they differ.
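This fits the explanations above. A fused multiply-add computes a*b + c with a single rounding, whereas a separate multiply and add rounds twice. When fusing is allowed, each compiler decides for itself which expressions to fuse, so two toolchains can end up rounding at different places. The sketch below emulates both variants in plain Python using exact rational arithmetic; the function names and input values are hypothetical and chosen only to make the difference visible.

```python
from fractions import Fraction

def mul_add_separate(a, b, c):
    # Two roundings: the product is rounded to a double, then the sum is.
    return a * b + c

def mul_add_fused(a, b, c):
    # One rounding: evaluate a*b + c exactly, then round once to a double.
    return float(Fraction(a) * Fraction(b) + Fraction(c))

a = 1.0 + 2.0**-30
b = 1.0 - 2.0**-30
c = -1.0

# The exact product is 1 - 2**-60, which rounds to 1.0 as a double.
print(mul_add_separate(a, b, c))  # 0.0
print(mul_add_fused(a, b, c))     # -2**-60, about -8.67e-19
```

With fusing disabled, both platforms are left with the same sequence of individually rounded multiplies and adds, which would explain why the results then agree.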