@fdw may have some additional views on this, but there are a huge number of reasons why results can differ between GPUs. It is generally known that reproducibility on GPUs is an issue; most of the time that is due to the use of atomics.
However, in this case it could be due to different rounding being used in floating point operations. It could also be because GiMMiK has several kernel options that lead to different blockings, so the accumulations happen in different orders.
But also, as Freddie says, floating point maths isn’t associative, so a different compiler will reorder operations differently, which will affect the errors.
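
To make the accumulation-order point concrete, here is a small sketch in plain Python (double precision on the CPU, so not GPU code and not what GiMMiK actually generates). The `blocked_sum` helper and the input values are made up purely for illustration; the values are deliberately extreme so the effect is visible, whereas in a real solver the differences would typically show up only in the last few digits.

```python
def blocked_sum(values, block):
    """Sum contiguous blocks of `block` entries, then add up the partial sums.

    Hypothetical helper: it only mimics the idea that a different blocking
    changes the order in which the same numbers are accumulated.
    """
    partials = []
    for start in range(0, len(values), block):
        acc = 0.0
        for v in values[start:start + block]:
            acc += v
        partials.append(acc)
    total = 0.0
    for p in partials:
        total += p
    return total


# Same three numbers, two groupings: the results are not bitwise identical.
left = (0.1 + 0.2) + 0.3
right = 0.1 + (0.2 + 0.3)
print(left == right)  # False

# Same four numbers, two blockings: the small terms are lost in one ordering.
values = [1e16, 1.0, -1e16, 1.0]
one_block = blocked_sum(values, 4)   # accumulate straight through
two_blocks = blocked_sum(values, 2)  # two partial sums, then combine
print(one_block, two_blocks)  # 1.0 0.0
```

Neither answer is "the" correct one (the exact sum is 2); both are valid floating point results for a particular ordering, which is why two GPUs, two kernel variants, or two compilers can legitimately disagree.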