Reduction on two-dimensional arrays

Frankx9 · 4 March 2025 18:17

Hi,

I would like to know whether the reduction kernel could be adapted to work on arrays whose ioshape is two-dimensional, e.g. (nvars, neles), rather than three-dimensional, e.g. (nupts, nvars, neles), and still perform a reduction over neles to give a resulting array of shape (nvars,).
If so, could you point out the main changes needed to index the matrix correctly?

github.com/PyFR/PyFR: pyfr/backends/cuda/blasext.py (commit 11785be69)

    def reduction(self, *rs, method, norm, dt_mat=None):
        if any(r.traits != rs[0].traits for r in rs[1:]):
            raise ValueError('Incompatible matrix types')

        cuda = self.backend.cuda
        ixdtype = self.backend.ixdtype
        nrow, ncol, ldim, fpdtype = rs[0].traits[1:]
        ncola, ncolb = rs[0].ioshape[1:]
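The last line of the quoted `reduction` method unpacks `ioshape[1:]` into two names, which only works for a three-dimensional `(nupts, nvars, neles)` ioshape; for an `(nvars, neles)` matrix, `ioshape[1:]` has a single entry. A minimal sketch of a shape-agnostic unpacking, where `column_counts` is a hypothetical helper (not part of PyFR) that assumes the reduction is always taken over `neles`:

```python
def column_counts(ioshape):
    """Return (ncola, ncolb) = (nvars, neles) for a 2D or 3D ioshape.

    Hypothetical helper: assumes ioshape is either (nupts, nvars, neles)
    or (nvars, neles).
    """
    if len(ioshape) == 3:
        # (nupts, nvars, neles): same as the original ioshape[1:] unpacking
        ncola, ncolb = ioshape[1:]
    elif len(ioshape) == 2:
        # (nvars, neles): there is no leading nupts dimension to skip
        ncola, ncolb = ioshape
    else:
        raise ValueError(f'Unsupported ioshape {ioshape!r}')
    return ncola, ncolb


# The original unpacking raises ValueError for a 2D ioshape
try:
    ncola, ncolb = (5, 49055)[1:]
except ValueError as e:
    print('original unpacking fails:', e)

print(column_counts((8, 5, 49055)))  # (5, 49055)
print(column_counts((5, 49055)))     # (5, 49055)
```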

Best

fdw · 4 March 2025 22:12

The overall logic should be simple to adapt (basically just removing the inner loop which does the accumulation over the first dimension). I would suggest starting with the OpenMP backend as its reduction code is the simplest.
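A pure-Python reference of that idea, assuming a plain sum on nested lists (the real kernels also handle norms, the memory layout and parallelism): the three-dimensional version accumulates over the first (`nupts`) dimension in its innermost loop, and the two-dimensional version is the same loop nest with that loop removed. `reduce_3d` and `reduce_2d` are illustrative names only.

```python
def reduce_3d(a):
    """Sum a[k][j][i] with shape (nupts, nvars, neles) over k and i."""
    nupts, nvars, neles = len(a), len(a[0]), len(a[0][0])
    acc = [0.0] * nvars
    for j in range(nvars):
        for i in range(neles):
            # Inner loop: accumulation over the first dimension (nupts)
            for k in range(nupts):
                acc[j] += a[k][j][i]
    return acc


def reduce_2d(a):
    """Sum a[j][i] with shape (nvars, neles) over i."""
    nvars, neles = len(a), len(a[0])
    acc = [0.0] * nvars
    for j in range(nvars):
        for i in range(neles):
            # No nupts loop: each (j, i) entry contributes exactly once
            acc[j] += a[j][i]
    return acc


a3 = [[[1.0] * 7 for _ in range(5)] for _ in range(3)]  # shape (3, 5, 7)
a2 = [[1.0] * 7 for _ in range(5)]                      # shape (5, 7)
print(reduce_3d(a3))  # [21.0, 21.0, 21.0, 21.0, 21.0]
print(reduce_2d(a2))  # [7.0, 7.0, 7.0, 7.0, 7.0]
```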

Regards, Freddie.

Frankx9 · 5 March 2025 10:23

Hi @fdw,

Thanks for the hint. I changed the inner loop as you suggested to:

if (i < ncolb)
{
    ixdtype_t idx = SOA_IX(i, blockIdx.y, gridDim.y);
    r = data[idx];
    acc += r;
}

but when I test it by summing all the elements of a matrix of ones:

    test = np.ones((5, 49055), dtype=np.float32)
    test_mat = self._be.matrix(test.shape, initval=test, tags={'align'})
    red = self._be.kernel('sum', test_mat)
    self._be.run_kernels([red], wait=True)

I get the following result:

[49054. 49054. 49054. 49054. 49055.]
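One hypothesis that reproduces these numbers exactly: a two-dimensional matrix may be stored row-major with a padded leading dimension (`ldim`) rather than in the AoSoA layout that `SOA_IX` assumes. The sketch below assumes `SOA_SZ = 32`, a leading dimension of 49056 (49055 padded for alignment), zero padding, and that `SOA_IX(a, v, nv)` expands to `((a / SOA_SZ)*nv + v)*SOA_SZ + a % SOA_SZ`; none of these are confirmed in the thread. Under those assumptions, reading row-major data through `SOA_IX` gives exactly `[49054, 49054, 49054, 49054, 49055]`, while the row-major index `v*ldim + i` gives 49055 for every variable.

```python
SOA_SZ = 32              # assumed SoA width on the CUDA backend
NVARS, NELES = 5, 49055  # the test matrix from the post
LDIM = 49056             # assumed padded leading dimension


def soa_ix(a, v, nv):
    # Assumed expansion of SOA_IX(a, v, nv): AoSoA index of element a, variable v
    return ((a // SOA_SZ) * nv + v) * SOA_SZ + a % SOA_SZ


# Assumed storage: row-major (nvars, ldim), ones in the valid region,
# zeros in the padding columns NELES..LDIM-1
data = [0.0] * (NVARS * LDIM)
for v in range(NVARS):
    for i in range(NELES):
        data[v * LDIM + i] = 1.0


def reduce_with_soa_ix(data):
    # What the posted loop computes if the data is really row-major
    return [sum(data[soa_ix(i, v, NVARS)] for i in range(NELES))
            for v in range(NVARS)]


def reduce_row_major(data):
    # Row-major indexing: variable v occupies data[v*LDIM : v*LDIM + NELES]
    return [sum(data[v * LDIM + i] for i in range(NELES))
            for v in range(NVARS)]


print(reduce_with_soa_ix(data))  # [49054.0, 49054.0, 49054.0, 49054.0, 49055.0]
print(reduce_row_major(data))    # [49055.0, 49055.0, 49055.0, 49055.0, 49055.0]
```

If the hypothesis holds, the two-dimensional kernel would read `data[blockIdx.y*ldim + i]` (variable row times leading dimension, plus element column) instead of `data[SOA_IX(i, blockIdx.y, gridDim.y)]`; how PyFR actually lays out a two-dimensional matrix can be checked in its base matrix class.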

Is there anything else that should be changed?

Best

fdw · 5 March 2025 12:49

As I suggested, start with the OpenMP backend for testing.

Regards, Freddie.

Frankx9 · 6 March 2025 14:58

Regarding "basically just removing the inner loop which does the accumulation over the first dimension": in the OpenMP kernel, is that inner loop this one?

    #pragma omp simd
    for (ixdtype_t _xj = 0; _xj < SOA_SZ; _xj++)
    {
        ...
    }
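For AoSoA-laid-out (three-dimensional) matrices, a loop of the form `for (_xj = 0; _xj < SOA_SZ; _xj++)` plausibly walks the lanes of one SoA block, i.e. consecutive elements, rather than the first (`nupts`) dimension. A small Python model, assuming the element index is `block*SOA_SZ + _xj` and an illustrative `SOA_SZ = 4`; `element_indices` is a hypothetical name, not PyFR code:

```python
SOA_SZ = 4  # illustrative only; the real value is backend dependent


def element_indices(neles):
    """Element indices visited by an outer loop over SoA blocks and an
    inner loop over lanes _xj (assumed index: block*SOA_SZ + _xj)."""
    visited = []
    nblocks = -(-neles // SOA_SZ)  # ceil(neles / SOA_SZ)
    for block in range(nblocks):
        for xj in range(SOA_SZ):
            i = block * SOA_SZ + xj
            if i < neles:  # skip padding lanes in the last block
                visited.append(i)
    return visited


print(element_indices(10))  # [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
```

Under this model the `_xj` loop enumerates elements, so removing it would drop most elements from the sum rather than remove the accumulation over `nupts`.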

Best
