Extracting element index in reduction kernel

Frankx9 · 21 August 2025 20:12

Hi,

I would like a clarification regarding the element index in the reduction kernel.

Specifically, for the OpenMP backend I can access the flattened index associated with a variable at a given quadrature point in a given element as:
idx = _y*BLK_SZ*${ncola} + ib*BLK_SZ*${ncola}*nrow + X_IDX_AOSOA(${i}, ${ncola});

However, I need to access a value from an array that has just nrow = 1 and ncola = 1 (one variable).

Is it therefore correct to extract the associated element index as:
idx = ib*BLK_SZ*${ncola}*nrow + X_IDX_AOSOA(0, ${ncola});

For the CUDA backend I have a flattened index of:
ixdtype_t idx = j*ldim + SOA_IX(i, blockIdx.y, gridDim.y);
Hence the element index should be:

ixdtype_t i = ixdtype_t(blockIdx.x)*blockDim.x + tid;

Is this correct in both cases?
Best

fdw · 21 August 2025 23:26

In the OpenMP case I believe you can further simplify the X_IDX_AOSOA(…) macro to _xi + _xj.

Regards, Freddie.

Frankx9 · 22 August 2025 05:40

I think something is still off with the OpenMP backend.

The initial shape of my array on the host is (1, neles=52547); when blocked on OpenMP it becomes (3285, 1, neles=52547, 16).

I am trying to access the single value stored for a given element with the flattened index

idx = ib*BLK_SZ*${ncola}*nrow + X_IDX_AOSOA(0, ${ncola});

or

idx = ib*BLK_SZ*${ncola}*nrow + _xi + _xj;

but in both cases the result is wrong.

Any idea?

Best

fdw · 22 August 2025 10:11

The blocked OpenMP shape you’re using seems wrong. Can you confirm the ioshape and datashape of your array on the backend?

To understand what is going on you may want to fill the matrix with integers counting up from 0.
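One way to run this kind of check is to mock the blocked layout offline first. The following is only a minimal sketch in plain Python, not PyFR code: it assumes the backend buffer is laid out as [nblocks, nrow, BLK_SZ] with padding in the last block, and the names `blocked_index` and `pack` are hypothetical helpers introduced for illustration.

```python
# Hypothetical sizes for illustration only; not taken from any real backend.
BLK_SZ = 16   # elements per block (assumed)
NROW = 1      # number of rows in the host array
NELES = 40    # deliberately not a multiple of BLK_SZ


def blocked_index(row, ele, blk_sz=BLK_SZ, nrow=NROW):
    """Flat offset of (row, ele) in an assumed [nblocks, nrow, blk_sz] layout."""
    ib, lane = divmod(ele, blk_sz)          # block number, position in block
    return ib*blk_sz*nrow + row*blk_sz + lane


def pack(host, blk_sz=BLK_SZ):
    """Copy a host array of shape (nrow, neles) into a padded flat buffer."""
    nrow, neles = len(host), len(host[0])
    nblocks = -(-neles // blk_sz)           # ceiling division
    buf = [-1]*(nblocks*nrow*blk_sz)        # -1 marks padding entries
    for r in range(nrow):
        for e in range(neles):
            buf[blocked_index(r, e, blk_sz, nrow)] = host[r][e]
    return buf


# Fill the host array with integers counting up from 0, then pack it.
host = [[r*NELES + e for e in range(NELES)] for r in range(NROW)]
buf = pack(host)
```

With a single row, the offset under this assumed layout collapses to the element number itself, so `buf[blocked_index(0, e)] == e` for every valid `e`. If the value read inside the real kernel differs from the expected counter, the mismatch shows which stride is being picked up incorrectly.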

Regards, Freddie.

Frankx9 · 22 August 2025 11:29

Hi,

the ioshape of the matrix is

(1, 52547)

and the datashape is

[3285, 1, 16]

Does it help?
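As a quick sanity check on these numbers, the sketch below verifies that the reported datashape is consistent with padding neles up to a multiple of the block size. It assumes the last datashape entry is the block size and the first is the block count; `check_blocking` is a hypothetical helper, not part of PyFR.

```python
def check_blocking(neles, datashape):
    """Return (padded element count, padding) for an assumed
    [nblocks, nrow, blk_sz] datashape, or raise if inconsistent."""
    nblocks, nrow, blk_sz = datashape
    expected = -(-neles // blk_sz)          # ceiling division
    if nblocks != expected:
        raise ValueError(f'expected {expected} blocks, got {nblocks}')
    padded = nblocks*blk_sz
    return padded, padded - neles


# Values reported in this thread
padded, padding = check_blocking(52547, [3285, 1, 16])
```

Under that assumption the buffer holds 52560 slots, 13 of which are padding, so the backend storage is three-dimensional rather than the four-dimensional shape quoted earlier.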

fdw · 22 August 2025 12:27

I do not believe the reduction code has ever been tested on a matrix with a 2D ioshape. It is possible that some indices (sizes) are being picked up incorrectly. Does anything change if you go for an ioshape of (1, 1, 52547)?

Regards, Freddie.

Frankx9 · 22 August 2025 12:29

Wouldn’t this break the other kernels, or fail to ensure compatibility with the GPU backends?

fdw · 22 August 2025 12:43

The reduction code is reimplemented by each backend on its own. Thus, the implementations are all slightly different and may behave differently with non-standard (i.e., 2D) matrices. If you use matrices with a 3D ioshape (which is the common case) everything should work.

Regards, Freddie.
