I would like to know if the reduction kernel could be adapted to work on arrays whose ioshape is not three-dimensional (e.g. nupts, nvars, neles) but rather bidimensional (e.g. nvars, neles) and still perform a reduction over neles to get a nvars shape as a resulting array.
If so could you point out to the major changes that should be set for indexing the matrix appropriately?
The overall logic should be simple to adapt (basically just removing the inner loop which does the accumulation over the first dimension). I would suggest starting with the OpenMP backend as its reduction code is the simplest.