Broadcasting terms to kernels

Hi,

I want to add a forcing term (line tripping, as in equation 2.2 of the paper "Turbulent boundary layers at moderate Reynolds numbers: inflow length and tripping effects", Journal of Fluid Mechanics) such as

sum( [sin(j * ploc + phs) for phs in Nphase])

where Nphase stores N random phases that should be updated every few hundred time steps. I therefore looked at the turbulence plugin, where a random number generator and other functions are available, and tried to use the _set_external function to pass the random phases and update them accordingly. However, Nphase is just a vector with a few hundred numbers. According to the earlier post "PyFR boundary conditions for turbomachinery" (#16, by fdw), broadcast-col should only be used with a matrix of shape (..., neles). For this situation, could you give some examples of broadcast? And what is the difference between broadcast, broadcast-col, broadcast-row, and view?
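For reference, the forcing expression above can be prototyped on the host with NumPy before moving it into a kernel. This is only a sketch: the value of j, the sample ploc values, the number of phases, and the uniform distribution of the phases are all placeholder assumptions, not values taken from the paper or from PyFR.

```python
import numpy as np

# Placeholder assumptions: 200 phases drawn uniformly from [0, 2*pi)
rng = np.random.default_rng(0)
nphase = 200
phases = rng.uniform(0.0, 2.0 * np.pi, size=nphase)

j = 3.0                          # hypothetical wavenumber-like factor
ploc = np.linspace(0.0, 1.0, 5)  # hypothetical coordinate values

# Vectorised equivalent of sum([sin(j * ploc + phs) for phs in Nphase]):
# broadcast to shape (len(ploc), nphase), then sum over the phase axis
forcing = np.sin(j * ploc[:, None] + phases[None, :]).sum(axis=1)
```

Each entry of forcing is the sum over all phases at one coordinate value, which is what each kernel instance would compute if it could see the whole phase vector.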

Also, when I store a matrix in the backend with intg.backend.matrix(shape, tags={'align'}), can I store a vector?

Thanks in advance.

Regards,
Zhenyang

In your instance you want a regular broadcast condition which will give each instance of the kernel the ability to access the entire array.

Regards, Freddie.

I tried with broadcast but it gave me an error:

raise ValueError('Broadcasts must have two dimensions')

from regenerator.py. My ncdim is 1, which triggered the error. What should dim be in the broadcast expression:

eles._set_external('hphs', f'in broadcast fpdtype_t[{dim}]', value=hphs)

Regards,
Zhenyang

It should be two dimensional. This makes sense as matrix objects are always two dimensional. You can simply make the first dimension 1. The dim should be the number of columns in the matrix.
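As a rough illustration of the advice above, the phase vector can be reshaped to a single-row, two-dimensional array before it is handed over. This sketch only prepares the array and the spec string; the number of phases is a placeholder, and the two-bracket form of the spec is an assumption inferred from the error message and the reply above, so it should be checked against your PyFR version.

```python
import numpy as np

# Placeholder assumption: 256 random phases in [0, 2*pi)
rng = np.random.default_rng(1234)
nphase = 256
hphs = rng.uniform(0.0, 2.0 * np.pi, size=nphase)  # shape (nphase,)

# Make the array two dimensional with a leading dimension of 1
hphs_2d = hphs.reshape(1, -1)                      # shape (1, nphase)

# Spec string with both dimensions spelled out (assumed form)
nrows, ncols = hphs_2d.shape
spec = f'in broadcast fpdtype_t[{nrows}][{ncols}]'
```

The resulting spec and hphs_2d would then take the place of the one-dimensional arguments in the _set_external call quoted earlier.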

Regards, Freddie.

Thanks, I think it is now working as expected. Another thing is that in the code I am restricting the tripping line to the wall of a pipe. So essentially I need to calculate a spatial Gaussian function to limit it to the near-wall region, which requires the radius r and the angle theta. For now I calculate it inside the Mako kernel, which I feel wastes computational power since it is recalculated every time step. Ideally we would calculate it only once and offload it to the GPU. Do you have a better idea for doing this, without passing a new argument to the kernel, by using an external variable instead? Is this still possible?
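To make the question concrete, a host-side precomputation of such a mask might look like the following NumPy sketch. Everything here is an assumption for illustration: the pipe axis is taken along z, R and sigma are made-up values, the Gaussian depends on the wall distance only, and plocs is a random stand-in with shape (3, npts, neles).

```python
import numpy as np

# Hypothetical pipe radius and Gaussian width
R, sigma = 1.0, 0.05

# Stand-in for the physical locations, shape (ndims, npts, neles)
rng = np.random.default_rng(0)
npts, neles = 8, 5
plocs = rng.uniform(-0.7, 0.7, size=(3, npts, neles))

# Cylindrical coordinates, assuming the pipe axis is the z axis
x, y = plocs[0], plocs[1]
r = np.hypot(x, y)
theta = np.arctan2(y, x)

# Gaussian weight that peaks at the wall (r = R) and decays inwards
mask = np.exp(-((R - r) / sigma) ** 2)
```

The arrays r, theta, and mask all have shape (npts, neles) and depend only on the mesh, so they would need to be computed once per run.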

Regards,
Zhenyang

Calculating each time is probably the most efficient approach on a GPU. The benefits of pre-calculating are likely marginal (and would require an extra external variable).

Regards, Freddie.

But the radius and angle variables depend only on the coordinates, so they won’t change during time integration. If the case is big, with a lot of grid points, I expect this calculation could be significant. As I understand it, reading an external variable that is never updated won’t cost anything, no?

Also, to help me understand broadcasting better: assuming my external variable has a shape of (nvar, npts, neles) (similar to plocs), which broadcasting method should be used? broadcast-col?

Regards,
Zhenyang

The cost of reading in the physical locations (three values per point) is likely a rounding error in the grand scheme of things.

The main cost will likely be in terms of worse scaling. If all the elements with the additional term end up on the same rank then that rank will slow down (as those points have more work to do) and this, in turn, will slow down every rank in the simulation. This is where the real waste is: ranks which do not have any points sitting around doing nothing while they wait for the ranks which do have points to finish.

For broadcasting, broadcast-col is when a value is broadcast down a column so all npts see the same value(s). Similarly, broadcast-row is when a value is broadcast along a row so all neles see the same value(s). If you want everyone to see the exact same value(s) then you use broadcast on its own without a suffix.
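The three cases described above can be mimicked with NumPy broadcasting. This is only an analogy built from the description in this thread, assuming a field laid out as (npts, neles); it is not how PyFR implements these arguments internally, and all sizes and values are made up.

```python
import numpy as np

npts, neles = 4, 3
field = np.zeros((npts, neles))

# Like broadcast-col: one value per element, seen by all npts in that column
per_element = np.array([[10.0, 20.0, 30.0]])               # shape (1, neles)
col_view = np.broadcast_to(per_element, (npts, neles))

# Like broadcast-row: one value per point, seen by all neles in that row
per_point = np.arange(npts, dtype=float).reshape(npts, 1)  # shape (npts, 1)
row_view = np.broadcast_to(per_point, (npts, neles))

# Like plain broadcast: every (point, element) pair can read the whole array
shared = np.array([[0.1, 0.2]])                            # shape (1, 2)
total = field + shared.sum()
```

In this picture a short vector of random phases corresponds to the last case, since every point in every element needs all of the phases.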

Regards, Freddie.