Smats for linearised elements

For navstokes with backend::cuda, a question about smats for the curved-element kernels {gradcoru, tflux} and their linear counterparts {gradcorulin, tfluxlin}.

For the curved kernels {gradcoru, tflux}, precalculated constant smats data gets passed into the kernels at runtime.

But for the linear kernels {gradcorulin, tfluxlin}, it looks like the constant geometric data (verts, upts) is passed into the kernels, and smats gets recalculated each time the kernels are called.

Is there a reason for not precalculating smats for the linear/linearised elements, or am I missing something?

The idea with the linear element kernels is that, if an element is linear, you don’t need to read in all the smats terms. Instead, you can save bandwidth by reading in only the vertices and then reconstructing the smats from them. Given that we are bandwidth bound, this leads to a win.

The size of the win will depend on the element type as, for example, a hex has more corner points than a tet.
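For a linear quad, the reconstruction amounts to evaluating the Jacobian of the bilinear vertex mapping at each solution point and forming its adjugate. A minimal NumPy sketch of the idea (`quad_smats` is a hypothetical helper, and the exact row/column and sign conventions may differ from what PyFR's kernels actually use):

```python
import numpy as np

def quad_smats(verts, upts):
    # Reconstruct smats and djac at each solution point of a *linear*
    # (bilinear) quad from its four vertices, mirroring what the
    # linearised kernels do on the fly.
    #   verts : (4, 2) vertices ordered (-1,-1), (1,-1), (1,1), (-1,1)
    #   upts  : (n, 2) solution point locations in the reference quad
    smats = np.empty((len(upts), 2, 2))
    djacs = np.empty(len(upts))

    for i, (q, r) in enumerate(upts):
        # Derivatives of the four bilinear shape functions at (q, r)
        dN_dq = 0.25*np.array([-(1 - r),  (1 - r), (1 + r), -(1 + r)])
        dN_dr = 0.25*np.array([-(1 - q), -(1 + q), (1 + q),  (1 - q)])

        # Jacobian J[i, j] = d x_j / d xi_i of the vertex mapping
        J = np.stack([dN_dq @ verts, dN_dr @ verts])

        # djac is det J; smats is the adjugate of J, i.e. djac * inv(J)
        djacs[i] = J[0, 0]*J[1, 1] - J[0, 1]*J[1, 0]
        smats[i] = np.array([[ J[1, 1], -J[0, 1]],
                             [-J[1, 0],  J[0, 0]]])

    return smats, djacs
```

Note that only `verts` varies per element; the shape-function derivatives at the `upts` are identical for every quad of a given order, which is what makes the on-the-fly recomputation cheap.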


Thanks @WillT. This came up when I was looking at the kernels for quads. Since the end result of the runtime recalculation of the smats is just 5 floats per solution point (i.e. smats[2][2] plus djac), it seems like in this case it would use less bandwidth to pass 5 precalculated floats rather than all the point and vertex data?

I’ll take a closer look and try to weigh both options.

Consider a p = 4 linear quad. We have 25 solution points, so the smats requires loading 25*4 = 100 items. Now, instead consider loading four vertices (8 items) plus 25 solution points (50 items) and using them to compute the smats. Here we only need to load a total of 58 items per element. However, as every quad has the same set of solution points, this cost is amortised out, and so for all intents and purposes we are down to 8 items per element rather than 100.
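The accounting above can be sketched with a couple of counting helpers (`smats_loads` and `linear_loads` are illustrative names, not PyFR functions; an "item" is one scalar value loaded from global memory):

```python
def smats_loads(nupts, ndims):
    # Precomputed approach: one ndims x ndims smats matrix per
    # solution point must be loaded for every element
    return nupts * ndims * ndims

def linear_loads(nverts, nupts, ndims, amortised=False):
    # Linearised approach: vertices are per-element, but the reference
    # solution point locations are shared by every element of a given
    # type, so with a warm cache their cost amortises away
    verts = nverts * ndims
    upts = 0 if amortised else nupts * ndims
    return verts + upts
```

For the p = 4 quad this gives 100 items for precomputed smats versus 58 (cold) or 8 (amortised) items for the linearised path, matching the numbers above; a p = 2 hex shows the same pattern with a larger vertex footprint.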

Regards, Freddie.


Thanks @fdw, I failed to notice all the solution points (upts) for an element could be sitting in L1. Very neat!