Hi,
I want to understand how gradient fusion works in PyFR. As I understand it, if gradient fusion is enabled, the transformed corrected gradients are stored in `self._grad_upts`, with the transform done by the `eles/tdisf_fused` kernel. If gradient fusion is disabled, the transformed corrected gradients are stored in `self._vect_upts` (and also in `self._grad_upts`, since the two share the same pointer and will be overwritten by other kernels later), and the transform is done by the `gradcoru_upts` kernel (the same as `gradcoru_u`). And the boundary flux kernels always access the transformed corrected gradients directly. Am I right about the data flow?
I am testing the linear stability solver. I have assigned a zero initial condition for the fluctuations, so I would expect zero everywhere for any t > 0. This is what I see on the GPU (with gradient fusion), but the OpenMP backend (no fusion) gives me non-zero values. I wonder if this is related to the kernel fusion and something I have messed up.
Regards,
Zhenyang
Without gradient fusion the kernel `gradcoru_upts` is responsible for un-transforming the gradients (in-place) and `tdisf` is responsible for computing the flux (overwriting the gradients in the process). With gradient fusion these two kernels are merged, with the un-transformed gradients now being written out to an extra array; this kernel is `tdisf_fused`. The arrays are set up so that `_grad_upts` always points to the right thing after either `gradcoru_upts` or `tdisf_fused` has been called.
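To illustrate the array set-up, here is a minimal sketch of the aliasing idea (the names, shapes, and function are illustrative only, not PyFR's actual allocation code):

```python
import numpy as np

def alloc_gradient_arrays(fused, ndims, nupts, nvars, neles):
    # Scratch array holding transformed gradients and, later, fluxes
    vect_upts = np.zeros((ndims, nupts, nvars, neles))

    if fused:
        # tdisf_fused writes the physical gradients to a separate
        # array, so they survive the flux computation
        grad_upts = np.zeros_like(vect_upts)
    else:
        # gradcoru_upts un-transforms in-place; grad_upts aliases
        # vect_upts and is only valid until tdisf overwrites it
        grad_upts = vect_upts

    return vect_upts, grad_upts
```

Either way, reading `grad_upts` immediately after the un-transform step yields the physical gradients; the difference is only how long they remain valid.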
Boundary kernels and interface kernels always use the un-transformed (physical) gradients.
Note that this is not to be confused with another optimisation PyFR performs when the flux points are a subset of the solution points. Here, we avoid the need for an explicit interpolation kernel, `gradcoru_fpts`.
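As a rough illustration, in the subset case the interpolation reduces to a gather (a hedged sketch; `fpts_idx`, the shapes, and the function itself are assumptions rather than PyFR's internals):

```python
import numpy as np

def grads_at_fpts(grad_upts, interp_mat=None, fpts_idx=None):
    # grad_upts: (nupts, nvars) physical gradients at the solution points
    if fpts_idx is not None:
        # The flux points coincide with a subset of the solution points,
        # so a row gather replaces the interpolation kernel entirely
        return grad_upts[fpts_idx]

    # General case: apply the (nfpts, nupts) interpolation operator
    return interp_mat @ grad_upts
```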
Regards, Freddie.
Thanks. I was checking the gradients in the interface kernels and it seems they are accessing the wrong data. In my test case they should be zero everywhere, but instead they are picking up the baseflow gradients I calculated. This only happens on the CPU backend (without gradient fusion), so I suspect it is related to my baseflow gradient calculation. Here is the graph I use for the baseflow gradients. It is basically the same as `_compute_grads_graph`, but I added some packing to isolate the baseflow gradient view matrices. Can you spot anything wrong that could be related to the gradient fusion?
```python
def _compute_basegrads_graph(self, uinbank):
    m = self._mpireqs
    k, _ = self._get_kernels(uinbank, None)

    def deps(dk, *names): return self._kdeps(k, dk, *names)

    g1 = self.backend.graph()
    g1.add_mpi_reqs(m['sbase_fpts_recv'])

    # Interpolate the solution to the flux points
    g1.add_all(k['eles/disu'])

    # !!!!!!
    # Pack iint and bcint solutions
    g1.add_all(k['iint/lhs_fpts_pack'], deps=k['eles/disu'])
    g1.add_all(k['iint/rhs_fpts_pack'], deps=k['eles/disu'])
    g1.add_all(k['bcint/lhs_fpts_pack'], deps=k['eles/disu'])

    # Pack and send these interpolated solutions to our neighbours
    g1.add_all(k['mpiint/sbase_fpts_pack'], deps=k['eles/disu'])
    for send, pack in zip(m['sbase_fpts_send'], k['mpiint/sbase_fpts_pack']):
        g1.add_mpi_req(send, deps=[pack])

    # Compute the common solution at our internal/boundary interfaces
    for l in k['eles/copy_fpts']:
        g1.add(l, deps=deps(l, 'eles/disu'))

    kdeps = k['eles/copy_fpts'] or k['eles/disu']
    g1.add_all(k['iint/con_ub'], deps=kdeps)
    g1.add_all(k['bcint/con_ub'], deps=kdeps)

    # Compute the transformed gradient of the partially corrected solution
    g1.add_all(k['eles/tgradpcoru_upts'])
    g1.commit()

    g2 = self.backend.graph()
    g2.add_mpi_reqs(m['vbase_fpts_recv'])

    # Compute the common solution at our MPI interfaces
    g2.add_all(k['mpiint/sbase_fpts_unpack'])
    for l in k['mpiint/con_ub']:
        g2.add(l, deps=deps(l, 'mpiint/sbase_fpts_unpack'))

    # Compute the transformed gradient of the corrected solution
    g2.add_all(k['eles/tgradcoru_upts'], deps=k['mpiint/con_ub'])

    # Obtain the physical gradients at the solution points
    for l in k['eles/gradcoru_u']:
        g2.add(l, deps=deps(l, 'eles/tgradcoru_upts'))

    # Interpolate baseflow gradients to the flux points
    for l in k['eles/gradcoru_fpts']:
        g2.add(l, deps=deps(l, 'eles/gradcoru_u'))

    # Set dependencies for interface flux interpolation
    ideps = k['eles/gradcoru_fpts'] or k['eles/gradcoru_upts']

    # !!!!!!!!!!!!!!!!!!
    # Pack these interpolated baseflow gradients
    g2.add_all(k['iint/vlhs_fpts_pack'], deps=ideps)
    g2.add_all(k['iint/vrhs_fpts_pack'], deps=ideps)
    g2.add_all(k['bcint/vlhs_fpts_pack'], deps=ideps)

    # Pack and send these interpolated gradients to our neighbours
    g2.add_all(k['mpiint/vbase_fpts_pack'], deps=ideps)
    for send, pack in zip(m['vbase_fpts_send'], k['mpiint/vbase_fpts_pack']):
        g2.add_mpi_req(send, deps=[pack])

    g2.add_all(k['mpiint/vbase_fpts_unpack'])
    g2.commit()

    return g1, g2
```
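For context, I run these two graphs back to back, once at start-up, roughly like this (`run_graph` here is a placeholder for however the backend executes committed graphs, not necessarily the exact call):

```python
# One-off baseflow gradient computation at start-up; run_graph is a
# stand-in for the backend's actual graph execution entry point
g1, g2 = self._compute_basegrads_graph(uinbank)
self.backend.run_graph(g1)
self.backend.run_graph(g2)
```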
Regards,
Zhenyang
Can you confirm what solution and flux point combinations you are testing with?
Also, running packing kernels on boundary or internal interfaces seems intrinsically wrong. You should only pack things which need to be sent to neighbouring ranks.
Regards, Freddie.
I am using Gauss-Legendre points with p = 5 on quads. Anti-aliasing is disabled. I am doing the packing since I want to use the baseflow gradients at future time steps, and this baseflow gradient graph is only run once at start-up. This follows our previous discussion in Save view matrices - #9 by Zhenyang.
Regards,
Zhenyang
If you just need to capture the baseflow gradients for later use, a straight copy kernel is far less work and much more efficient. The base flows can then be treated like any other piece of constant data (such as the normal vectors at interfaces or the physical locations of points, neither of which requires views).
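Something along these lines might do it (a sketch of the idea only; `base_grad_upts` is a made-up name, and the exact `matrix`/`copy` signatures should be checked against your PyFR version):

```python
# Persistent storage for the baseflow gradients, shaped like the
# element's gradient array
base_grad_upts = self.backend.matrix(self._grad_upts.ioshape,
                                     tags={'align'})

# One-off snapshot of the freshly computed physical gradients; run
# this copy kernel once, after the gradient graphs have executed
copy_kern = self.backend.kernel('copy', base_grad_upts, self._grad_upts)
```

After the snapshot, any kernel that needs the baseflow gradients can read `base_grad_upts` directly, with no views or packing involved.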
In terms of your problem I’d look at the un-transform kernel and see where it is outputting its data. This kernel is a bit special in the gradient fusion case as we still need it for the RHS gradient method.
Regards, Freddie.