Gradient fusion

Hi,

I want to understand how gradient fusion works in PyFR. As I understand it, if gradient fusion is enabled, the transformed corrected gradients are stored in self._grad_upts, and the transform is done by the kernel ‘eles/tdisf_fused’. If gradient fusion is disabled, the transformed corrected gradients are stored in self._vect_upts (and also self._grad_upts, since they share the same pointer and will be overwritten by other kernels later), and the transform is done by the kernel ‘gradcoru_upts’ (the same as ‘gradcoru_u’). And the boundary flux always accesses the transformed corrected gradients directly. Am I right about the data flow?

I am testing the linear stability solver. I have assigned a zero initial condition for the fluctuations, and I would expect zero everywhere for any t > 0. This is the case when I run on the GPU (with gradient fusion), but the OpenMP backend (no fusion) gives me non-zero values. I wonder if this is related to the kernel fusion and whether I have messed something up.

Regards,
Zhenyang

Without gradient fusion the kernel gradcoru_upts is responsible for un-transforming the gradients (in-place) and tdisf is responsible for computing the flux (overwriting the gradients in the process).

With gradient fusion these two kernels are merged with the un-transformed gradients now being written out to an extra array. This kernel is tdisf_fused. The arrays are set up so _grad_upts always points to the right thing after either gradcoru_upts or tdisf_fused has been called.

Boundary kernels and interface kernels always use un-transformed (physical) gradients.
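The two data paths described above can be summarised with a toy model. This is a sketch only: the array names mirror PyFR's, but the maths (a scalar inverse Jacobian) and the function itself are stand-ins, not the real kernels.

```python
import numpy as np

def grad_flow(fused, tgrad, jinv):
    """Toy sketch of the fused/unfused gradient paths (illustrative only)."""
    if fused:
        # tdisf_fused: un-transformed gradients are written to a dedicated
        # extra array, leaving vect_upts free to receive the flux
        grad_upts = jinv * tgrad
        vect_upts = np.empty_like(tgrad)      # flux goes here later
    else:
        # gradcoru_upts: un-transform in place; grad_upts aliases vect_upts,
        # which tdisf subsequently overwrites with the flux
        vect_upts = jinv * tgrad
        grad_upts = vect_upts
    return grad_upts, vect_upts

tg = np.full(4, 2.0)
gf, vf = grad_flow(True, tg, 0.5)
gu, vu = grad_flow(False, tg, 0.5)

assert np.allclose(gf, gu)          # both paths yield the same physical gradients
assert gu is vu and gf is not vf    # but only the unfused path aliases the arrays
```

The point of the aliasing check is exactly the behaviour described above: after either path, a single handle (here `grad_upts`) refers to the correct physical gradients.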

Note that this is not to be confused with another optimisation PyFR performs if the flux points are a subset of the solution points. Here, we avoid the need for an explicit interpolation kernel gradcoru_fpts.
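As a concrete illustration of that second optimisation (a sketch, not PyFR code): when every flux point coincides with a solution point, the interpolation operator degenerates to a 0/1 selection matrix, so plain indexing can replace an explicit interpolation kernel. The point values below are made up.

```python
import numpy as np

# Illustrative 1D element: four solution points whose endpoints happen
# to coincide with the two flux points
upts = np.array([-1.0, -0.5, 0.5, 1.0])
fidx = [0, 3]                       # flux points are a subset of solution points

u = np.sin(upts)                    # some solution at the solution points

# The general interpolation operator degenerates to row selection
M = np.zeros((2, 4))
M[0, 0] = M[1, 3] = 1.0

assert np.allclose(M @ u, u[fidx])  # indexing gives the same result as the matrix
```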

Regards, Freddie.

Thanks. I was checking the gradients in the interface kernels and it seems they are accessing the wrong data. In my test case they should be zero everywhere, but they are instead seeing the baseflow gradients I calculated. This only happens on the CPU backend (without gradient fusion). I suspect it is related to my baseflow gradient calculation. Here is the graph I use for the baseflow gradients. It is basically the same as _compute_grads_graph, but I added some packing to isolate the baseflow gradient view matrices. Can you spot anything that could be related to the gradient fusion?

def _compute_basegrads_graph(self, uinbank):
        m = self._mpireqs
        k, _ = self._get_kernels(uinbank, None)

        def deps(dk, *names): return self._kdeps(k, dk, *names)

        g1 = self.backend.graph()
        g1.add_mpi_reqs(m['sbase_fpts_recv'])

        # Interpolate the solution to the flux points
        g1.add_all(k['eles/disu'])

        # !!!!!!
        # Pack iint and bcint solutions
        g1.add_all(k['iint/lhs_fpts_pack'], deps=k['eles/disu'])
        g1.add_all(k['iint/rhs_fpts_pack'], deps=k['eles/disu'])
        g1.add_all(k['bcint/lhs_fpts_pack'], deps=k['eles/disu'])

        # Pack and send these interpolated solutions to our neighbours
        g1.add_all(k['mpiint/sbase_fpts_pack'], deps=k['eles/disu'])
        for send, pack in zip(m['sbase_fpts_send'], k['mpiint/sbase_fpts_pack']):
            g1.add_mpi_req(send, deps=[pack])

        # Compute the common solution at our internal/boundary interfaces
        for l in k['eles/copy_fpts']:
            g1.add(l, deps=deps(l, 'eles/disu'))
        kdeps = k['eles/copy_fpts'] or k['eles/disu']
        g1.add_all(k['iint/con_ub'], deps=kdeps)
        g1.add_all(k['bcint/con_ub'], deps=kdeps)

        # Compute the transformed gradient of the partially corrected solution
        g1.add_all(k['eles/tgradpcoru_upts'])
        g1.commit()

        g2 = self.backend.graph()
        g2.add_mpi_reqs(m['vbase_fpts_recv'])

        # Compute the common solution at our MPI interfaces
        g2.add_all(k['mpiint/sbase_fpts_unpack'])
        for l in k['mpiint/con_ub']:
            g2.add(l, deps=deps(l, 'mpiint/sbase_fpts_unpack'))

        # Compute the transformed gradient of the corrected solution
        g2.add_all(k['eles/tgradcoru_upts'], deps=k['mpiint/con_ub'])

        # Obtain the physical gradients at the solution points
        for l in k['eles/gradcoru_u']:
            g2.add(l, deps=deps(l, 'eles/tgradcoru_upts'))

        # Interpolate baseflow gradients to the flux points
        for l in k['eles/gradcoru_fpts']:
            g2.add(l, deps=deps(l, 'eles/gradcoru_u'))

        # Set dependencies for interface flux interpolation
        ideps = k['eles/gradcoru_fpts'] or k['eles/gradcoru_upts']

        # !!!!!!!!!!!!!!!!!!
        # Pack these interpolated baseflow gradients
        g2.add_all(k['iint/vlhs_fpts_pack'], deps=ideps)
        g2.add_all(k['iint/vrhs_fpts_pack'], deps=ideps)
        g2.add_all(k['bcint/vlhs_fpts_pack'], deps=ideps)

        # Pack and send these interpolated gradients to our neighbours
        g2.add_all(k['mpiint/vbase_fpts_pack'], deps=ideps)
        for send, pack in zip(m['vbase_fpts_send'], k['mpiint/vbase_fpts_pack']):
            g2.add_mpi_req(send, deps=[pack])

        g2.add_all(k['mpiint/vbase_fpts_unpack'])
        g2.commit()

        return g1, g2

Regards,
Zhenyang

Can you confirm what solution and flux point combinations you are testing with?

Also, calling packing kernels on boundary or interface data seems intrinsically wrong. You should only pack things which need to be sent to neighbouring ranks.

Regards, Freddie.

I am using Gauss–Legendre points with p = 5 on quads. Anti-aliasing is disabled. I am doing the packing since I want to use the baseflow gradients in future time steps, and this baseflow gradient graph is only run once at startup. This follows our previous discussion in Save view matrices - #9 by Zhenyang

Regards,
Zhenyang

If you just need to capture the baseflow gradients for later, a straight copy kernel is far less work and much more efficient. The base flows can then be treated like any other piece of constant data (such as the normal vectors at interfaces or the physical locations of points, neither of which require views).
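A minimal sketch of that idea, with NumPy standing in for the backend's matrix types; the array shape is illustrative, not PyFR's actual layout:

```python
import numpy as np

# Capture the baseflow gradients once with a deep copy at startup,
# rather than routing them through packing/view machinery
grad_upts = np.random.rand(2, 4, 16)   # live gradient array, overwritten each step
base_grad = grad_upts.copy()           # one-off snapshot

grad_upts[...] = 0.0                   # later kernels clobber the live array...
assert not np.shares_memory(base_grad, grad_upts)
assert base_grad.any()                 # ...but the snapshot is untouched
```

The snapshot then behaves like any other constant data for the remainder of the run.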

In terms of your problem, I'd look at the un-transform kernel and see where it is outputting its data. This kernel is a bit special in the gradient fusion case, as we still need it for the RHS gradient method.

Regards, Freddie.