I was trying to schedule a graph to compute the corrected gradients at some boundary interfaces. However, when using the graphs below, the values I get are not quite correct. Is there a dependency I am missing?
g1 = self.backend.graph()
g1.add_mpi_reqs(m['scal_fpts_recv'])
# Interpolate the solution to the flux points
g1.add_all(k['eles/disu'])
# Pack and send these interpolated solutions to our neighbours
g1.add_all(k['mpiint/scal_fpts_pack'], deps=k['eles/disu'])
for send, pack in zip(m['scal_fpts_send'], k['mpiint/scal_fpts_pack']):
    g1.add_mpi_req(send, deps=[pack])
# Compute the common solution at our internal/boundary interfaces
for l in k['eles/copy_fpts']:
    g1.add(l, deps=deps(l, 'eles/disu'))
kdeps = k['eles/copy_fpts'] or k['eles/disu']
g1.add_all(k['iint/con_u'], deps=kdeps)
g1.add_all(k['bcint/con_u'], deps=kdeps)
# Compute the transformed gradient of the partially corrected solution
g1.add_all(k['eles/tgradpcoru_upts'])
g2 = self.backend.graph()
# Compute the common solution at our MPI interfaces
g2.add_all(k['mpiint/scal_fpts_unpack'])
for l in k['mpiint/con_u']:
    g2.add(l, deps=deps(l, 'mpiint/scal_fpts_unpack'))
# Compute the transformed gradient of the corrected solution
g2.add_all(k['eles/tgradcoru_upts'], deps=k['mpiint/con_u'])
# Obtain the physical gradients at the solution points
for l in k['eles/gradcoru_upts']:
    g2.add(l, deps=deps(l, 'eles/tgradcoru_upts'))
# Interpolate these gradients to the flux points
for l in k['eles/gradcoru_fpts']:
    ldeps = deps(l, 'eles/gradcoru_upts')
    g2.add(l, deps=ldeps)
Can you confirm that all of the kernels are indeed present? That is, that k['something'] actually contains a non-empty list of kernels. You will want to confirm this for every kernel you add to each graph.
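For instance, a minimal check along these lines (using the same k mapping as in your snippet; the key list is just the ones appearing in your graphs) will show whether each entry exists and is non-empty:

keys = ['eles/disu', 'mpiint/scal_fpts_pack', 'eles/copy_fpts',
        'iint/con_u', 'bcint/con_u', 'eles/tgradpcoru_upts',
        'mpiint/scal_fpts_unpack', 'mpiint/con_u', 'eles/tgradcoru_upts',
        'eles/gradcoru_upts', 'eles/gradcoru_fpts']

for key in keys:
    # A key can either be missing entirely or present but mapped to an empty list
    print(f'Key: {key}')
    print(f'Exists: {key in k}')
    print(f'Not empty: {k[key] if key in k else False}')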
Here is the check for each key (I am running with a single rank); several of the kernels are not there:
Key: eles/disu
Exists: True
Not empty: [<pyfr.backends.cuda.gimmik.CUDAGiMMiKKernels.mul.<locals>.MulKernel object at 0x7f0704d92f90>]
Key: mpiint/scal_fpts_pack
Exists: True
Not empty: []
Key: eles/copy_fpts
Exists: True
Not empty: [<pyfr.backends.cuda.blasext.CUDABlasExtKernels.copy.<locals>.CopyKernel object at 0x7f0704d91ca0>]
Key: iint/con_u
Exists: True
Not empty: [<pyfr.backends.cuda.provider.CUDAPointwiseKernelProvider._instantiate_kernel.<locals>.PointwiseKernel object at 0x7f0782806780>]
Key: bcint/con_u
Exists: True
Not empty: [<pyfr.backends.cuda.provider.CUDAPointwiseKernelProvider._instantiate_kernel.<locals>.PointwiseKernel object at 0x7f06c77dfe90>, <pyfr.backends.cuda.provider.CUDAPointwiseKernelProvider._instantiate_kernel.<locals>.PointwiseKernel object at 0x7f0704d91640>, <pyfr.backends.cuda.provider.CUDAPointwiseKernelProvider._instantiate_kernel.<locals>.PointwiseKernel object at 0x7f06e23cde50>]
Key: eles/tgradpcoru_upts
Exists: True
Not empty: [<pyfr.backends.cuda.gimmik.CUDAGiMMiKKernels.mul.<locals>.MulKernel object at 0x7f07897c1430>]
Key: mpiint/scal_fpts_unpack
Exists: False
Not empty: False
Key: mpiint/con_u
Exists: False
Not empty: False
Key: eles/tgradcoru_upts
Exists: True
Not empty: [<pyfr.backends.cuda.gimmik.CUDAGiMMiKKernels.mul.<locals>.MulKernel object at 0x7f0704d91af0>]
Key: eles/gradcoru_upts
Exists: False
Not empty: False
Key: eles/gradcoru_fpts
Exists: True
Not empty: [<pyfr.backends.cuda.provider.CUDAUnorderedMetaKernel object at 0x7f06c77cb4d0>]
Is the issue related to gradcoru_upts? Is there a reason why it is empty?
Update: even when using the following, I still have the same issue:
# Obtain the physical gradients at the solution points
for l in k['eles/gradcoru_u']:
    g2.add(l, deps=deps(l, 'eles/tgradcoru_upts'))
# Interpolate these gradients to the flux points
for l in k['eles/gradcoru_fpts']:
    ldeps = deps(l, 'eles/gradcoru_u')
    g2.add(l, deps=ldeps)
PyFR performs a lot of optimisations in this area depending on your simulation setup. This often enables us to eliminate the need for a separate gradient kernel in the main RHS evaluation.
Take a look at _compute_grads_graph to see how one can reliably obtain gradients. Note, however, that some more work will be needed to interpolate them as we can sometimes elide the interpolation kernel to the flux points (as is the case if you run with GLL points).
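As a rough sketch (assuming the graph built by _compute_grads_graph is available as g2, with the same k and deps helpers as in your snippet), appending the interpolation might look like:

# Append the flux-point interpolation to the gradient graph; with GLL points
# this kernel list can be empty, in which case nothing gets scheduled here
for l in k['eles/gradcoru_fpts']:
    g2.add(l, deps=deps(l, 'eles/gradcoru_u'))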
I used the graph from _compute_grads_graph and then added the interpolation after it as:
for l in k['eles/gradcoru_fpts']:
    ldeps = deps(l, 'eles/gradcoru_u')
    g2.add(l, deps=ldeps)
It looks like the results are correct when I do not use gradient fusion, but incorrect when I do. Could you point out the main differences in the gradient calculation with and without fusion, and how the kernel scheduling must be changed? I assume that everything up to eles/gradcoru_u is correct, and that it is only the face interpolation that needs to change. Moreover, even if I use GLL points and skip the gradcoru_fpts kernel, I still have the same issue.
When gradient fusion is enabled the gradients are untransformed as part of the transformed flux kernel. In this case they are put into a separate array. When gradient fusion is disabled they find themselves in the same array as the flux (with the flux kernel overwriting them when it runs).
It is likely your issue is due to the interpolation kernel using the wrong array since eles/gradcoru_u probably puts the gradients into a different matrix than the fused flux kernel.
So, to my understanding, if gradient fusion is enabled the gradients are in self._grad_upts and I need to take a view over that, whilst without gradient fusion they are in self._vect_upts.
So the cases are:
if self.grad_fusion:
    # Gradients are stored in self._grad_upts
elif 'flux' in self.antialias:
    # Gradients are stored in self._vect_qpts
else:
    # No gradient + flux kernel fusion, no flux-AA
    # Gradients are stored in self._vect_fpts
So, based on those cases, can I create the corresponding views?
Without fusion gradients end up in _vect_upts. With fusion they end up in _grad_upts. Anti-aliasing is not something you need to worry about as the gradients still end up in _vect_upts and are then interpolated to the quadrature points.
For GLL points note that the kernel to interpolate to the flux points may be empty due to another optimisation.
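In sketch form (using the attribute names discussed above; grad_src is just an illustrative local name, and this is a guess at the selection logic rather than the actual PyFR code), the source matrix for the flux-point interpolation would be chosen as:

# Pick the matrix that actually holds the corrected gradients at the
# solution points; anti-aliasing does not change this choice
if self.grad_fusion:
    grad_src = self._grad_upts
else:
    grad_src = self._vect_upts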
There are likely some differences due to how the fluid force plugin handles the conversion between primitive and conservative variables; in particular, whether it converts before or after interpolating to the flux points. Note that the fluid force plugin can also perform quadrature (so it interpolates to quadrature points on the surface rather than to flux points).
The fluid force plugin does not use the corrected gradients but rather obtains them by differentiating the solution directly. This is slightly less accurate than using the gradients of the corrected solution.
Yes, we may change the fluid-force plugin in the future to use the more accurate gradients. However, in the majority of simulations the viscous drag is relatively small and so it is likely not worth the additional computational expense.