How I added a kernel to the pseudo stepper

I recently tried out adding the Turkel preconditioning to artificial compressibility method (ACM) solver, and I thought the process of adding a pointwise kernel to PyFR would be of interest to people in the community.

The preconditioner

Firstly let me describe what I’m trying to do. For ACM we use pseudo-time to converge the spatial system and an implicit time stepper. So for each pseudo step we’re solving:

\frac{\partial \mathbf{u}}{\partial \tau} = \mathbf{R}(\mathbf{u})

where \tau is the pseudo-time variable, and \mathbf{R}(\mathbf{u}) is the residual that combines the flux divergence and the physical time scheme. What the Turkel preconditioner does is this:

\frac{\partial \mathbf{u}}{\partial \tau} = \mathbf{P}^{-1}(\mathbf{u})\mathbf{R}(\mathbf{u})

where, given the way PyFR implements ACM, leads to the the matrix in 2D:

\mathbf{P}^{-1} = \begin{bmatrix} 1 & 0 & 0 \\ -\alpha u & 1 & 0 \\ -\alpha v & 0 &1 \end{bmatrix}

So the kernel that I want to produce takes in \mathbf{u} and \mathbf{R}(\mathbf{u}) and returns the \mathbf{P}^{-1}\mathbf{R}(\mathbf{u}) all in a pointwise manner on the solution points.

The python side

To begin with we need to add a few things to the python side. We will first register the new kernel by adding self._be.pointwise.register('pyfr.solvers.aceuler.kernels.precond') to def set_backend in the ACEulerElements class. (A good way to find this is the grep-like tool ack, its really good at finding stuff in source). This registers a kernel internally and tells PyFR where to look for the mako source code, which we will look at in the next section.

The next thing to add is some initialisation in the BaseDualPseudoStepper class. For this I added:

   def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)       

        sect = 'solver-time-integrator'

        self.precond = self.cfg.get(sect, 'preconditioner', None)
        tplargs = dict(ndims=self.system.ndims, nvars=self.system.nvars, 
                       precond=self.precond, c=self.c
                      )

        if not self.precond=='None':
            for ele, shape in zip(self.system.ele_map.values(),
                                       self.system.ele_shapes):
                # Append the precond kernels to the proxylist
                self.pintgkernels['precond'].append(
                    self.backend.kernel('precond', tplargs=tplargs,
                        dims=[ele.nupts, ele.neles], 
                        u=ele.scal_upts_inb, f=ele.scal_upts_outb
                    )
                )

            self.backend.commit()

This does a couple of things, first sect is the section of the ini file that we will look in for setup information. Then we look to see if a precondition was requested by the user, and then we make the dictionary tplargs. The next section will then loop through the different element types and generate the kernel. You can see we are making a kernel whose name in PyFR is precond and this done by appending a kernel found by looking in place we registered for a mako kernel called precond. When we do this append, we pass in the element-specific information as well as the template argument we stored in tplargs. These are arguments that get passed to the bit of PyFR that will turn the mako into the source.

You can also see that we pass u=ele.scal_upts_inb and f=ele.scal_upts_outb, the u= and f= has to match with the kernel we are yet to define, and what we set these to are effectively a pointer to the buffers where this information is stored. PyFR uses an interesting system of registers that allows memory to be reused, and ele.scal_upts_inb will be an integer number of a register. You pass any integer you like to u and f, but we want \mathbf{u} and \mathbf{R}(\mathbf{u}), and the special, pre-activated registers scal_upts_inb and scal_upts_outb is almost always where you can find the current progress on these two things.

That is the most difficult part done, so now all we do is run that kernel in the place where we need to. In this case, we add this:

        if not self.precond=='None':
            self._queue.enqueue_and_run(self.pintgkernels['precond'])

to the end of the function def _rhs_with_dts defined in the class BaseDualPseudoStepper. The first line runs an inelegant check on whether preconditioning is wanted, then the second adds the precond kernel to the queue and runs it.

There is one more thing we have to do on the python side and that is adding the config file (cfg) and constants (c) to the BaseDualPseudoIntegrator class, as it wasn’t already stored there. So we just add this line to the __init__ of that class:

self.cfg = cfg
self.c = cfg.items_as('constants', float)

The mako side

What we just did was add the stuff to PyFR that will compile and run a kernel, but now we need to define that kernel in PyFRs domain-specific language. Which sound worse than it is, but this is the easier bit. When we registered the kernel we told it where to look for the kernel mako, that was in pyfr/solvers/aceuler/kernels/ and in a file called precond.mako. This is what my precond.mako looks like:

# -*- coding: utf-8 -*-
<%inherit file='base'/>
<%namespace module='pyfr.backends.base.makoutil' name='pyfr'/>

<%include file='pyfr.solvers.aceuler.kernels.preconds.${precond}'/>

<%pyfr:kernel name='precond' ndim='2'
			  u='in fpdtype_t[${str(nvars)}]'
              f='inout fpdtype_t[${str(nvars)}]'>
    fpdtype_t f_c[${nvars}];

% for i in range(nvars):
	f_c[${i}] = f[${i}];
% endfor
	${pyfr.expand('localprecond', 'u', 'f_c', 'f')};	
</%pyfr:kernel>

PyFR uses the package mako, an advanced templating language, to turn effectively what is pseudo-code into valid and optimised source for the hardware backend requested. To that end, the first three lines are required and get the PyFR/mako side of things set up. The fifth line is an include, this works sort of like a #include "thing" in C, but here precond is a parameter that we passed in tplargs. This means that what was written in the ini file will change what is included, and allows us to add different PyFR macros. This is the same system that we use to change the Riemann solvers.

Line seven is then the kernel header. Here we define the name, which has to match that used in the append earlier. dims here is the dimensionality of the kernel, a pointwise kernel works on data that is [n element]x[n points] big, so 2. An interface kernel on the other hand just works on a 1D list of interface points, so dims would be 1. We then define the inputs and output variables used, where ${str(nvars)} declares that on a pointwise basis u is a vector with width nvars, again passed through tplargs. Throughout the mako side of things, the type used for floating-point data is fpdtype_t which the floating-point data type in the templates and is controlled by precision in the ini.

The next lines declare a local variable, f_c, set it based on the value of f using a loop, where % tells use that the loop is done on the mako side. You can do loops using the C syntax without the %, which might be suited to complex expressions, but for something like this the mako version is better as it will take care of unrolling it. The ndims here is different to dims, with the former being the dimensionality of the physical problem set by the mesh, and once again passed by tplargs. Then the penultimate line calls the macro we included at the beginning and ‘passes’ u, f_c and f.

Lets now look at the macro, which you might be able to tell from the include I put in a new sub-directory pyfr/solvers/aceuler/kernels/preconds. The advantage of this include method with ${precond} is it allows for other preconditioners to be added later.

The macro itself is a file called turkel.mako and looks like this:

# -*- coding: utf-8 -*-
<%namespace module='pyfr.backends.base.makoutil' name='pyfr'/>

<%pyfr:macro name='localprecond' params='u, fi, fo'>

	fo[0] = fi[0];
% for i in range(ndims):
	fo[${i + 1}] = fi[${i + 1}] - ${c['ac-alpha']}*u[${i + 1}]*fi[0];
% endfor

</%pyfr:macro>

Again the first couple of lines set up PyFR/mako and then there is the macro header. This calls the macro localprecond and you can see this is the name of the macro I called in the kernel. It then defines the arguments. The macro itself does some bits to the data, again using some mako loops. You can also see that I have a term ${c['ac-alpha']}, this is a constant I added to the ini file and was read in earlier in the BaseDualPseudoIntegrator class. When the source is built at run time, the value in the ini file will get substituted here. The last line closes the macro.

Finishing up

To get the code working you have to do some final things.

  1. As we added a new directory it will need a file called __init__.py, which in this case will be blank.
  2. We also have to add the new directory to the paths searched by setup.py, which looking in setup.py is pretty self explainitory.
  3. Then you have to run python setup.py install in the base directory
  4. Finally you need to run it. From how we implemented this it’ll have to be on an ac-euler case and without p-multigrid. Then in the ini add preconditioner = turkel under the section [solver-time-integrator], and add ac-alpha = 0.1 under the section [constants].

Hopefully, you’ve made it this far without getting lost, and you might even have added a preconditioner to your fork of PyFR. If you have any questions feel free to ask and I will try to keep this text up to date with any changes/corrections that come about.

And for those of you that are interested in how the preconditioner panned out, it was a solid maybe in terms of performance.