PyFR plugin in-memory execution model

I now have some semi-production runs and I am looking to optimise the collection of statistics, integrals and point-sampling data. I was warned that this would be costly in terms of performance, but I was very surprised by how much: GPU utilisation dropped from 90% to almost 0% in the first iteration of my setup.

I wanted to understand this a bit better. Do the tavg, sampler and fluidforce plugins copy data to the CPU for post-processing?

Practically, I can obviously decrease the frequency of collecting snapshots for averaging and increase dt-out, so I do not think this is a blocking issue, though any advice would be greatly appreciated.

All plugin execution is performed on the CPU and is single threaded. You therefore pay the price of a copy to the host, and the calculations themselves typically run about two orders of magnitude slower (GPU memory bandwidth vs. single-core memory bandwidth). See the following guidance in the performance tuning section:

As PyFR is an explicit code which is almost always CFL limited, running plugins frequently is wasteful.
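
To put rough numbers on this, here is a minimal back-of-the-envelope sketch in Python. It is not PyFR code: the function name and the timings are hypothetical, and it assumes the solver simply waits while a plugin runs on the CPU. Under that assumption, the share of wall time lost to a plugin falls quickly as the interval between calls grows.

```python
def plugin_overhead_fraction(step_time_s, plugin_time_s, nsteps):
    """Fraction of wall time spent in a plugin invoked once every `nsteps` steps.

    Assumes the time stepper is idle while the plugin runs (hypothetical model).
    """
    if nsteps < 1:
        raise ValueError("nsteps must be at least 1")
    solver_time_s = nsteps * step_time_s
    return plugin_time_s / (solver_time_s + plugin_time_s)


# Illustrative timings only, not measurements from any real run.
step_time_s = 0.01    # one time step on the GPU
plugin_time_s = 1.0   # one plugin call (copy plus single-threaded CPU work)

for nsteps in (1, 10, 100, 1000):
    frac = plugin_overhead_fraction(step_time_s, plugin_time_s, nsteps)
    print(f"nsteps={nsteps:5d}: {frac:6.1%} of wall time in the plugin")
```

With timings like these, calling a plugin on every step leaves the GPU almost entirely idle, which would be consistent with utilisation collapsing towards 0%. Measuring the real per-step and per-call costs for your case and plugging them in should give a reasonable target for the plugin frequency.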

Regards, Freddie.