I would like to repeat the simulation of the SD7003 airfoil (see https://doi.org/10.1016/j.jcp.2016.12.049) with PyFR 1.12.1 in order to produce a nice movie of the Q-criterion.
I tried using this version of the code a couple of times, once with an output interval of dt = 5.0 and once with dt = 1.0. In both cases the code crashed with RuntimeError: Minimum sized time step rejected, the first run after time t = 152.63 and the second after time t = 96.75. In the past, I was able to complete the same simulation up to time t = 260 with PyFR 1.12.0.
I really don’t see why different output intervals should make the code crash at different times. Is anyone aware of any change in the code from version 1.12.0 to version 1.12.1 that could cause this odd behaviour? And has this issue been fixed in the newest version, 1.12.2?
Try updating to 1.12.2 and let me know if the same issue occurs there. It might be worth using something like git diff just to make sure that the dt in the writer plugin is the only thing that has changed between the config files. Just in case.
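For example, something along these lines should do it (the file names here are just placeholders for your two configs):
git diff --no-index sd7003-1.12.0.ini sd7003-1.12.1.ini
or plain diff -u if git is not to hand.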
@Will, thanks, I will try using version 1.12.2. Unfortunately, I don’t have web access on the cluster that I am using, so I cannot run a git diff between the two versions of the source code. However, using the brute-force approach of diff -ru on the two directories containing the two versions of the source code, I get a lot of differences (since I cannot find a way to attach a file here, I don’t think it would be worth writing them all out). On the other hand, running diff -u between the config files that I used (with --- being the successful test with version 1.12.0 and +++ the second failed test with version 1.12.1), the only difference is the dt of the output, so I am basically using the same config file.
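For reference, the commands I ran were along these lines, with placeholder paths:
diff -ru /path/to/PyFR-1.12.0 /path/to/PyFR-1.12.1
diff -u sd7003-1.12.0.ini sd7003-1.12.1.ini
The first produces a lot of output; the second essentially only the dt-out line.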
@Gonzalo, I used the same version for generating the mesh as for running the simulation (i.e. PyFR 1.12.1 for the latest failed attempts, and PyFR 1.12.0 for the previous successful simulation). I did not explicitly set any tolerance for the linear elements, but in both the failed and successful simulations I used the same parameters that were given in the paper’s supplemental material. In particular, the latest failing config file looks like this:
[backend]
precision = double
[backend-cuda]
device-id = local-rank
[constants]
gamma = 1.4
mu = 3.94405318873308E-6
Pr = 0.72
M = 0.2
[solver-time-integrator]
scheme = rk45
controller = pi
tstart = 0.0
tend = 260.0
dt = 0.00001
atol = 0.000001
rtol = 0.000001
safety-fact = 0.5
min-fact = 0.3
max-fact = 1.2
[solver]
system = navier-stokes
order = 4
[solver-interfaces]
riemann-solver = rusanov
ldg-beta = 0.5
ldg-tau = 0.1
[solver-interfaces-quad]
flux-pts = gauss-legendre
[solver-elements-hex]
soln-pts = gauss-legendre
[soln-bcs-outlet]
type = char-riem-inv
rho = 1.0
u = 0.2366431913
v = 0.0
w = 0.0
p = 1.0
[soln-bcs-inlet]
type = char-riem-inv
rho = 1.0
u = 0.2366431913
v = 0.0
w = 0.0
p = 1.0
[soln-bcs-wall]
type = no-slp-adia-wall
cpTw = 3.5
[soln-ics]
rho = 1.0
u = 0.2366431913
v = 0.001
w = 0.001*cos(x)*cos(y)
p = 1.0
[soln-plugin-writer]
dt-out = 1.0
basedir = .
basename = sd7003-{t:.2f}
Looking at the .ini file, I am a little suspicious of the small safety factor; this normally suggests that the simulation is running close to its stability limit.
As such, I have started re-running the simulation with flux anti-aliasing (degree 11 Gauss-Legendre points). Thus far it has got to t = 104 without issue; I will let you know if I run into any problems.
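For reference, this amounts to adding options along the following lines to the config (the snippet below is only a sketch of the relevant sections, not a copy of the exact file I ran):
[solver]
anti-alias = flux
[solver-interfaces-quad]
quad-deg = 11
quad-pts = gauss-legendre
[solver-elements-hex]
quad-deg = 11
quad-pts = gauss-legendre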
What I really cannot understand is the reason: why does the CUDA backend not require enlarging the number of non-zero elements allowed in a constant matrix for GiMMiK?
For the CUDA backend, dense matrix multiplications are handled via the cuBLAS library, which ships as part of CUDA. For the OpenMP backend we make use of libxsmm, which is currently an optional dependency (although highly recommended). Without libxsmm, however, there are some limitations in terms of what simulations can be run.
In future versions of PyFR libxsmm will become a hard dependency (as in, the OpenMP backend will refuse to run at all if it is not available; we’re just waiting for some final ARM support to be wired up before mandating it).
One option is to put the resulting .so file in a directory that is in your library load path. Alternatively, if you know the exact path to the .so file you can do
export PYFR_XSMM_LIBRARY_PATH=/path/to/libxsmm.so
and then run PyFR as usual. It will automatically pick up libxsmm and use it in preference to GiMMiK.
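The first option simply amounts to something like (placeholder path):
export LD_LIBRARY_PATH=/path/to/libxsmm/lib:$LD_LIBRARY_PATH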
Indeed, I had already tried prepending the path to libxsmm.so to my LD_LIBRARY_PATH, but that did not change anything. I also tried exporting PYFR_XSMM_LIBRARY_PATH (as suggested in the thread at https://pyfr.discourse.group/t/pyfr-on-xeon-phi/165/4), but the KeyError remains.
It seems that I am forced to set gimmik-max-nnz = 8192 in the config file to make the simulation run, which seems quite strange to my eyes…
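Concretely, I mean adding something like this to the config (I am assuming [backend-openmp] is the right section for it):
[backend-openmp]
gimmik-max-nnz = 8192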
Is there any way to check that, if I point PyFR at libxsmm and set gimmik-max-nnz at the same time, it will actually use libxsmm for the GEMMs?
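The only check that comes to my mind (a rough sketch, assuming I can inspect a running rank on the compute node) is to grep the shared objects mapped by one of the PyFR processes, e.g.
grep -i xsmm /proc/<pid-of-a-pyfr-rank>/maps
which would at least tell me whether libxsmm has been loaded.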
Ok, it seems that I have something… I submitted two jobs: one with only the GiMMiK parameter, and the other with both the location of libxsmm and the GiMMiK parameter. Both jobs use the OpenMP backend and the same resources.
The first job is still running and has now reached t = 40.17. The second one died after t = 28.43 with RuntimeError: Minimum sized time step rejected. On the one hand, the different results suggest that the two simulations are indeed different (and therefore the second must be using libxsmm); on the other hand, the fact that the second one crashed confuses me: I would expect libxsmm to perform better…
Could that be due to the too-small safety factor that I am using?
So I ran the case from scratch on our cluster using 320 Intel SKX cores (also 320 ranks, for simplicity) and the latest version of libxsmm, and found everything working as expected. I was using the default time-step parameters, although I would be surprised if this made a difference. What CPU architecture were you running on?
I am running on the same architecture on which I ran the simulation with the CUDA backend, i.e. eight 2-socket AMD EPYC 7402 24-core processors, using 32 MPI tasks. Although I read the suggestion in the documentation to use only one MPI rank per socket, I wanted to produce a “fair” comparison on our machine to make people understand the case for using GPUs: this is certainly not optimal in terms of performance, but my objective was to use exactly the same resources.
I report here the tests that I have done so far (SF = safety-fact 0.5, AA = anti-aliasing):
1) SF 0.5, AA, GPU --> Done in 13:35:44
2) no SF, AA, GPU --> Crashed at t = 186.61 due to Minimum sized time step rejected
3) no SF, no AA, GPU --> Done in 14:16:25
4) SF 0.5, AA, CPU, libxsmm + gimmik-max-nnz --> Crashed at t = 58.13 due to Minimum sized time step rejected
5) no SF, AA, CPU, libxsmm + gimmik-max-nnz --> restart3 ongoing... 72.1% [+++++++++++++++====> ] 187.38/260.00 ela: 18:48:39 rem: 34:41:32
Thank you very much for the follow-up. So, if I understand correctly, you tried my simulation 2) and were able to continue it without any failure.
The only thing that comes to mind is that in my simulation 2) there is a chance I was still exporting the path to libxsmm, although this should not be an issue when running with the CUDA backend.
Right now the cluster is not available, but I will repeat simulation 2) as soon as the maintenance is completed, in order to be sure of what I am doing.
In the meantime, can you tell me whether you are using PyFR 1.12.1 or the latest release?
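(On my side, just to be sure which version a given environment picks up, I check it with something like
python -c "import pyfr; print(pyfr.__version__)"
before submitting a job.)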