Problem repeating the sd7003 simulation with PyFR v1.12.1

Hello everyone,

I would like to repeat the simulation of the sd7003 airfoil (see https://doi.org/10.1016/j.jcp.2016.12.049) with PyFR 1.12.1, in order to produce a nice movie of the Q-criterion.

I tried this version of the code a couple of times, once with an output frequency of dt = 5.0 and once with dt = 1.0. In both cases the code crashed with RuntimeError: Minimum sized time step rejected, the first run after t = 152.63 and the second after t = 96.75. In the past I was able to complete the same simulation up to t = 260 with PyFR 1.12.0.

I really don't see why different output frequencies should make the code crash at different times. Is anyone aware of any change between version 1.12.0 and version 1.12.1 that could introduce this odd issue? In addition, has this issue been fixed in the newest version, 1.12.2?

Regards,
Federico Cipolletta.

Try updating to 1.12.2 and let me know if the same issue occurs there. It might be worth using something like git diff just to make sure that the dt in the writer plugin is the only thing that has changed between the config files. Just in case.
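
For example, assuming the two config files are called sd7003_old.ini and sd7003_new.ini (the names here are purely illustrative), something along the lines of

diff -u sd7003_old.ini sd7003_new.ini

should be enough to confirm that.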

Hi @FedericoCipolletta

Which PyFR version did you use to generate the pyfrm mesh? Which tolerance did you use to select linear elements?

Hello everyone.

@Will, thanks, I will try version 1.12.2. Unfortunately, I don't have web access on the cluster I am using, so I cannot run git diff between the two versions of the source code. However, using the brute-force approach of diff -ru on the two directories containing the two source trees, I get a lot of differences (since I cannot find a way to attach a file, I don't think it is worth writing them all out here). On the other hand, running diff -u between the config files that I used, I obtain the following:

--- PyFR_1.12/Vermeire_et_al_2017/sd7003_CompExprt/sd7003.ini   2021-07-02 09:17:50.000000000 +0200
+++ PyFR_1.12.1/Vermeire_et_al_2017/sd7003_new/sd7003.ini       2021-09-23 17:06:02.000000000 +0200
@@ -72,8 +72,32 @@

 [soln-plugin-writer]
-dt-out=10.0
+dt-out=1.0
 basedir = .
-basename = sd7003-{t:.5f}
+basename = sd7003-{t:.2f}

where --- is the successful test with version 1.12.0 and +++ is the second failed test with version 1.12.1. So I am basically using the same config file, apart from the output dt.

@Gonzalo, I used the same version to generate the mesh as to run the simulation (i.e. PyFR 1.12.1 for the latest failed attempts, and PyFR 1.12.0 for the previous successful simulation). I did not explicitly set any tolerance for the linear elements, but in both the failed and successful simulations I used the same parameters given in the paper's supplementary material. In particular, the latest failing config file looks like this:

[backend]
precision = double

[backend-cuda]
device-id = local-rank

[constants]
gamma = 1.4
mu    = 3.94405318873308E-6
Pr    = 0.72
M     = 0.2

[solver-time-integrator]
scheme = rk45
controller = pi
tstart=0.0
tend=260.0
dt = 0.00001
atol = 0.000001
rtol = 0.000001
safety-fact = 0.5
min-fact = 0.3
max-fact = 1.2

[solver]
system = navier-stokes
order  = 4

[solver-interfaces]
riemann-solver = rusanov
ldg-beta = 0.5
ldg-tau = 0.1

[solver-interfaces-quad]
flux-pts = gauss-legendre

[solver-elements-hex]
soln-pts = gauss-legendre

[soln-bcs-outlet]
type = char-riem-inv
rho = 1.0
u   = 0.2366431913
v   = 0.0
w   = 0.0
p   = 1.0

[soln-bcs-inlet]
type = char-riem-inv
rho = 1.0
u   = 0.2366431913
v   = 0.0
w   = 0.0
p   = 1.0

[soln-bcs-wall]
type = no-slp-adia-wall
cpTw  = 3.5

[soln-ics]
rho  = 1.0
u    = 0.2366431913
v    = 0.001
w    = 0.001*cos(x)*cos(y)
p    = 1.0

[soln-plugin-writer]
dt-out=1.0
basedir = .
basename = sd7003-{t:.2f}

Looking at the ini file, I am a little suspicious of the small safety factor. This normally implies that the simulation is running at its stability limit.

As such, I have started re-running the simulation with flux anti-aliasing (degree 11 Gauss-Legendre points). So far it has reached t = 104 without issue. I will let you know if I encounter any problems.
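
In the config file this means giving the interfaces a quadrature rule of a suitable degree; for the line interfaces, for example, something along these lines:

[solver-interfaces-line]
flux-pts = gauss-legendre
quad-deg = 11
quad-pts = gauss-legendre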

Regards, Freddie.

As a quick follow-up here, my anti-aliased simulation is now at

17.2% [====>                    ] 171.75/1000.00 ela: 45:16:59 rem: 218:22:32

when run on 32 V100s, and it is causing me no issues whatsoever.

Regards, Freddie.


Hello Freddie,

thank you for looking into this. I can confirm that simply adding the following:

[solver-interfaces-line]
flux-pts = gauss-legendre
quad-deg = 11
quad-pts = gauss-legendre

allowed the simulation to reach t = 260.0 without any issue.

Best,
Federico Cipolletta.

You can speed the simulation up quite a bit by removing the

safety-fact = 0.5
min-fact = 0.3
max-fact = 1.2

block. We are working to reduce the cost of anti-aliasing going forwards.
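
With those three lines removed the time integrator section becomes simply

[solver-time-integrator]
scheme = rk45
controller = pi
tstart=0.0
tend=260.0
dt = 0.00001
atol = 0.000001
rtol = 0.000001

and PyFR falls back to its default safety, minimum and maximum factors.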

Regards, Freddie.

Hello everyone,

does anyone have a guess as to why, when I run the same config file with the OpenMP backend, I obtain the following?

KeyError: 'Kernel "mul" has no providers'

In particular, I left all the parameters unchanged, so I am still using

safety-fact = 0.5
min-fact = 0.3
max-fact = 1.2

and the anti-aliasing given by

[solver-interfaces-line]
flux-pts = gauss-legendre
quad-deg = 11
quad-pts = gauss-legendre

Thank you,
Federico Cipolletta.

It seems that setting

[backend-openmp]
gimmik-max-nnz = 8192

as suggested in https://pyfr.discourse.group/t/keyerror-on-3d-case-running-with-wsl-ubuntu-openmp/404/9, allowed the simulation to proceed.

What I really cannot understand is the reason: why does the CUDA backend not require enlarging the maximum number of non-zero elements in a constant matrix for GiMMiK?

Any hints would be greatly appreciated!

Best,
Federico Cipolletta.

For the CUDA backend, dense matrix multiplications are handled by the cuBLAS library, which ships as part of CUDA. For the OpenMP backend we make use of libxsmm, which is currently an optional dependency (although highly recommended). Without libxsmm, however, there are some limitations on what simulations can be run.

Further details can be found in the Performance Tuning section of the documentation.

In future versions of PyFR, libxsmm will become a hard dependency (that is, the OpenMP backend will refuse to run at all if it is not available); we are just waiting for some final ARM support to be wired up before mandating it.

Regards, Freddie.


Thank you, Freddie, that makes sense to me.

I will try to find and install libxsmm, to see if anything changes…

Hello Freddie,

I am still following up on this. Is there any particular parameter that I should set when I want to run with libxsmm installed? In particular, I ran

make PREFIX=<...> STATIC=0 BLAS=0 install

to install libxsmm and generated the corresponding module file. How can I make PyFR "aware" of the libxsmm installation?

One option is to put the resulting .so file in a directory that is in your library load path. Alternatively, if you know the exact path to the .so file you can do

export PYFR_XSMM_LIBRARY_PATH=/path/to/libxsmm.so

and then run PyFR as usual. It will automatically pick up libxsmm and use it in preference to GiMMiK.
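
In other words, something like the following (the mesh and config file names here are just placeholders):

export PYFR_XSMM_LIBRARY_PATH=/path/to/libxsmm.so
pyfr run -b openmp sd7003.pyfrm sd7003.ini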

Regards, Freddie.

Hello Freddie,

and thank you for the answer.

Indeed, I had already tried prepending the path to libxsmm.so to my LD_LIBRARY_PATH, but that did not change anything. I also tried exporting PYFR_XSMM_LIBRARY_PATH (as I read in the thread at https://pyfr.discourse.group/t/pyfr-on-xeon-phi/165/4), but the KeyError remained.

It seems that I am forced to set gimmik-max-nnz = 8192 in the config file to get the simulation running, which looks quite strange to me…

Is there any way I can check that, if I use libxsmm and set gimmik-max-nnz at the same time, PyFR will actually use libxsmm for the GEMM?
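
The only quick check I have come up with is verifying that the shared object pointed to by PYFR_XSMM_LIBRARY_PATH can at least be loaded in my environment, e.g.

# only confirms the library is loadable, not that PyFR selects it for the GEMM
python3 -c "import ctypes, os; ctypes.CDLL(os.environ['PYFR_XSMM_LIBRARY_PATH'])" && echo "libxsmm loads"

but of course that does not tell me which provider PyFR actually ends up using.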

OK, it seems that I have something… I submitted two jobs: one with only the GiMMiK parameter, and the other with both the location of libxsmm and the GiMMiK parameter. Both jobs were run with the OpenMP backend and with the same resources.

The first job is still running and has now reached t = 40.17. The second one died after t = 28.43 with the error RuntimeError: Minimum sized time step rejected. If, on the one hand, the different outcomes suggest that the two simulations really are different (and therefore that the second must be using libxsmm), the fact that the second one crashes confuses me: I would expect libxsmm to perform better…

Could that be due to the too-small safety factor that I am using?

Regards,
Federico Cipolletta.

So I ran the case from scratch on our cluster using 320 Intel SKX cores (also 320 ranks, for simplicity) and the latest version of libxsmm. Here, I found:

3.9% [=>                       ] 39.41/1000.00 ela: 33:31:44 rem: 817:09:11

with everything working as expected. I was using the default time step parameters, although I would be surprised if this makes a difference. What CPU architecture were you running on?

Regards, Freddie.

Hello Freddie,

I am running on the same architecture on which I ran the simulation with the CUDA backend, i.e. eight 2-socket AMD EPYC 7402 24-core processors, using 32 MPI tasks. Although I have read the suggestion in the documentation to use only one MPI rank per socket, I wanted to produce a "fair" comparison on our machine, to make people understand the case for using GPUs: this is certainly not optimal in terms of performance, but my objective was to use exactly the same resources.
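
For reference, I am partitioning the mesh into 32 parts and launching roughly as follows (the file names are just those of my setup, so purely illustrative):

pyfr partition 32 sd7003.pyfrm .
mpiexec -n 32 pyfr run -b openmp sd7003.pyfrm sd7003.ini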

Here are the tests that I have done so far:

1) SF 0.5, AA, GPU --> Done in 13:35:44
2) no SF, AA, GPU --> Crashed at time 186.61, due to Minimum sized time step rejected
3) no SF, no AA, GPU --> Done in 14:16:25
4) SF 0.5, AA, CPU, libxsmm + gimmik-max-nnz --> Crashed at time 58.13, due to Minimum sized time step rejected
5) no SF, AA, CPU, libxsmm + gimmik-max-nnz --> restart3 ongoing...  72.1% [+++++++++++++++====>        ] 187.38/260.00 ela: 18:48:39 rem: 34:41:32

Regards,
Federico Cipolletta.


Just re-running the case myself on 40 V100s with AA and no changes to the time stepping (so the default safety factor etc.), I find:

47.8% [===========>            ] 477.74/1000.00 ela: 100:20:56 rem: 109:41:56

with the case still running without issue, whereas your results appear to indicate that the simulation diverges. I cannot reproduce this behaviour.

Regards, Freddie.

Hello Freddie,

thank you very much for the follow-up. So, if I understand correctly, you ran my simulation 2) and it continues without any failure.

The only thing that comes to mind is that, in my simulation 2), there is a chance that I was still exporting the path to libxsmm, although this should not be an issue when running with the CUDA backend.

The cluster is not available right now, but I will repeat simulation 2) as soon as the maintenance is complete, to be sure of what I am doing.

In the meantime, can you tell me whether you are using PyFR version 1.12.1 or the latest release?

Regards,
Federico Cipolletta.