Bad performance and warnings

Hello,

My simulation performs badly and I wonder why. I get estimated calculation times of up to 38000 hours.

Some general information:
The hardware used is 20 Tesla V100-SXM2 GPUs.
The PyFR version used is 1.11.0 with the turbulence generator plugin.
I am calculating with mesh order 4.
I am restarting the job from a calculation with mesh order 2.
The mesh has 3.92924e+06 nodes, 61846 quadrangles and 475575 hexahedra.

The batch job setup is:

#SBATCH -J Mesh_Coarse
#SBATCH -o Output.%J
#SBATCH --ntasks=20
#SBATCH --ntasks-per-node=2
#SBATCH --gres=gpu:2
#SBATCH -t 0-24:00:00


module load cuda/101
module load cudnn/7.6.5


module switch intel intel/2021.2.0

  
module switch intelmpi intelmpi/2021.2  
 
module load python/3.8.7

source new_ENV/bin/activate
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local_rwth/sw/cuda/10.0.130/lib64


pyfr partition 20 Mesh-FINAL.pyfrm Mesh-CONV-140.00.pyfrs .

mpiexec -n 20 pyfr restart -b cuda -p Mesh-FINAL.pyfrm Mesh-CONV-140.00.pyfrs Mesh-NEW-VERSION.ini

Before the simulation starts I get some warnings, and the bad performance might be caused by them.
The warnings are:

srun: TOPOLOGY: warning -- no switch can reach all nodes through its descendants.Do not use route/topology
slurmstepd: error: Can't propagate RLIMIT_NPROC of 767556 from submit host: Operation not permitted
slurmstepd: error: Can't propagate RLIMIT_NPROC of 767556 from submit host: Operation not permitted
slurmstepd: error: Can't propagate RLIMIT_NPROC of 767556 from submit host: Operation not permitted
slurmstepd: error: Can't propagate RLIMIT_NPROC of 767556 from submit host: Operation not permitted

The full log is:

(OK) Loading cuda 10.1.243
(OK) Loading cudnn 7.6.5
(OK) Unloading cudnn 7.6.5
(OK) Unloading cuda 10.1.243
(OK) Unloading intelmpi 2018.4.274
(OK) Unloading Intel Suite 19.0.1.144
(OK) Loading Intel suite, Version 2021.2.0 (Classic): compilers (C/C++/FORTRAN), MKL, TBB, IPP(CP) DAL=DAAL, VPL, DPL and DPCPP-CT, DNNL, CCL and extended gdb
(!!) Marketing name: Intel(R) oneAPI Compiler 2021 2021.2.0
(OK) Intel MPI Suite 2018.4.274 loaded.
(OK) Loading cuda 10.1.243
(OK) Loading cudnn 7.6.5
(OK) Unloading cudnn 7.6.5
(OK) Unloading cuda 10.1.243
(OK) Unloading intelmpi 2018.4.274
(OK) Intel MPI Suite 2021.2.0 loaded.
(OK) Loading cuda 10.1.243
(OK) Loading cudnn 7.6.5
(OK) Loading python 3.8.7
(!!) The SciPy Stack is available: http://www.scipy.org/stackspec.html
 Built with GCC compilers.
srun: TOPOLOGY: warning -- no switch can reach all nodes through its descendants.Do not use route/topology
slurmstepd: error: Can't propagate RLIMIT_NPROC of 767556 from submit host: Operation not permitted
slurmstepd: error: Can't propagate RLIMIT_NPROC of 767556 from submit host: Operation not permitted
slurmstepd: error: Can't propagate RLIMIT_NPROC of 767556 from submit host: Operation not permitted
slurmstepd: error: Can't propagate RLIMIT_NPROC of 767556 from submit host: Operation not permitted
  0.0% [>                         ] 140.00/240.00 ela: 00:00:01 rem: 34950:05:49
  0.0% [>                         ] 140.00/240.00 ela: 00:00:01 rem: 13810:23:31
  0.0% [>                         ] 140.00/240.00 ela: 00:00:02 rem: 8387:38:45
  0.0% [>                         ] 140.00/240.00 ela: 00:00:02 rem: 5382:37:38
  0.0% [>                         ] 140.00/240.00 ela: 00:00:03 rem: 3843:37:46
  0.0% [>                         ] 140.00/240.00 ela: 00:00:03 rem: 3092:14:51
  0.0% [>                         ] 140.00/240.00 ela: 00:00:04 rem: 2748:40:24
  0.0% [>                         ] 140.00/240.00 ela: 00:00:04 rem: 2512:45:33
  0.0% [>                         ] 140.00/240.00 ela: 00:00:05 rem: 2325:19:50
  0.0% [>                         ] 140.00/240.00 ela: 00:00:05 rem: 2210:22:52
  0.0% [>                         ] 140.00/240.00 ela: 00:00:06 rem: 2103:42:28
  0.0% [>                         ] 140.00/240.00 ela: 00:00:06 rem: 2020:10:08
  0.0% [>                         ] 140.00/240.00 ela: 00:00:07 rem: 1961:02:20
  0.0% [>                         ] 140.00/240.00 ela: 00:00:07 rem: 1906:43:14
  0.0% [>                         ] 140.00/240.00 ela: 00:00:08 rem: 1866:04:02
  0.0% [>                         ] 140.00/240.00 ela: 00:00:08 rem: 1829:26:06
  0.0% [>                         ] 140.00/240.00 ela: 00:00:09 rem: 1798:58:15
  0.0% [>                         ] 140.00/240.00 ela: 00:00:09 rem: 1770:57:22
  0.0% [>                         ] 140.00/240.00 ela: 00:00:10 rem: 1751:06:39
  0.0% [>                         ] 140.00/240.00 ela: 00:00:10 rem: 1725:50:46
  0.0% [>                         ] 140.00/240.00 ela: 00:00:11 rem: 1707:08:57
  0.0% [>                         ] 140.00/240.00 ela: 00:00:11 rem: 1698:17:48
  0.0% [>                         ] 140.00/240.00 ela: 00:00:12 rem: 1683:44:30
  0.0% [>                         ] 140.00/240.00 ela: 00:00:12 rem: 1674:09:57
  0.0% [>                         ] 140.00/240.00 ela: 00:00:13 rem: 1662:31:15
  0.0% [>                         ] 140.00/240.00 ela: 00:00:13 rem: 1646:34:43
  0.0% [>                         ] 140.00/240.00 ela: 00:00:14 rem: 1643:17:27
  0.0% [>                         ] 140.00/240.00 ela: 00:00:14 rem: 1634:08:15
  0.0% [>                         ] 140.00/240.00 ela: 00:00:15 rem: 1631:09:05
  0.0% [>                         ] 140.00/240.00 ela: 00:00:15 rem: 1619:18:54
  0.0% [>                         ] 140.00/240.00 ela: 00:00:16 rem: 1609:56:14
  0.0% [>                         ] 140.00/240.00 ela: 00:00:16 rem: 1600:20:03
  0.0% [>                         ] 140.00/240.00 ela: 00:00:17 rem: 1593:11:02
  0.0% [>                         ] 140.00/240.00 ela: 00:00:17 rem: 1590:25:10
  0.0% [>                         ] 140.00/240.00 ela: 00:00:18 rem: 1583:12:55
  0.0% [>                         ] 140.00/240.00 ela: 00:00:18 rem: 1575:43:28
  0.0% [>                         ] 140.00/240.00 ela: 00:00:19 rem: 1570:43:48
  0.0% [>                         ] 140.00/240.00 ela: 00:00:19 rem: 1567:40:05
  0.0% [>                         ] 140.00/240.00 ela: 00:00:20 rem: 1561:43:18
  0.0% [>                         ] 140.00/240.00 ela: 00:00:20 rem: 1557:20:57
  0.0% [>                         ] 140.00/240.00 ela: 00:00:21 rem: 1554:54:22
  0.0% [>                         ] 140.00/240.00 ela: 00:00:21 rem: 1549:34:36
  0.0% [>                         ] 140.00/240.00 ela: 00:00:22 rem: 1547:36:36
  0.0% [>                         ] 140.00/240.00 ela: 00:00:23 rem: 1545:37:14
  0.0% [>                         ] 140.00/240.00 ela: 00:00:23 rem: 1543:39:56
  0.0% [>                         ] 140.00/240.00 ela: 00:00:24 rem: 1538:40:19
  0.0% [>                         ] 140.00/240.00 ela: 00:00:24 rem: 1535:34:38
......

Thank you and Kind Regards

Peter

The warnings appear to be spurious. However, it may be worth running the OSU Micro-Benchmarks to ensure that you are getting reasonable bandwidth between nodes.

Whether or not you are getting ‘bad performance’ depends very much on the Reynolds number and the grid spacing. A small grid spacing can easily hurt your time step.

Regards, Freddie.

Peter, could you provide a few more details on what you mean by poor performance?

  • What Reynolds number are you intending the simulation to be at?
  • What system are you solving? (Euler, Navier–Stokes, ACM?)
  • What is your minimum mesh spacing?
  • What is the performance you are getting in terms of time per DoF per RK stage?

Also, here is a link to the benchmarks Freddie mentioned: MPI micro-benchmarks (MVAPICH :: Benchmarks).

Will T, sure!

The Reynolds number is 120000.
The system being solved is Navier-Stokes.
The minimum cell spacing near the wing wall is 0.00153125, which I assume is the minimum mesh spacing.
What exactly do you mean by time per DoF per RK stage? The DoF count would be ~67 million and the scheme is rk45.

What I mean is the result of the following expression:

\frac{\text{runtime}}{\text{DoF}\times\text{No. time steps}\times\text{No. GPUs}}

Note that I have added the normalisation by the number of GPUs. For the number of time steps, when using an RK method, include the substeps (stages). I am interested in getting a comparable performance metric. This metric will probably depend on how far you have strong scaled, but it will still give an indication.

As an example, say you did a run that took 4 hours with 10^7 DoFs, taking 10^6 RK3 time steps on 10 GPUs. Then you would have:

\frac{4\times3600}{10\times10^{7}\times3\times10^{6}} = 4.8\times10^{-11}\ \text{s/DoF/Step/GPU}

(These numbers are completely fictional and not from a run I have performed)
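
As a minimal sketch, the metric above can be written as a small Python helper. The function name and argument names are made up for illustration and are not part of PyFR; the only inputs are the quantities in the expression above, with the RK stages counted separately from the time steps.

```python
def time_per_dof_per_stage_per_gpu(runtime_s, n_dof, n_steps, n_stages, n_gpus):
    """Return runtime / (DoF x time steps x RK stages x GPUs) in seconds.

    runtime_s must be wall-clock time in seconds, not hours.
    """
    if min(n_dof, n_steps, n_stages, n_gpus) <= 0:
        raise ValueError("all counts must be positive")
    return runtime_s / (n_dof * n_steps * n_stages * n_gpus)


# The fictional example above: 4 hours, 10^7 DoFs, 10^6 RK3 steps, 10 GPUs.
example = time_per_dof_per_stage_per_gpu(
    runtime_s=4 * 3600, n_dof=10**7, n_steps=10**6, n_stages=3, n_gpus=10
)
print(f"{example:.2e} s/DoF/Step/GPU")  # 4.80e-11
```

Keeping the runtime argument explicitly in seconds avoids mixing up hours and seconds when plugging in real numbers.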

The data are:
runtime: 13260
time steps: 100
GPUs: 20 (or 20 × 5120 CUDA cores)
DoF: 67 million

I can only use the estimated runtime, but with that estimate it would be:

\frac{13260\times3600}{67000000\times100\times20} = 3.562388\times10^{-4}

If you count the GPUs as CUDA cores instead, it would be:

\frac{13260\times3600}{67000000\times100\times20\times5120} = 6.96\times10^{-8}

For comparison, here are the stats for a job I am currently running:

  • 40 GPUs (Nvidia A100 PCIe 40GB)
  • runtime: 245739.19s
  • time steps: 9500951 RK4
  • DoF: 15353240 (~767k p=3 tets)

So:
\frac{245739.19}{(15.35\times10^{6})\times(4\times9.5\times10^{6})\times40} \approx 1\times10^{-11}\ \text{s/DoF/Step/GPU}

In your numerator, you multiplied the runtime by 3600, which implies it took 13260 hours to do 100 steps. I assume this was an error. Either way, it definitely looks like your performance is a bit off. (FYI you can get some of these numbers from the solution file using $ h5dump -d stats some_solution_file.pyfrs)

Have you tried the communication benchmarks mentioned earlier? It could be that MPI is using Ethernet rather than InfiniBand. When I run the bi-directional bandwidth test, the output I get is:

# OSU MPI Bi-Directional Bandwidth Test v5.7
# Size      Bandwidth (MB/s)
1                       6.99
2                      13.86
4                      28.47
8                      56.93
16                     99.45
32                    224.37
64                    392.49
128                   777.44
256                   455.68
512                  2027.76
1024                 3469.51
2048                 6220.84
4096                12441.01
8192                17166.06
16384               18128.32
32768               18728.46
65536               21558.36
131072              22979.04
262144              23502.79
524288              23810.20
1048576             23952.41
2097152             24017.92
4194304             24060.09

Which I think is about where it should be for our hardware.
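
To compare your own run against these numbers, a short script can pull the peak bandwidth out of the benchmark output. This is only a sketch: it assumes the two-column text layout shown above (comment lines starting with `#`, then message size and bandwidth in MB/s), and the function name is hypothetical. The sample string contains a few rows copied from the table above.

```python
def parse_osu_bandwidth(text):
    """Return a list of (message_size_bytes, bandwidth_MB_per_s) tuples."""
    rows = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the '#' header lines.
        if not line or line.startswith("#"):
            continue
        size, bandwidth = line.split()
        rows.append((int(size), float(bandwidth)))
    return rows


# A few rows copied from the output above.
sample = """\
# OSU MPI Bi-Directional Bandwidth Test v5.7
# Size      Bandwidth (MB/s)
1                       6.99
1024                 3469.51
1048576             23952.41
4194304             24060.09
"""

rows = parse_osu_bandwidth(sample)
peak_size, peak_bw = max(rows, key=lambda row: row[1])
print(f"peak: {peak_bw / 1000:.1f} GB/s at {peak_size} bytes")
```

If the peak you measure between two nodes is far below what the interconnect should deliver, that would point towards MPI falling back to a slower network.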

What order are you running? What GiMMiK settings, if any, do you have in the CUDA backend section of your ini file?

Also, did you get a chance to have a look at the output from the dtstats plugin? If you are using adaptive time-stepping, this will tell you what the time step is. It could be that something like large gradients in the initial transients of the solution is killing your time step.

If so, reducing the order, or reducing the Reynolds number by a factor of ~10 for a few flow-throughs, can be a useful strategy.

I already tried calculating first with 1st and 2nd order and then with 4th order. I didn’t try reducing the Reynolds number because I think it would have the same effect. Or would it change something?

dtstats says dt is 1E-05.
The output of the dtstats file is:

n	t	dt	action	error
0	140.0	1E-06	accept	2.268343849514453e-08
1	140.000001	2.4999999999999998e-06	accept	8.471459234561273e-07
2	140.0000035	3.973820035165348e-06	accept	5.0772548825855345e-06
3	140.00000747382003	6.631456658704302e-06	accept	3.561642476002428e-05
4	140.0000141052767	9.412890634201496e-06	accept	0.00012322010278827724
5	140.00002351816735	1E-05	accept	0.00012695166284600783
6	140.00003351816736	1E-05	accept	0.00010171444860315467
7	140.00004351816736	1E-05	accept	8.157251290649215e-05
8	140.00005351816736	1E-05	accept	6.545558976709211e-05
9	140.00006351816737	1E-05	accept	5.256137687605738e-05
10	140.00007351816737	1E-05	accept	4.225747745502012e-05
11	140.00008351816737	1E-05	accept	3.404802337486615e-05
12	140.00009351816738	1E-05	accept	2.7548663607508372e-05
13	140.00010351816738	1E-05	accept	2.2459459667887144e-05
14	140.00011351816738	1E-05	accept	1.8535164453912925e-05
15	140.00012351816738	1E-05	accept	1.5575482055283676e-05
16	140.0001335181674	1E-05	accept	1.3402368327508011e-05
17	140.0001435181674	1E-05	accept	1.1855368887114922e-05
18	140.0001535181674	1E-05	accept	1.0784616655945415e-05
19	140.0001635181674	1E-05	accept	1.004884203274464e-05
20	140.0001735181674	1E-05	accept	9.533609412264023e-06
21	140.0001835181674	1E-05	accept	9.155385415727607e-06
22	140.0001935181674	1E-05	accept	8.853631703921871e-06
23	140.0002035181674	1E-05	accept	8.588866610849027e-06
24	140.0002135181674	1E-05	accept	8.33971018660455e-06
25	140.00022351816742	1E-05	accept	8.091478330769144e-06
26	140.00023351816742	1E-05	accept	7.842543073038163e-06
27	140.00024351816742	1E-05	accept	7.585254034626698e-06
28	140.00025351816743	1E-05	accept	7.326006796554221e-06
29	140.00026351816743	1E-05	accept	7.066663336625013e-06
30	140.00027351816743	1E-05	accept	6.803767214359973e-06
31	140.00028351816744	1E-05	accept	6.547402536895176e-06
32	140.00029351816744	1E-05	accept	6.296152274574695e-06
33	140.00030351816744	1E-05	accept	6.0526265826975125e-06
34	140.00031351816745	1E-05	accept	5.8202212144594195e-06
35	140.00032351816745	1E-05	accept	5.596903542350182e-06
36	140.00033351816745	1E-05	accept	5.385615405183631e-06
37	140.00034351816745	1E-05	accept	5.187518298593294e-06
38	140.00035351816746	1E-05	accept	5.00199979901864e-06
39	140.00036351816746	1E-05	accept	4.831365653222026e-06
40	140.00037351816746	1E-05	accept	4.671450461683504e-06
41	140.00038351816747	1E-05	accept	4.525449159286468e-06
42	140.00039351816747	1E-05	accept	4.393592094095203e-06
43	140.00040351816747	1E-05	accept	4.2744342603300585e-06
44	140.00041351816748	1E-05	accept	4.17006497242881e-06
45	140.00042351816748	1E-05	accept	4.073455806612225e-06
46	140.00043351816748	1E-05	accept	3.992194342388462e-06
47	140.0004435181675	1E-05	accept	3.917813238071675e-06
48	140.0004535181675	1E-05	accept	3.856012855864407e-06
49	140.0004635181675	1E-05	accept	3.8023904845039465e-06
50	140.0004735181675	1E-05	accept	3.757474573574061e-06
51	140.0004835181675	1E-05	accept	3.7201105386396755e-06
52	140.0004935181675	1E-05	accept	3.68802154401982e-06
53	140.0005035181675	1E-05	accept	3.661513678357816e-06
54	140.0005135181675	1E-05	accept	3.63889754923234e-06
55	140.0005235181675	1E-05	accept	3.6191075362622008e-06
56	140.00053351816752	1E-05	accept	3.601674837685863e-06
57	140.00054351816752	1E-05	accept	3.5839128370089887e-06
58	140.00055351816752	1E-05	accept	3.5667645423473235e-06
59	140.00056351816752	1E-05	accept	3.5488116913383473e-06
60	140.00057351816753	1E-05	accept	3.529241210802357e-06
61	140.00058351816753	1E-05	accept	3.5077201286861298e-06
62	140.00059351816753	1E-05	accept	3.4834529932468252e-06
63	140.00060351816754	1E-05	accept	3.4563541677069235e-06
64	140.00061351816754	1E-05	accept	3.425785388386914e-06
65	140.00062351816754	1E-05	accept	3.392189633966429e-06
66	140.00063351816755	1E-05	accept	3.35442217818038e-06
67	140.00064351816755	1E-05	accept	3.313407343913269e-06
68	140.00065351816755	1E-05	accept	3.269032724435402e-06
69	140.00066351816756	1E-05	accept	3.2211005778101686e-06
70	140.00067351816756	1E-05	accept	3.170106257720338e-06
71	140.00068351816756	1E-05	accept	3.1162394231889988e-06
72	140.00069351816757	1E-05	accept	3.0601406657294294e-06
73	140.00070351816757	1E-05	accept	3.0017338518285237e-06
74	140.00071351816757	1E-05	accept	2.941780234234165e-06
75	140.00072351816758	1E-05	accept	2.8805691647577044e-06
76	140.00073351816758	1E-05	accept	2.8189792294548986e-06
77	140.00074351816758	1E-05	accept	2.7572356047909497e-06
78	140.00075351816758	1E-05	accept	2.695368285698842e-06
79	140.0007635181676	1E-05	accept	2.63474805479417e-06
80	140.0007735181676	1E-05	accept	2.5755379245414134e-06
81	140.0007835181676	1E-05	accept	2.5182463608442482e-06
82	140.0007935181676	1E-05	accept	2.463456674420906e-06
83	140.0008035181676	1E-05	accept	2.4112808854293277e-06
84	140.0008135181676	1E-05	accept	2.362464719710611e-06
85	140.0008235181676	1E-05	accept	2.317529520524177e-06
86	140.0008335181676	1E-05	accept	2.275394029523447e-06
87	140.0008435181676	1E-05	accept	2.2378431185701784e-06
88	140.00085351816762	1E-05	accept	2.204319881538104e-06
89	140.00086351816762	1E-05	accept	2.174911671322891e-06
90	140.00087351816762	1E-05	accept	2.1492011205294595e-06
91	140.00088351816763	1E-05	accept	2.127890499347433e-06
92	140.00089351816763	1E-05	accept	2.110274533078545e-06
93	140.00090351816763	1E-05	accept	2.0949061369948176e-06
94	140.00091351816764	1E-05	accept	2.0832548938943188e-06
95	140.00092351816764	1E-05	accept	2.0740342026719666e-06
96	140.00093351816764	1E-05	accept	2.067126014710587e-06
97	140.00094351816765	1E-05	accept	2.0620526177696434e-06
98	140.00095351816765	1E-05	accept	2.0584684491859844e-06
99	140.00096351816765	1E-05	accept	2.0551662865396414e-06
100	140.00097351816765	1E-05	accept	2.0527884906498704e-06
101	140.00098351816766	1E-05	accept	2.0506591188425435e-06
102	140.00099351816766	1E-05	accept	2.048120550220234e-06
103	140.00100351816766	1E-05	accept	2.044793562693108e-06
104	140.00101351816767	1E-05	accept	2.0408391806807196e-06
105	140.00102351816767	1E-05	accept	2.0360207737114572e-06
106	140.00103351816767	1E-05	accept	2.0296898545492272e-06
107	140.00104351816768	1E-05	accept	2.021660625425394e-06
108	140.00105351816768	1E-05	accept	2.0123343226514534e-06
109	140.00106351816768	1E-05	accept	2.001248639436717e-06
110	140.0010735181677	1E-05	accept	1.9885802569028713e-06
111	140.0010835181677	1E-05	accept	1.974015984562714e-06
112	140.0010935181677	1E-05	accept	1.957679983789482e-06
113	140.0011035181677	1E-05	accept	1.93994339226657e-06
114	140.0011135181677	1E-05	accept	1.920328424880108e-06
115	140.0011235181677	1E-05	accept	1.8995779516414902e-06
116	140.0011335181677	1E-05	accept	1.8772242819354487e-06
117	140.0011435181677	1E-05	accept	1.8539729199045281e-06
118	140.0011535181677	1E-05	accept	1.8296890420723172e-06
119	140.00116351816771	1E-05	accept	1.8045995012338807e-06
120	140.00117351816772	1E-05	accept	1.7790491017350643e-06
121	140.00118351816772	1E-05	accept	1.7535070647094316e-06
122	140.00119351816772	1E-05	accept	1.727224372225263e-06
123	140.00120351816773	1E-05	accept	1.7016086145819193e-06
124	140.00121351816773	1E-05	accept	1.6760852789031668e-06
125	140.00122351816773	1E-05	accept	1.6513834749621381e-06
126	140.00123351816774	1E-05	accept	1.6273906260711213e-06
127	140.00124351816774	1E-05	accept	1.604651061085514e-06
128	140.00125351816774	1E-05	accept	1.5830497422688764e-06
129	140.00126351816775	1E-05	accept	1.5633093401494523e-06
130	140.00127351816775	1E-05	accept	1.5442571160498352e-06
131	140.00128351816775	1E-05	accept	1.5271040966049717e-06
132	140.00129351816776	1E-05	accept	1.5115812991448607e-06
133	140.00130351816776	1E-05	accept	1.4979424560244747e-06
134	140.00131351816776	1E-05	accept	1.4857866135092314e-06
135	140.00132351816777	1E-05	accept	1.4764755252087284e-06
136	140.00133351816777	1E-05	accept	1.4666890214783267e-06
137	140.00134351816777	1E-05	accept	1.4592722709275804e-06
138	140.00135351816778	1E-05	accept	1.4531807941393754e-06
139	140.00136351816778	1E-05	accept	1.4482412639907228e-06
140	140.00137351816778	1E-05	accept	1.4442806310357279e-06
141	140.00138351816778	1E-05	accept	1.4409271115878108e-06
142	140.0013935181678	1E-05	accept	1.4382810875741271e-06
143	140.0014035181678	1E-05	accept	1.4361762257610942e-06
144	140.0014135181678	1E-05	accept	1.434376119332722e-06
145	140.0014235181678	1E-05	accept	1.43207815256095e-06
146	140.0014335181678	1E-05	accept	1.4306000038287125e-06
147	140.0014435181678	1E-05	accept	1.4277086828537786e-06
148	140.0014535181678	1E-05	accept	1.424803673624337e-06
149	140.0014635181678	1E-05	accept	1.4214669970984372e-06
150	140.0014735181678	1E-05	accept	1.4169530037778965e-06
151	140.00148351816782	1E-05	accept	1.4119839020029043e-06
152	140.00149351816782	1E-05	accept	1.4061122131562989e-06
153	140.00150351816782	1E-05	accept	1.3993356404221627e-06
154	140.00151351816783	1E-05	accept	1.3915625414052378e-06
155	140.00152351816783	1E-05	accept	1.3824885854798892e-06
156	140.00153351816783	1E-05	accept	1.3748749353079653e-06
157	140.00154351816784	1E-05	accept	1.3620012372224594e-06
158	140.00155351816784	1E-05	accept	1.3502407631250773e-06
159	140.00156351816784	1E-05	accept	1.3377510038891897e-06
160	140.00157351816785	1E-05	accept	1.3248199153597621e-06
161	140.00158351816785	1E-05	accept	1.310228949981076e-06
162	140.00159351816785	1E-05	accept	1.2957796464140257e-06
163	140.00160351816785	1E-05	accept	1.2804468243453011e-06
164	140.00161351816786	1E-05	accept	1.2646918198048907e-06
165	140.00162351816786	1E-05	accept	1.2488455398062525e-06
166	140.00163351816786	1E-05	accept	1.232816678391334e-06
167	140.00164351816787	1E-05	accept	1.2167841958414198e-06
168	140.00165351816787	1E-05	accept	1.2014748080151373e-06
169	140.00166351816787	1E-05	accept	1.1848791189084275e-06
170	140.00167351816788	1E-05	accept	1.1696955901690306e-06
171	140.00168351816788	1E-05	accept	1.1547387763026817e-06
172	140.00169351816788	1E-05	accept	1.1408189255950126e-06
173	140.0017035181679	1E-05	accept	1.1274033711201898e-06
174	140.0017135181679	1E-05	accept	1.1148062153240423e-06
175	140.0017235181679	1E-05	accept	1.1032112323702101e-06
176	140.0017335181679	1E-05	accept	1.092511782600043e-06
177	140.0017435181679	1E-05	accept	1.0832401121019752e-06
178	140.0017535181679	1E-05	accept	1.0745195832448745e-06
179	140.0017635181679	1E-05	accept	1.0672171660199893e-06
180	140.0017735181679	1E-05	accept	1.0613745001209287e-06
181	140.0017835181679	1E-05	accept	1.0553335441340095e-06
182	140.00179351816791	1E-05	accept	1.0512461374605466e-06
183	140.00180351816792	1E-05	accept	1.0476525363327384e-06
184	140.00181351816792	1E-05	accept	1.0450431862670186e-06
185	140.00182351816792	1E-05	accept	1.0432749264585226e-06
186	140.00183351816793	1E-05	accept	1.0420085228209794e-06
187	140.00184351816793	1E-05	accept	1.0415763539045984e-06
188	140.00185351816793	1E-05	accept	1.0410503136616335e-06
189	140.00186351816794	1E-05	accept	1.0411174549306464e-06
190	140.00187351816794	1E-05	accept	1.0413498103852875e-06
191	140.00188351816794	1E-05	accept	1.0416106726904863e-06
192	140.00189351816795	1E-05	accept	1.0416842277222533e-06
193	140.00190351816795	1E-05	accept	1.0419274150531392e-06
194	140.00191351816795	1E-05	accept	1.0409858972854758e-06
195	140.00192351816796	1E-05	accept	1.0403404725531393e-06
196	140.00193351816796	1E-05	accept	1.0390513521504772e-06
197	140.00194351816796	1E-05	accept	1.0368721181584026e-06
198	140.00195351816797	1E-05	accept	1.0347615892843984e-06
199	140.00196351816797	1E-05	accept	1.0324517612665618e-06
200	140.00197351816797	1E-05	accept	1.028273365003845e-06
201	140.00198351816798	1E-05	accept	1.0236539142772317e-06
202	140.00199351816798	1E-05	accept	1.0180598680583908e-06
203	140.00200351816798	1E-05	accept	1.0128603981586694e-06
204	140.00201351816798	1E-05	accept	1.0055873339601566e-06
205	140.002023518168	1E-05	accept	9.987901828067417e-07
206	140.002033518168	1E-05	accept	9.902072613005676e-07
207	140.002043518168	1E-05	accept	9.81530873810209e-07
208	140.002053518168	1E-05	accept	9.724093398112764e-07
209	140.002063518168	1E-05	accept	9.627212351894181e-07
210	140.002073518168	1E-05	accept	9.525048241075379e-07
211	140.002083518168	1E-05	accept	9.425334055619384e-07
212	140.002093518168	1E-05	accept	9.309989267745531e-07
213	140.002103518168	1E-05	accept	9.200133053512824e-07
214	140.00211351816802	1E-05	accept	9.088527707971987e-07
215	140.00212351816802	1E-05	accept	8.972168536497187e-07
216	140.00213351816802	1E-05	accept	8.866307773057281e-07
217	140.00214351816803	1E-05	accept	8.748645517950907e-07
218	140.00215351816803	1E-05	accept	8.636334349642278e-07
219	140.00216351816803	1E-05	accept	8.53070178941103e-07
220	140.00217351816804	1E-05	accept	8.443362325823149e-07
221	140.00218351816804	1E-05	accept	8.326161575638181e-07
222	140.00219351816804	1E-05	accept	8.233172139921304e-07
223	140.00220351816805	1E-05	accept	8.14370926126776e-07
224	140.00221351816805	1E-05	accept	8.061686549207966e-07
225	140.00222351816805	1E-05	accept	7.987173419579453e-07
226	140.00223351816805	1E-05	accept	7.916744052063751e-07
227	140.00224351816806	1E-05	accept	7.860029553480782e-07
228	140.00225351816806	1E-05	accept	7.815670004344503e-07
229	140.00226351816806	1E-05	accept	7.752245867122883e-07
230	140.00227351816807	1E-05	accept	7.712568413238946e-07
231	140.00228351816807	1E-05	accept	7.673410068737171e-07
232	140.00229351816807	1E-05	accept	7.648004482175053e-07
233	140.00230351816808	1E-05	accept	7.625452417475812e-07
234	140.00231351816808	1E-05	accept	7.600542130405542e-07
235	140.00232351816808	1E-05	accept	7.586432574052785e-07
236	140.0023335181681	1E-05	accept	7.574738774648359e-07
237	140.0023435181681	1E-05	accept	7.566743150655698e-07
238	140.0023535181681	1E-05	accept	7.558693443479115e-07
239	140.0023635181681	1E-05	accept	7.549571363578885e-07
240	140.0023735181681	1E-05	accept	7.547451001371794e-07
241	140.0023835181681	1E-05	accept	7.549662657912529e-07
242	140.0023935181681	1E-05	accept	7.534418755484206e-07
243	140.0024035181681	1E-05	accept	7.52159730252963e-07
244	140.0024135181681	1E-05	accept	7.51909053973815e-07
245	140.00242351816811	1E-05	accept	7.498698172846313e-07
246	140.00243351816812	1E-05	accept	7.485760472208512e-07
247	140.00244351816812	1E-05	accept	7.46607592152585e-07
248	140.00245351816812	1E-05	accept	7.445748410753725e-07
249	140.00246351816813	1E-05	accept	7.432010166260984e-07
250	140.00247351816813	1E-05	accept	7.394042174384645e-07
251	140.00248351816813	1E-05	accept	7.354490179561419e-07
252	140.00249351816814	1E-05	accept	7.31826798872411e-07
253	140.00250351816814	1E-05	accept	7.281974720833057e-07
254	140.00251351816814	1E-05	accept	7.233925261733626e-07
255	140.00252351816815	1E-05	accept	7.182316336662848e-07
256	140.00253351816815	1E-05	accept	7.132791629935056e-07
257	140.00254351816815	1E-05	accept	7.083598792338333e-07
258	140.00255351816816	1E-05	accept	7.021687966881481e-07
259	140.00256351816816	1E-05	accept	6.961889728143629e-07
260	140.00257351816816	1E-05	accept	6.899901048578042e-07
261	140.00258351816817	1E-05	accept	6.835023147491382e-07
262	140.00259351816817	1E-05	accept	6.771920538311186e-07
263	140.00260351816817	1E-05	accept	6.709964235831224e-07
264	140.00261351816818	1E-05	accept	6.666768351197449e-07
265	140.00262351816818	1E-05	accept	6.579806955706362e-07
266	140.00263351816818	1E-05	accept	6.514219873093234e-07
267	140.00264351816818	1E-05	accept	6.455404947362124e-07
268	140.0026535181682	1E-05	accept	6.400252363380362e-07
269	140.0026635181682	1E-05	accept	6.333219299485284e-07
270	140.0026735181682	1E-05	accept	6.276571061978891e-07
271	140.0026835181682	1E-05	accept	6.22765028933763e-07
272	140.0026935181682	1E-05	accept	6.170192419857621e-07
273	140.0027035181682	1E-05	accept	6.127137953138411e-07
274	140.0027135181682	1E-05	accept	6.087043275685564e-07
275	140.0027235181682	1E-05	accept	6.048721354736774e-07
276	140.0027335181682	1E-05	accept	6.001940046919763e-07
277	140.00274351816822	1E-05	accept	5.962288443641481e-07
278	140.00275351816822	1E-05	accept	5.936822639759953e-07
279	140.00276351816822	1E-05	accept	5.906235036974063e-07
280	140.00277351816823	1E-05	accept	5.884877845103968e-07
281	140.00278351816823	1E-05	accept	5.854254343021327e-07
282	140.00279351816823	1E-05	accept	5.829771715608751e-07
283	140.00280351816824	1E-05	accept	5.811253273312297e-07
284	140.00281351816824	1E-05	accept	5.791947217332957e-07
285	140.00282351816824	1E-05	accept	5.775606127922815e-07
286	140.00283351816825	1E-05	accept	5.76057222813461e-07
287	140.00284351816825	1E-05	accept	5.747911212885433e-07
288	140.00285351816825	1E-05	accept	5.735758974441581e-07
289	140.00286351816825	1E-05	accept	5.717438217783827e-07
290	140.00287351816826	1E-05	accept	5.710074335746012e-07
291	140.00288351816826	1E-05	accept	5.6924127798664e-07
292	140.00289351816826	1E-05	accept	5.673364321478535e-07
293	140.00290351816827	1E-05	accept	5.656233762745398e-07
294	140.00291351816827	1E-05	accept	5.639877202801574e-07
295	140.00292351816827	1E-05	accept	5.62126476900949e-07
296	140.00293351816828	1E-05	accept	5.597584539884811e-07
297	140.00294351816828	1E-05	accept	5.583230352975559e-07
298	140.00295351816828	1E-05	accept	5.550534397524547e-07
299	140.0029635181683	1E-05	accept	5.527350083882799e-07
300	140.0029735181683	1E-05	accept	5.495749456323103e-07
301	140.0029835181683	1E-05	accept	5.466269331916856e-07
302	140.0029935181683	1E-05	accept	5.433414566977378e-07
303	140.0030035181683	1E-05	accept	5.398593251448297e-07
304	140.0030135181683	1E-05	accept	5.365393957849507e-07
305	140.0030235181683	1E-05	accept	5.325919254735655e-07
306	140.0030335181683	1E-05	accept	5.287357066649481e-07
307	140.0030435181683	1E-05	accept	5.252815021278899e-07
308	140.00305351816831	1E-05	accept	5.20758020484336e-07
309	140.00306351816832	1E-05	accept	5.166467565320225e-07
310	140.00307351816832	1E-05	accept	5.126459347404819e-07
311	140.00308351816832	1E-05	accept	5.082129131170687e-07
312	140.00309351816833	1E-05	accept	5.042292385927753e-07
313	140.00310351816833	1E-05	accept	5.000836144215944e-07
314	140.00311351816833	1E-05	accept	4.960410572822303e-07
315	140.00312351816834	1E-05	accept	4.919079738471437e-07
316	140.00313351816834	1E-05	accept	4.880532432791328e-07
317	140.00314351816834	1E-05	accept	4.839686321355759e-07
318	140.00315351816835	1E-05	accept	4.805479108776039e-07
319	140.00316351816835	1E-05	accept	4.76716717200148e-07
320	140.00317351816835	1E-05	accept	4.734798653019138e-07
321	140.00318351816836	1E-05	accept	4.705153820698271e-07
322	140.00319351816836	1E-05	accept	4.6880127241149027e-07
323	140.00320351816836	1E-05	accept	4.6510582831395993e-07
324	140.00321351816837	1E-05	accept	4.624401249594309e-07
325	140.00322351816837	1E-05	accept	4.598511368684913e-07
326	140.00323351816837	1E-05	accept	4.57411911088076e-07
327	140.00324351816838	1E-05	accept	4.554668786065822e-07
328	140.00325351816838	1E-05	accept	4.537543358358764e-07
329	140.00326351816838	1E-05	accept	4.5233246064713466e-07
330	140.00327351816838	1E-05	accept	4.508317886204587e-07
331	140.0032835181684	1E-05	accept	4.5023856224827146e-07
332	140.0032935181684	1E-05	accept	4.4867920748022104e-07
333	140.0033035181684	1E-05	accept	4.482984237631179e-07
334	140.0033135181684	1E-05	accept	4.4670168481886975e-07
335	140.0033235181684	1E-05	accept	4.4626464301944886e-07
336	140.0033335181684	1E-05	accept	4.4612776988158204e-07
337	140.0033435181684	1E-05	accept	4.4529277981103305e-07
338	140.0033535181684	1E-05	accept	4.4376529509014896e-07
339	140.0033635181684	1E-05	accept	4.436868460282473e-07
340	140.00337351816842	1E-05	accept	4.429533728432459e-07
341	140.00338351816842	1E-05	accept	4.4170386876460896e-07
342	140.00339351816842	1E-05	accept	4.4115868909392475e-07
343	140.00340351816843	1E-05	accept	4.4034983178214336e-07
344	140.00341351816843	1E-05	accept	4.3913505502745166e-07
345	140.00342351816843	1E-05	accept	4.3891242122180103e-07
346	140.00343351816844	1E-05	accept	4.373970801562795e-07
347	140.00344351816844	1E-05	accept	4.363320829015837e-07
348	140.00345351816844	1E-05	accept	4.35764666887698e-07
349	140.00346351816845	1E-05	accept	4.329868418065056e-07
350	140.00347351816845	1E-05	accept	4.3181042147503693e-07
351	140.00348351816845	1E-05	accept	4.296343463600466e-07
352	140.00349351816845	1E-05	accept	4.279440605074927e-07
353	140.00350351816846	1E-05	accept	4.2544433449053613e-07
354	140.00351351816846	1E-05	accept	4.2331910602041026e-07
355	140.00352351816846	1E-05	accept	4.212144972615008e-07
356	140.00353351816847	1E-05	accept	4.1862439436226565e-07
357	140.00354351816847	1E-05	accept	4.160286530904946e-07
358	140.00355351816847	1E-05	accept	4.133241006328082e-07
359	140.00356351816848	1E-05	accept	4.106351274540373e-07
360	140.00357351816848	1E-05	accept	4.0752820896690463e-07
361	140.00358351816848	1E-05	accept	4.0516859610714036e-07
362	140.0035935181685	1E-05	accept	4.017550822677258e-07
363	140.0036035181685	1E-05	accept	3.987735948928499e-07
364	140.0036135181685	1E-05	accept	3.95449518831606e-07
365	140.0036235181685	1E-05	accept	3.9262976071017473e-07
366	140.0036335181685	1E-05	accept	3.893467220146664e-07
367	140.0036435181685	1E-05	accept	3.862899329046607e-07
368	140.0036535181685	1E-05	accept	3.829550201293569e-07
369	140.0036635181685	1E-05	accept	3.814453986237525e-07
370	140.0036735181685	1E-05	accept	3.7762326908168385e-07
371	140.00368351816851	1E-05	accept	3.747977326407551e-07
372	140.00369351816852	1E-05	accept	3.724112219464808e-07
373	140.00370351816852	1E-05	accept	3.6939444637203713e-07
374	140.00371351816852	1E-05	accept	3.6668293774832466e-07
375	140.00372351816853	1E-05	accept	3.6424331518715753e-07
376	140.00373351816853	1E-05	accept	3.61722228606795e-07
377	140.00374351816853	1E-05	accept	3.597169890282886e-07
378	140.00375351816854	1E-05	accept	3.576409144338039e-07
379	140.00376351816854	1E-05	accept	3.558407915201789e-07
380	140.00377351816854	1E-05	accept	3.5395983712198923e-07
381	140.00378351816855	1E-05	accept	3.5265946957923376e-07
382	140.00379351816855	1E-05	accept	3.5231679492942473e-07
383	140.00380351816855	1E-05	accept	3.5034402276051274e-07
384	140.00381351816856	1E-05	accept	3.495236949474289e-07
385	140.00382351816856	1E-05	accept	3.4742306218257463e-07
386	140.00383351816856	1E-05	accept	3.470094828229475e-07
387	140.00384351816857	1E-05	accept	3.4551679116069093e-07
388	140.00385351816857	1E-05	accept	3.4542484787326565e-07
389	140.00386351816857	1E-05	accept	3.441669389601491e-07
390	140.00387351816858	1E-05	accept	3.436649662573171e-07
391	140.00388351816858	1E-05	accept	3.4337658061451206e-07
392	140.00389351816858	1E-05	accept	3.437006131984012e-07
393	140.00390351816858	1E-05	accept	3.5747628507250233e-07
394	140.0039135181686	1E-05	accept	3.416145545179916e-07
395	140.0039235181686	1E-05	accept	3.4135144516928247e-07
396	140.0039335181686	1E-05	accept	3.424371215830613e-07
397	140.0039435181686	1E-05	accept	3.40253280133287e-07
398	140.0039535181686	1E-05	accept	3.3998664096625914e-07
399	140.0039635181686	1E-05	accept	3.4025729744437635e-07
400	140.0039735181686	1E-05	accept	3.4203708898418043e-07
401	140.0039835181686	1E-05	accept	3.38351267756606e-07
402	140.0039935181686	1E-05	accept	3.3743621289736646e-07
403	140.00400351816862	1E-05	accept	3.3651970674867245e-07
404	140.00401351816862	1E-05	accept	3.363825004502198e-07
405	140.00402351816862	1E-05	accept	3.355495541011747e-07
406	140.00403351816863	1E-05	accept	3.339364690423231e-07
407	140.00404351816863	1E-05	accept	3.32822206584491e-07
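
A quick way to read a file like this is to summarise it rather than scan it by eye. The sketch below assumes the tab-separated layout shown above (columns n, t, dt, action, error) and takes the end time of 240 from the progress bar in the log; the function name is hypothetical and the sample holds only a few rows copied from the output above.

```python
import csv
import io


def summarise_dtstats(text, t_end):
    """Summarise dt and estimate the steps left to reach t_end."""
    rows = list(csv.DictReader(io.StringIO(text), delimiter="\t"))
    dts = [float(row["dt"]) for row in rows]
    rejected = sum(row["action"] != "accept" for row in rows)
    # t is the time at the start of a step, so the next step starts at t + dt.
    t_next = float(rows[-1]["t"]) + dts[-1]
    # Assumes dt stays at its most recent value for the rest of the run.
    steps_left = round((t_end - t_next) / dts[-1])
    return {
        "dt_min": min(dts),
        "dt_max": max(dts),
        "rejected": rejected,
        "steps_left": steps_left,
    }


# Rows 0, 1, 5 and 407 copied from the output above.
sample = (
    "n\tt\tdt\taction\terror\n"
    "0\t140.0\t1E-06\taccept\t2.268343849514453e-08\n"
    "1\t140.000001\t2.4999999999999998e-06\taccept\t8.471459234561273e-07\n"
    "5\t140.00002351816735\t1E-05\taccept\t0.00012695166284600783\n"
    "407\t140.00404351816863\t1E-05\taccept\t3.32822206584491e-07\n"
)

summary = summarise_dtstats(sample, t_end=240.0)
print(summary)
```

With dt = 1E-05 and an end time of 240, this gives on the order of 10^7 remaining time steps, which helps put the estimated remaining time in context.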

Reducing the Reynolds number can help sometimes, but as you don’t seem to be using adaptive time-stepping, this is not the source of the poor runtime.

I would try the MPI micro-benchmarks across the same nodes on which you are seeing poor performance.