Testing PyFR - 3D cylinder - OpenMP extremely slow

Hello developers,

I have been tasked with evaluating several CFD codes, with the goal of finding the one that best suits our needs. I see a lot of potential in PyFR and would like to run a couple of cases with it. The 2D cases that come with the program are nice. The strange thing is that two or more partitions perform more slowly than a single partition. I then thought a 2D case might simply not be big enough, since someone stated in another post that they were designed to run on a single core.

So I created this 3D cylinder case (the ini and msh files can be found in the attachment). I am trying to run it on the OpenMP backend at the moment, and it is extremely slow. I read that Dr. Vincent said a 3D case setup file was attached to a publication, but I have not been able to find it yet. I hope I can take a look at it.
(screenshots of the run: aasdf.PNG, ffdsa.PNG)

I also tried to run it on a GPU (Tesla K80) on a virtual machine; it would always take one core and shut down the rest at the very beginning. I can't provide the error message right now. I feel I need to update OpenMPI to 4.0 (currently 1.10).

Thanks,

Junting Chen

3d_cylinder.zip (5.1 MB)

Hi Junting,

Thanks for your interest in PyFR.

I read that Dr. Vincent said a 3D case setup file was attached to a publication, but I have not been able to find it yet.

You can find some 3D cases here, for example:

https://www.sciencedirect.com/science/article/pii/S0021999116307136

Regards

Peter

Hello Dr.Vincent,

Thanks for the quick reply. I did try to run the case with both the OpenMP and CUDA backends.

With the CUDA backend (Tesla K80) on a virtual machine, I constantly see this error message:

(screenshot of the error message: 1.png)

I found a way to get rid of the message, but there was no improvement in performance. (Footnote: nvidia-smi tells me 12 cores are used in both cases.)

(screenshot: 2.png)

Also, I realized that the run speed remains the same regardless of how many cores I request (mpirun -n ##). A single core, 12 cores, and 20 cores all require 14 hours to run 20 s of this 3D cylinder case with 360k cells at 4th order. I believe there are still some technical issues here. I suspect the OpenMPI version is holding me back at the moment: it is OpenMPI 1.10 here, while the latest version is 4.0.

Running this case with the OpenMP backend on a local machine, I have encountered a similar issue: regardless of how many processors I request, there is no improvement in performance. The run speed is extremely slow. This local machine has OpenMPI 2.1.

I am currently rebuilding the latest OpenMPI on the local machine (there are some difficulties updating CUDA-aware OpenMPI on the virtual machine). Besides the OpenMPI issue, is there anything else that might be going wrong?
Thanks again.

Junting Chen

Hi Junting,

PyFR is a modern code. As such, when running with the OpenMP backend you only want to use one MPI rank per socket. If you are running on a 12 core CPU there should be one MPI rank (the remaining cores will be saturated by OpenMP threads). Using more ranks is likely to degrade performance, especially if you do not tell OpenMP to use fewer threads, as you may end up with 12*12 = 144 threads, which is a substantial amount of over-subscription.
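
As a rough sketch on a single 12-core socket (the mesh and ini names below are placeholders for your own files; OMP_NUM_THREADS is the standard OpenMP environment variable, which the OpenMP backend should respect):

    # one MPI rank for the socket; let OpenMP fill the 12 cores
    export OMP_NUM_THREADS=12
    mpirun -n 1 pyfr run -b openmp -p mesh.pyfrm config.ini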

In the case of the CUDA backend all computation is offloaded to the GPU. Thus there should be one MPI rank per GPU. Here, it is expected that all but one of the CPU cores will be idle. Running more ranks is just going to over-subscribe the GPU and degrade performance.
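
Again as a sketch (file names are placeholders):

    # one MPI rank per GPU; the remaining CPU cores are expected to sit idle
    mpirun -n 1 pyfr run -b cuda -p mesh.pyfrm config.ini

Since a K80 board actually contains two GPUs, you could in principle also partition the mesh (pyfr partition 2 mesh.pyfrm .) and run two ranks, one per GPU; you may need to check the device-id setting in [backend-cuda] for this.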

Regards, Freddie.

Thanks Dr. Witherden, great to hear from you, and sorry about the delay. There are still some issues regarding convergence, and I hope you can give me some hints.

First of all, I tried to run the simulation Dr. Vincent provided along with the paper, flow past a 3D cylinder. I added a "tend" value and didn't make any other changes. It runs OK, though it takes a little too long to see some results. Then I increased the inlet velocity from 0.2 to 3.2, and it no longer converges; all I get is 10E38. Is this because of the increase in Reynolds number from ~3k to ~30k?

Also, I ran a CAARC building case, which is one of our test cases. In this case I am only running at 2nd order for now. I have tried both the ac-navier-stokes solver (with ac-zeta = 2.5 and 180) and the navier-stokes solver, and it keeps breaking at a certain point. I have attached the msh and ini files if you have time to check. The result before the simulation breaks looks like:

(screenshot: 2.PNG)

The initial condition in the domain was u = 0, and the inlet flow velocity is u = 20 m/s. The result shown on the left was taken at 173 s. The CAARC building is placed 300 m behind the inlet, which means the incoming flow should have passed through the entire domain multiple times already. Checking results from previous time steps, the flow did move forward from the inlet, but not by much. In addition, the simulation breaks as some bad nodes diverge very soon after this time step.

In this run, I used navier-stokes.

(screenshot: 3.PNG)

I did another run with the ac-navier-stokes solver, setting u = 20 in the entire domain. It breaks much earlier, at around t = 38 s. Also, I was expecting to see some turbulent flow above and behind the building.

I suspect the boundary condition setup was not done properly. Would you mind elaborating on the difference between ac-char-riem-inv and ac-in-fv? Is ac-char-riem-inv / char-riem-inv the best option for the outlet?

From your tweet, I read that Niki has developed an incompressible solver for PyFR. Is there any chance it will be open sourced?

Best regards,

Junting Chen

CAARC_ac.ini (1.47 KB)

CAARC.msh.zip (5.06 MB)

Hi Junting,

First of all, I tried to run the simulation Dr. Vincent provided along with the paper, flow past a 3D cylinder. I added a "tend" value and didn't make any other changes. It runs OK, though it takes a little too long to see some results. Then I increased the inlet velocity from 0.2 to 3.2, and it no longer converges; all I get is 10E38. Is this because of the increase in Reynolds number from ~3k to ~30k?

Increasing the Reynolds number will almost certainly require an increase in either polynomial order or mesh resolution. However, a bigger problem is that your inlet velocity is now likely supersonic, which will dramatically change the dynamics of the simulation. Specifically, at a minimum, you will need to enable the shock capturing functionality in PyFR in order to stabilize the simulation.
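
As a sketch, in the .ini file this looks something like the following; the values given here are only illustrative and will need tuning for your case (please check the user guide for your version):

    [solver]
    system = navier-stokes
    order = 4
    shock-capturing = artificial-viscosity

    [solver-artificial-viscosity]
    max-artvisc = 0.01
    s0 = 0.01
    kappa = 5.0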

From your tweet, I read that Niki has developed an incompressible solver for PyFR. Is there any chance it will be open sourced?

This solver is the ac- prefixed solver (with AC standing for artificial compressibility, which is the technology that underpins the incompressible solver). As such, not only is it open source, but it is also released and available in current versions of PyFR.
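
For reference, a minimal sketch of how it is selected in the .ini file (the values here are placeholders; ac-zeta in particular needs to be chosen for your case):

    [constants]
    ; artificial compressibility parameter
    ac-zeta = 2.5
    ; kinematic viscosity
    nu = 1.5e-5

    [solver]
    system = ac-navier-stokes
    order = 2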

Regards, Freddie.

Hi Junting,

Thank you for the clarification.

Do the dimensions of the computational domain need to be non-dimensionalized somehow?

Junting

Hi Junting,

The dimensions of the computational domain should be such that you obtain the desired Reynolds number, given your choice of reference density, velocity, and viscosity. It may be easier, though, to change these variables to get your target Reynolds number while keeping your domain fixed (i.e. while keeping your reference length fixed).
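
As a concrete illustration (the numbers here are just an example, not taken from your case): Re = rho * U * L / mu, so with rho = 1, a reference length L = 1, and velocity U = 0.2, a target Re of about 3900 corresponds to mu = rho * U * L / Re = 0.2 / 3900, i.e. about 5.1e-5. To reach a higher Reynolds number you could simply reduce mu by the appropriate factor while leaving the mesh and velocity unchanged.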

Best
Giorgio