Hello everyone.
I am new to PyFR and would like to get up to speed quickly.
I’m attempting to run a three-dimensional case based on the paper “On the utility of GPU accelerated high-order methods for unsteady flow simulations: A comparison with industry-standard tools”.
However, I’ve run into some difficulties along the way, particularly in working out how to obtain results faster with PyFR.
My personal computer has an i7-12700 CPU (12 cores, 20 threads) and an RTX 3090 GPU. PyFR is already installed, and CUDA and mpiexec are available.
My understanding of mpiexec is limited, though, and I’m unsure how the following commands coordinate the simultaneous use of the CPU and GPU:
pyfr import sd7003.msh sd7003.pyfrm
pyfr partition 20 sd7003.pyfrm .
mpiexec -n 20 pyfr run -b cuda -p sd7003.pyfrm sd7003.ini
Specifically, I’m unsure how large the speed difference is between the commands above and the commands below:
pyfr import sd7003.msh sd7003.pyfrm
pyfr run -b cuda -p sd7003.pyfrm sd7003.ini
Based on my limited experimentation, there doesn’t appear to be a significant difference in speed between the two sets of commands; both are slow. As a newcomer without much experience, would it be best to partition the mesh into the maximum number of processes the CPU supports, launch them with mpiexec, and combine this with the CUDA backend? I’m unsure whether this approach is the most efficient, and I’m curious what advantages, in principle, it offers over using CUDA alone.
I would greatly appreciate any help in clearing up these questions.
Best regards.