Hello, everyone!
I ran the TGV case (TGV Performance Numbers - General - PyFR) and profiled it with nsys, and I found that there are DtoD operations. I am only using one chip, so why is there a copy operation? Where does it appear in the source code?
Depending on what solver you are using and your config, there are some points where an array has to be copied. For example, here: https://github.com/PyFR/PyFR/blob/98929401cd2d607da4058f13ecac3f4a6aeee60b/pyfr/solvers/baseadvec/elements.py#L102
This is done with a DtoD copy in which the source and destination devices are the same, but src and dst point to different memory. Using a DtoD copy, rather than writing your own kernel, is probably much faster and means we can leave hardware-specific optimisations to the hardware vendor.
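To make the "same device, different memory" point concrete, here is a minimal standalone sketch using the CUDA runtime API. It is not PyFR's code: the buffer names, the array size, and the `check` helper are made up for illustration. It only shows that copying between two allocations on a single GPU is itself a device-to-device memcpy.

```cuda
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical helper: abort with a message if a CUDA runtime call fails.
static void check(cudaError_t err, const char *what) {
    if (err != cudaSuccess) {
        std::fprintf(stderr, "%s failed: %s\n", what, cudaGetErrorString(err));
        std::exit(EXIT_FAILURE);
    }
}

int main() {
    const size_t n = 1 << 20;                 // illustrative array length
    const size_t nbytes = n * sizeof(double);

    std::vector<double> host_src(n), host_dst(n, 0.0);
    for (size_t i = 0; i < n; ++i) host_src[i] = static_cast<double>(i);

    // Two distinct allocations on the same (default) GPU.
    double *d_src = nullptr, *d_dst = nullptr;
    check(cudaMalloc(reinterpret_cast<void **>(&d_src), nbytes), "cudaMalloc src");
    check(cudaMalloc(reinterpret_cast<void **>(&d_dst), nbytes), "cudaMalloc dst");

    check(cudaMemcpy(d_src, host_src.data(), nbytes, cudaMemcpyHostToDevice),
          "HtoD copy");

    // Same device, different memory: this is the device-to-device copy.
    check(cudaMemcpy(d_dst, d_src, nbytes, cudaMemcpyDeviceToDevice),
          "DtoD copy");

    check(cudaMemcpy(host_dst.data(), d_dst, nbytes, cudaMemcpyDeviceToHost),
          "DtoH copy");

    // Verify that the destination buffer received the source data.
    for (size_t i = 0; i < n; ++i) {
        if (host_dst[i] != host_src[i]) {
            std::fprintf(stderr, "mismatch at index %zu\n", i);
            return EXIT_FAILURE;
        }
    }

    check(cudaFree(d_src), "cudaFree src");
    check(cudaFree(d_dst), "cudaFree dst");
    return EXIT_SUCCESS;
}
```

If you build this with nvcc and run it under nsys on a single-GPU machine, the middle copy should be reported as a device-to-device memcpy, even though only one device is involved.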