Very thankful for the many insights given.
Is it possible to track the time spent outside of pure C/CUDA computation? I would like to estimate how much more performance can be squeezed out of the code.
Regards
Subtract the total amount of time spent in plugins from the overall run time and you have quite a good estimate.
Regards, Freddie.
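As a rough illustration of the subtraction suggested above, here is a minimal Python sketch. It assumes you already have the overall wall-clock time and a per-plugin timing breakdown from your own logs; the function name, the plugin names, and the numbers are all hypothetical and do not come from any particular solver's API.

```python
def estimate_compute_time(total_wall_time, plugin_times):
    """Estimate time spent in pure compute kernels.

    total_wall_time: overall run time in seconds (assumed measured by you).
    plugin_times: dict mapping a plugin label to seconds spent in it.
    Returns (compute_seconds, compute_fraction).
    """
    if total_wall_time <= 0:
        raise ValueError("total_wall_time must be positive")

    plugin_total = sum(plugin_times.values())
    if plugin_total > total_wall_time:
        raise ValueError("plugin time exceeds total wall time")

    # Everything that is not plugin time is attributed to compute.
    compute = total_wall_time - plugin_total
    return compute, compute / total_wall_time


# Made-up numbers, purely for illustration.
wall = 1200.0
plugins = {"solution_writer": 90.0, "force_monitor": 30.0}

compute, fraction = estimate_compute_time(wall, plugins)
print(f"compute: {compute:.1f} s ({fraction:.1%} of wall time)")
```

The remainder is only an estimate: it lumps anything that is not a plugin into "compute", so it is an upper bound on the time spent in the kernels themselves.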
OK, but this does not account for the time spent copying data back and forth between ctypes and Python. Is that time negligible even for very large simulations?
Regards
No time is ever spent doing this.
Copies of data (which can include device-to-host transfers on GPU backends) are only ever made by plugins, which are already accounted for.
Regards, Freddie.
Thanks a lot for the explanations!
Regards