What's the relationship between GiMMiK and cuBLAS?

I read the paper and it seems that GiMMiK is better than cuBLAS? Does the new version of PyFR still need to use the cuBLAS library or not?

GiMMiK is great in some circumstances, and with some of our ongoing work the set of circumstances it works well for is always increasing. In the GiMMiK mul definition in PyFR there are some checks that are made; if the matrix doesn't pass those checks a NotSuitableError is thrown. This is picked up by PyFR here, and then the next provider is tried.
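To make that flow concrete, here is a minimal sketch of the pattern, not the actual PyFR source: the class name, the nnz threshold, the mul signature and the NumPy stand-in kernel are all made up for illustration.

```python
import numpy as np


class NotSuitableError(Exception):
    """Raised by a kernel provider that cannot handle a given request."""


class GiMMiKMulProvider:
    # Purely illustrative threshold; the real checks in PyFR differ
    # between versions and backends
    max_nnz = 512

    def mul(self, a, b, out, alpha=1.0, beta=0.0):
        # GiMMiK bakes the constant operator matrix `a` into generated
        # source code, so it only pays off for small/sparse operators
        if np.count_nonzero(a) > self.max_nnz:
            raise NotSuitableError('Operator matrix too dense/large for GiMMiK')

        # In real life a bespoke GiMMiK kernel would be generated and
        # compiled here; a plain NumPy stand-in keeps the sketch runnable
        return lambda: np.copyto(out, alpha*(a @ b) + beta*out)
```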

What this means is that PyFR will normally try GiMMiK first and then fall back to cuBLAS. @fdw may have a view on this, but I don't see a situation where we completely remove cuBLAS, as I think it is unlikely that GiMMiK will ever be as good as cuBLAS for large, truly dense matrices, i.e. those that occur for tets with order > ~5.
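Continuing the sketch above, the fallback itself is just a loop over providers in priority order; the Backend and CUBLASMulProvider classes here are likewise hypothetical stand-ins, not the real PyFR code.

```python
class CUBLASMulProvider:
    def mul(self, a, b, out, alpha=1.0, beta=0.0):
        # The dense GEMM path can always handle the request, so it never
        # raises NotSuitableError
        return lambda: np.copyto(out, alpha*(a @ b) + beta*out)


class Backend:
    def __init__(self):
        # GiMMiK is tried first; cuBLAS is the dense fallback
        self.providers = [GiMMiKMulProvider(), CUBLASMulProvider()]

    def kernel(self, name, *args, **kwargs):
        for p in self.providers:
            try:
                return getattr(p, name)(*args, **kwargs)
            except NotSuitableError:
                continue  # this provider declined; try the next one

        raise KeyError(f'No provider can handle kernel {name!r}')


# A dense 64x64 operator fails the illustrative nnz check, so the request
# falls through to the cuBLAS stand-in
a, b = np.random.rand(64, 64), np.random.rand(64, 1000)
out = np.zeros((64, 1000))
Backend().kernel('mul', a, b, out)()
```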

Thank you! How can I choose to use cuBLAS only? I want to compare GiMMiK and cuBLAS performance for my specific case.

I think the option to do that from the ini file was removed a couple of versions ago. But if you force mul to throw NotSuitableError then you’ll fall back on cuBLAS. Throw the error as soon as you get into the function, somewhere around here: PyFR/gimmik.py at c8c053d5c0e34ac7a5639c4d328ebacde10fe689 · PyFR/PyFR · GitHub

To force PyFR to use GiMMiK only, you can instead comment out the line a bit later that throws that error.
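In other words, both experiments amount to a one-line edit in the GiMMiK provider linked above. A sketch of what that edit looks like; the exact file path, class and mul signature depend on the PyFR version you have checked out.

```python
# In the GiMMiK provider (e.g. the gimmik.py file linked above):

def mul(self, a, b, out, alpha=1.0, beta=0.0):
    # (a) cuBLAS only: decline immediately, so every mul request falls
    #     through to the next provider (cuBLAS)
    raise NotSuitableError('forcing the cuBLAS fallback for benchmarking')

    # (b) GiMMiK only: leave the line above out and instead comment out
    #     the existing suitability check further down, e.g.
    # if <matrix fails the checks>:
    #     raise NotSuitableError(...)
```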

OK, thank you! I also wonder about the performance on different hardware, e.g. an A100 versus an A10. When testing the provided examples, the reported GPU utilisation is very low. Is there anything wrong?

That GPU utilisation number isn't very informative about how well a task is performing.

If you want to understand whether a program is performing better I suggest using NVIDIA Nsight Compute (ncu); you might have to ask your system administrator for some additional permissions in order to use it.

With ncu you'll be able to see the bandwidth and FLOP numbers for each kernel, which should give you a better idea of how it is performing. In my experience, however, the performance of PyFR with GiMMiK on A100s is about where I would expect it to be. That said, we are working on ways to make more use of some of the new features of the A100, such as async memory copies that bypass registers when loading into shared memory.
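For reference, a typical invocation might look something like the following, where mesh.pyfrm and config.ini are placeholders for your own case and --launch-count just keeps the profiling overhead manageable:

```
ncu --target-processes all --launch-count 100 -o pyfr_report \
    pyfr run -b cuda mesh.pyfrm config.ini
```

The resulting report can then be opened in the Nsight Compute GUI, where the throughput sections give the per-kernel bandwidth and FLOP figures mentioned above.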

As noted in the Examples section of the documentation:

PyFR includes several test cases to showcase the functionality of the solver. It is important to note, however, that these examples are all relatively small 2D simulations and, as such, are not suitable for scalability or performance studies.

Regards, Freddie.

Thank you! Could you please provide some relatively large examples to simulate with the new version of PyFR?

Many of the papers by the PyFR team will include meshes in the supplementary material, which is a good place to look.

If you are trying to measure performance, this topic might be a good place to start: TGV Performance Numbers

I think I link to a reasonably sized cube mesh there that you can try.

Is it cuBLAS or CUTLASS that PyFR uses when calling the matrix multiplication library?

cuBLAS, see here: PyFR/cublas.py at d175beccd3fc5587903cc00c0f401041ff22abe4 · PyFR/PyFR · GitHub
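If you want a quick feel for what a raw cuBLAS GEMM delivers on your card outside of PyFR, CuPy is a convenient stand-in, since its dense matmul dispatches to cuBLAS. A rough sketch, with the matrix sizes purely illustrative:

```python
import cupy as cp

# Operator-like shapes: a small constant matrix applied to many elements
m, k, n = 150, 125, 400000

a = cp.random.rand(m, k)
b = cp.random.rand(k, n)

# Warm up so handle creation and any compilation is excluded from the timing
c = a @ b

start, end = cp.cuda.Event(), cp.cuda.Event()
start.record()
for _ in range(50):
    c = a @ b
end.record()
end.synchronize()

ms = cp.cuda.get_elapsed_time(start, end) / 50
print(f'{ms:.3f} ms per GEMM, {2 * m * k * n / (ms * 1e-3) / 1e12:.2f} TFLOP/s')
```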