Max nnz for GiMMiK kernels?

nnunn · 9 August 2018 15:37

When using hex elements with order say 4 or 5, the number of non-zeros in the GiMMiK kernels gets quite high.

E.g., with n=4, tgradpcoru_upts [hex] appears to get 1819 non-zeros from 46875 entries (i.e. about 4% non-zeros)

Just wondering, for high-order 3d runs, what’s an appropriate way to replace the GiMMiK kernels with more normal matrix multiplication?

PS: even with 1819 non-zeros, the hard-wired (const) GiMMiK kernels appear to run fine, but at some point I guess the number of registers required must outweigh the cost of loading the const mats from memory?

From class CUDAGiMMiKKernels(CUDAKernelProvider):

    # Check that A is reasonably sparse
    if np.count_nonzero(a.get()) > self.max_nnz:
        raise NotSuitableError('Matrix too dense for GiMMiK')

default self.max_nnz: [512]

thanks for any pointers,
Nigel

fdw · 14 August 2018 13:58

Hi,

If you wish to disable GiMMiK you can place the key

gimmik-max-nnz = 0

into the [backend-<your backend>] section of the config file. This will cause GiMMiK to raise the NotSuitableError you showed above, and thus result in the multiplication being handled by dense BLAS.
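
As a rough illustration of why a threshold of 0 has this effect, here is a minimal Python sketch. It is not PyFR's actual code: the section name `backend-cuda` is assumed from the CUDA backend mentioned in this thread, and `pick_provider` and the local `NotSuitableError` class are hypothetical stand-ins that only mimic the check quoted above.

```python
import configparser
import numpy as np

# Hypothetical config fragment; 'backend-cuda' stands in for your backend section
CFG_TEXT = """
[backend-cuda]
gimmik-max-nnz = 0
"""

class NotSuitableError(Exception):
    """Stand-in for the exception named in the quoted snippet."""

def pick_provider(a, max_nnz):
    """Mimic the sparsity check: try GiMMiK first, else fall back to dense BLAS."""
    try:
        if np.count_nonzero(a) > max_nnz:
            raise NotSuitableError('Matrix too dense for GiMMiK')
        return 'gimmik'
    except NotSuitableError:
        return 'dense-blas'

cfg = configparser.ConfigParser()
cfg.read_string(CFG_TEXT)
# 512 is the default quoted in this thread
max_nnz = cfg.getint('backend-cuda', 'gimmik-max-nnz', fallback=512)

a = np.eye(4)  # any matrix with at least one non-zero
print(pick_provider(a, max_nnz))  # threshold taken from the config
print(pick_provider(a, 512))      # default threshold
```

With the threshold at 0, any matrix containing at least one non-zero fails the check, so the fallback path is taken for every operator of practical interest.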

The point at which GiMMiK becomes unprofitable depends heavily on the form of the matrices, which backend you are using, and the hardware you are running on. In retrospect, 512 is probably a tad on the low side, at least for sparse matrices. That said, for dense matrices 512 is sometimes on the high side.

Regards, Freddie.

nnunn · 18 August 2018 20:26

Hi Freddie - thanks for the tip!

I checked the const op mats for hex with order=5:

Some are (N,N) = (216,216), with 6 non-zeros per row (nnz=1296, 3%)
One is (3N,N) = (648, 216), with 6 non-zeros per row (nnz=3888, 3%)
One is (N,3N) = (216,648), with 18 non-zeros per row (nnz=3888, 3%)
One is (N,N) = (216,216), with 2 non-zeros per row (nnz=432, 1%)
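
The counts above can be cross-checked with a few lines of Python. This is only arithmetic on the shapes and per-row counts listed in this post; the helper name `nnz_and_density` is made up for the example, and 512 is the default threshold quoted earlier in the thread.

```python
# (rows, cols, non-zeros per row) for the order-5 hex operators listed above
ops = [(216, 216, 6), (648, 216, 6), (216, 648, 18), (216, 216, 2)]

def nnz_and_density(rows, cols, per_row):
    """Return the total non-zero count and the fraction of non-zero entries."""
    nnz = rows * per_row
    return nnz, nnz / (rows * cols)

for rows, cols, per_row in ops:
    nnz, dens = nnz_and_density(rows, cols, per_row)
    print(f'({rows},{cols}): nnz={nnz}, density={dens:.1%}, '
          f'exceeds 512: {nnz > 512}')
```

Only the last operator (432 non-zeros) stays under the default limit of 512; the other three exceed it.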

Backend is CUDA. As an exercise, I’ll try a kernel with each thread accumulating one sparse dot product (6 or 18 madds).
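
To make the idea concrete, here is a NumPy emulation of that scheme, not a CUDA kernel. It assumes an ELL-style layout (a fixed number of value/column slots per row), which is one possible choice rather than anything taken from PyFR or GiMMiK; `to_ell` and `ell_matmul` are hypothetical helper names. Each output entry corresponds to one "thread" accumulating a short sparse dot product.

```python
import numpy as np

def to_ell(a, width):
    """Pack a matrix with at most `width` non-zeros per row into ELL arrays."""
    rows = a.shape[0]
    vals = np.zeros((rows, width))
    cols = np.zeros((rows, width), dtype=int)  # padding: column 0, value 0
    for i in range(rows):
        nz = np.flatnonzero(a[i])
        vals[i, :len(nz)] = a[i, nz]
        cols[i, :len(nz)] = nz
    return vals, cols

def ell_matmul(vals, cols, b):
    """Compute A @ B from the ELL arrays, `width` multiply-adds per output entry."""
    rows, width = vals.shape
    c = np.zeros((rows, b.shape[1]))
    for k in range(width):  # e.g. the 6 or 18 madds per thread
        c += vals[:, k, None] * b[cols[:, k], :]
    return c

# Random test operator: 216 x 216 with 6 non-zeros per row
rng = np.random.default_rng(0)
n, width = 216, 6
a = np.zeros((n, n))
for i in range(n):
    a[i, rng.choice(n, size=width, replace=False)] = rng.standard_normal(width)
b = rng.standard_normal((n, 8))  # 8 columns stand in for 8 elements

vals, cols = to_ell(a, width)
print(np.allclose(ell_matmul(vals, cols, b), a @ b))
```

Unlike the generated GiMMiK kernels, which hard-wire the constants, this layout has to load the values and column indices from memory, so it illustrates the trade-off raised earlier in the thread rather than settling it.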

thanks - Nigel
