Which BLAS libraries are used in PyFR?

Dear All,

My name is Antonio Garcia-Uceda, and I’m working on Flux Reconstruction.

Could you please let me know which BLAS library you use with PyFR? And, in your experience, which one offers the best performance?

In my experience, OpenBLAS is the most efficient on my machine (Linux, x86, 64-bit), followed very closely by Intel MKL. However, I’m testing them on different architectures.

I wanted to test the ATLAS BLAS library from http://math-atlas.sourceforge.net/. However, I get build issues: the library seems to compile fine, but the linking step fails (undefined reference to cblas_dgemm…).
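
For readers who hit the same error: with an undefined reference to a CBLAS routine, a common cause is that the library providing the CBLAS interface is missing from the link line. When a shared library is involved, one quick check is whether it exports the symbol at all. The following is only a minimal sketch using Python's standard ctypes module; the helper name exports_symbol and the example path are hypothetical and not part of PyFR or ATLAS.

```python
import ctypes


def exports_symbol(library_path, symbol="cblas_dgemm"):
    """Return True if the shared library at library_path exports symbol.

    Hypothetical helper for illustration; it only checks that the symbol
    can be resolved, not that the routine works or performs well.
    """
    try:
        lib = ctypes.CDLL(library_path)
    except OSError:
        # The file is missing or could not be loaded
        # (for example, because of unresolved dependencies).
        return False
    # ctypes looks the symbol up lazily; hasattr is False if it is absent.
    return hasattr(lib, symbol)


if __name__ == "__main__":
    # Placeholder path: replace with the shared library you actually built.
    print(exports_symbol("/path/to/your/blas/library.so"))
```

If this reports False for the library you are linking against, the CBLAS interface probably lives in a different library file from the same build.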

Could you please tell me whether you’ve also tested ATLAS, whether you obtained it from the same source, and whether you came across similar issues?

Thanks a lot in advance.

Best regards,
Antonio

Hi Antonio,

In our experience single-threaded ATLAS tends to outperform both
OpenBLAS and Intel MKL. It can, however, be somewhat tricky to build
from source, although doing so is essential for good performance. The
ATLAS readme files should be helpful here.

Regards, Freddie.

[2021 edit (WillT): this is likely no longer the case; for CPU work GiMMiK, MKL, and libxsmm are the key to good performance]

Thanks a lot Freddie,

I should keep trying with ATLAS then. Did you download yours from the same source?

Best regards,
Antonio

Hi Antonio,

I grabbed the source code off of Sourceforge and then followed the build
instructions to generate a shared library.

Regards, Freddie.

Dear Freddie,

I managed to build and link the ATLAS library. I’ve run some tests but it gives me lower performance than I expected: OpenBLAS (single-threaded) outperforms ATLAS by 20-30%.

I know that the performance of BLAS is platform-dependent and that your good experience with ATLAS does not necessarily carry over to my case. However, I’d still like to give it a chance. As you rightly pointed out, good performance from the BLAS library is essential.

Would it be possible to have reference times per iteration for PyFR with different BLAS libraries, on a single process, for a case you ran in the past? Perhaps you have reported this information somewhere?

Alternatively, would it be possible for you to provide me with your compiled ATLAS library (*.so)? I could test it myself, as long as we’re using a similar Linux OS, of course.

Many thanks in advance.

Best regards,
Antonio

Hi Antonio,

I maintain benchmarks of various BLAS libraries for the sorts of
matrices which occur in PyFR on my website:

https://freddie.witherden.org/pages/blas-gemm-bench/

ATLAS, by design, is not performance portable at the binary level. This
is due to its use of autotuning to select/generate the optimal set of
BLAS kernels for a given platform. As such it really needs to be
recompiled from source on every distinct system you wish to deploy it on.

Regards, Freddie.

Dear Freddie,

Thanks a lot for the link. That’s exactly what I was looking for, a reference to test the performance of my BLAS libraries.

Although, of course, the FLOPS values you reported and the peak FLOPS of your machine will differ from mine, I would expect the ratio of achieved to peak FLOPS to be similar. Am I right?
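
To make the comparison concrete, here is a minimal sketch of the arithmetic behind a "fraction of peak" figure. It assumes the usual count of 2·m·n·k floating-point operations for a dense matrix multiply and a theoretical peak of cores × clock × FLOPs per cycle. All function names and the numbers in the example are hypothetical, not taken from the benchmark page.

```python
def gemm_flops(m, n, k):
    """Floating-point operations for C = A @ B, with A (m x k) and B (k x n)."""
    # One multiply and one add per inner-product term.
    return 2 * m * n * k


def achieved_gflops(m, n, k, seconds):
    """Sustained GFLOP/s for one GEMM call that took `seconds` of wall time."""
    return gemm_flops(m, n, k) / seconds / 1e9


def peak_gflops(cores, clock_ghz, flops_per_cycle):
    """Theoretical peak GFLOP/s; flops_per_cycle depends on the CPU's SIMD units."""
    return cores * clock_ghz * flops_per_cycle


def fraction_of_peak(achieved, peak):
    """Ratio of sustained to theoretical peak performance."""
    return achieved / peak


if __name__ == "__main__":
    # Hypothetical machine and timing, chosen only to illustrate the formula.
    peak = peak_gflops(cores=1, clock_ghz=3.0, flops_per_cycle=16)
    rate = achieved_gflops(m=512, n=100000, k=512, seconds=1.5)
    print(f"{rate:.1f} GFLOP/s, {100 * fraction_of_peak(rate, peak):.0f}% of peak")
```

Comparing this ratio across machines, rather than the raw FLOPS, removes the dependence on clock speed and core count, although differences in cache sizes and memory bandwidth can still shift it.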

Best regards,
Antonio