Hi Tom,
Thank you for your reply.
I have installed libxsmm using the commands below.
make STATIC=0 BLAS=0
make PREFIX=<libxsmm install path> STATIC=0 BLAS=0 install
The make process finished without error. Then I set the following parameters in the
"backend-openmp" section of the ini file.
[backend-openmp]
cc = icc
cblas = <libxsmm install path>/lib/libxsmm.so
libxsmm is not a cblas library (that is MKL), so the cblas entry should
not point to it. All you need to do is ensure that libxsmm is in a
location where PyFR can find it. If need be you can point PyFR to the
library by doing
export PYFR_XSMM_LIBRARY_PATH=/path/here/to/lib.so
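Before launching PyFR it can be worth checking that the shared library at that path can actually be opened by the dynamic loader. The following is only a rough sketch: the helper names are hypothetical, and it does not reproduce how PyFR itself locates or loads libxsmm. It simply reads the PYFR_XSMM_LIBRARY_PATH variable mentioned above and tries to open the file with ctypes.

```python
import ctypes
import os


def libxsmm_candidate_path(env=os.environ):
    # Hypothetical helper: return the path set via PYFR_XSMM_LIBRARY_PATH,
    # or None if the variable is not set.
    return env.get('PYFR_XSMM_LIBRARY_PATH')


def can_load_library(path):
    # Try to open the shared library; ctypes raises OSError if the file
    # is missing or cannot be loaded.
    try:
        ctypes.CDLL(path)
    except OSError:
        return False
    return True


if __name__ == '__main__':
    path = libxsmm_candidate_path()
    if path is None:
        print('PYFR_XSMM_LIBRARY_PATH is not set')
    elif can_load_library(path):
        print('Loaded', path)
    else:
        print('Could not load', path)
```

A successful load only shows that the file is a loadable shared object; it does not confirm that PyFR will pick it up.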
Are these parameter settings correct? The Xeon Phi 7250 has 68 physical
cores, and each core can run 4 threads. Therefore, I set
OMP_NUM_THREADS=272.
For best performance you want one thread per core, which on this
processor means OMP_NUM_THREADS=68.
Could you tell me the meanings of "gimmik-max-nnz", "libxsmm-block-sz" and
"libxsmm-max-sz" in the "backend-openmp" section of the ini file?
The first parameter decides the cut-off point at which we will no longer
use the GiMMiK matrix multiplication library. The second parameter
controls the block size for libxsmm, although performance does not
appear to be too sensitive to its value. The final parameter decides
the cut-off point at which we will no longer use libxsmm for matrix
multiplications. The hierarchy is:
1. libxsmm (if available and size < libxsmm-max-sz)
2. GiMMiK (else if nonzeros < gimmik-max-nnz)
3. cblas (else)
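The hierarchy above can be sketched as a small selection function. This is an illustration only, not PyFR's actual code: the function name and arguments are hypothetical, the threshold values in the example call are made up rather than PyFR defaults, and exactly how "size" is measured for a given operator matrix is an assumption left to the caller.

```python
def choose_matmul_kernel(size, nnz, libxsmm_available,
                         libxsmm_max_sz, gimmik_max_nnz):
    # 1. libxsmm, if it is available and the matrix is small enough
    if libxsmm_available and size < libxsmm_max_sz:
        return 'libxsmm'
    # 2. GiMMiK, if the matrix has few enough non-zero entries
    elif nnz < gimmik_max_nnz:
        return 'gimmik'
    # 3. Otherwise fall back to the cblas library
    else:
        return 'cblas'


if __name__ == '__main__':
    # Illustrative thresholds only; these are not PyFR's defaults.
    print(choose_matmul_kernel(size=100, nnz=5000, libxsmm_available=True,
                               libxsmm_max_sz=125, gimmik_max_nnz=512))
```

Note that without libxsmm the first branch is never taken, so any matrix with too many non-zeros for GiMMiK goes straight to cblas.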
The defaults in PyFR are not too bad, but you may be able to gain a
further 20-30% in performance for certain test cases by playing around
with these parameters. The key is to ensure that libxsmm is available;
this is especially important at lower polynomial orders.
Regards, Freddie.