Problem with setting up OpenMP when installing PyFR on Windows10

Dear all,

Encouraged by nnunn, I am creating a new topic about the errors I encountered while setting up PyFR 1.12.2 with the OpenMP backend. It runs parallel to Problem with setting up CUDA when installing PyFR on Windows10. There are many details a newcomer has to get right before the code installs and runs successfully, so this may help others who hit a similar problem. It can also serve as an extension to How I installed PyFR 1.6.0 with CUDA backend (and dependancies) on Windows 10.

First, I should clarify that I have Visual Studio 2019, Anaconda and PyCharm installed on my PC. Visual Studio is mainly there for the dynamic libraries (files ending in .dll). Anaconda and PyCharm are useful for creating and managing a Python environment. After creating a new Python environment, I installed PyFR following the instructions.

Then I installed the dependencies for the chosen backend. The OpenMP backend needs two libraries: GCC and libxsmm. GCC becomes available after installing MinGW, but libxsmm is a little confusing. All I know is that libxsmm can be built with either VS or MinGW; I found little detail on how to compile it. In any case, the following error still occurs:

Traceback (most recent call last):
  File "e:\pyfr_test\pyfr\", line 33, in __call__
    res = cache[key]
KeyError: (<function OpenMPKernelProvider._build_kernel at 0x0000020D34FE7EE0>, b'\x80\x04\x955\x03\x00\x00\x00\x00\x00\x00\x8c\nbatch_gemm\x94X\xf3\x02\x00\x00\n\n#include <omp.h>\n#include <stdlib.h
>\n#include <tgmath.h>\n\n#define SOA_SZ 8\n#define BLK_SZ 8\n\n#define min(a, b) ((a) < (b) ? (a) : (b))\n#define max(a, b) ((a) > (b) ? (a) : (b))\n\n// Typedefs\ntypedef double fpdtype_t;\n\n\n\n//
 libxsmm prototype\ntypedef void (*libxsmm_xfsspmdm_execute)(void *, const fpdtype_t *,\n                                         fpdtype_t *);\n\n// gimmik prototype\ntypedef void (*gimmik_execute)(i
nt, const fpdtype_t *, int, fpdtype_t *, int);\n\nvoid\nbatch_gemm(gimmik_execute exec, int bldim,\n           int nblocks,\n           const fpdtype_t *b, int bblocksz, fpdtype_t *c, int cblocksz)\n{
\n    #pragma omp parallel for\n    for (int ib = 0; ib < nblocks; ib++)\n        exec(bldim, b + ib*bblocksz, bldim, c + ib*cblocksz, bldim);\n}\n\n\x94]\x94(\x8c\x05numpy\x94\x8c\x05int64\x94\x93\x9
4h\x03\x8c\x05int32\x94\x93\x94h\x07h\x05h\x07h\x05h\x07e\x87\x94.', b'\x80\x04}\x94.')

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "E:\Anaconda\envs\pyfr_tf\Scripts\", line 33, in <module>
    sys.exit(load_entry_point('pyfr', 'console_scripts', 'pyfr')())
  File "e:\pyfr_test\pyfr\", line 117, in main
  File "e:\pyfr_test\pyfr\", line 250, in process_run
  File "e:\pyfr_test\pyfr\", line 232, in _process_common
    solver = get_solver(backend, rallocs, mesh, soln, cfg)
  File "e:\pyfr_test\pyfr\solvers\", line 16, in get_solver
    return get_integrator(backend, systemcls, rallocs, mesh, initsoln, cfg)
  File "e:\pyfr_test\pyfr\integrators\", line 36, in get_integrator
    return integrator(backend, systemcls, rallocs, mesh, initsoln, cfg)
  File "e:\pyfr_test\pyfr\integrators\std\", line 13, in __init__
    super().__init__(*args, **kwargs)
  File "e:\pyfr_test\pyfr\integrators\std\", line 27, in __init__
    self.system = systemcls(backend, rallocs, mesh, initsoln,
  File "e:\pyfr_test\pyfr\solvers\base\", line 68, in __init__
    self._gen_kernels(eles, int_inters, mpi_inters, bc_inters)
  File "e:\pyfr_test\pyfr\solvers\base\", line 187, in _gen_kernels
    kernels[pn, kn].append(kgetter())
  File "e:\pyfr_test\pyfr\solvers\baseadvec\", line 45, in <lambda>
    kernels['disu'] = lambda: self._be.kernel(
  File "e:\pyfr_test\pyfr\backends\base\", line 163, in kernel
    return kern(*args, **kwargs)
  File "e:\pyfr_test\pyfr\backends\openmp\", line 48, in mul
    batch_gemm = self._build_kernel('batch_gemm', src, argt)
  File "e:\pyfr_test\pyfr\", line 35, in __call__
    res = cache[key] = self.func(*args, **kwargs)
  File "e:\pyfr_test\pyfr\backends\openmp\", line 13, in _build_kernel
    mod = SourceModule(src, self.backend.cfg)
  File "e:\pyfr_test\pyfr\backends\openmp\", line 65, in __init__
    self.mod = self._cache_set_and_loadlib(lpath)
  File "e:\pyfr_test\pyfr\backends\openmp\", line 130, in _cache_set_and_loadlib
    return CDLL(clpath)
  File "E:\Anaconda\envs\pyfr_tf\lib\ctypes\", line 373, in __init__
    self._handle = _dlopen(self._name, mode)
FileNotFoundError: Could not find module 'C:\Users\Thatcher\AppData\Local\pyfr\pyfr\Cache\c8188542ba03ceef8f44731c24cb91566c3cecd078454acb4da2577a34372d9f.dll' (or one of its dependencies). Try using the
full path with constructor syntax.

Hope someone can offer any hints or point out the missing part. Thanks!

Regards, Thatcher

Hi Thatcher,

Please note, the steps below refer to building dll kernels for PyFR’s OpenMP backend when using the native Microsoft tools in Visual Studio 2019. When using a Cygwin or MinGW setup, none of this may be necessary.

Regarding the “batch-gemm” wrapper kernel mentioned in your error trace, have a look at the batch-gemm.mako template,


When using VC to build a Windows dll, we need to export the batch-gemm routine,

gcc line 11: void
VC  line 11: __declspec(dllexport) void

Next, we need to adjust the cc_cmd(… ) in file:


From line 71:

def cc_cmd(self, srcname, libname):
    if sys.platform == 'win32':
        cmd = [
            ...,            # Compiler name
            '/Ox',          # TODO: add various flags
            '/openmp',      # Enable OpenMP support
            srcname,        # source file name
            '/LD'           # create tmp.dll from tmp.c
        ]
    else:
        cmd = [
            ...,            # Compiler name
            ...,            # etc.
        ]
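
As a rough, self-contained illustration of the platform branch above (not PyFR's actual code): the helper name build_cc_cmd, its platform argument, and the GCC-style flags in the second branch are assumptions made for this sketch. Only the MSVC flags come from the snippet above.

```python
import sys


def build_cc_cmd(cc, srcname, libname, platform=sys.platform):
    """Return a compiler command line as a list of arguments.

    Hypothetical stand-alone helper; `cc` is whatever compiler
    executable the caller has configured.
    """
    if platform == 'win32':
        # MSVC-style flags, as listed in the post; with /LD the DLL
        # name is derived from the source file name
        return [cc, '/Ox', '/openmp', srcname, '/LD']

    # GCC-style flags (assumed here purely for illustration)
    return [cc, '-shared', '-std=c99', '-O3', '-fopenmp', '-fPIC',
            '-o', libname, srcname]


if __name__ == '__main__':
    print(build_cc_cmd('cl', 'tmp.c', 'tmp.dll', platform='win32'))
```

Passing the platform as an argument keeps the sketch easy to try on any machine; a real cc_cmd would read sys.platform directly, as in the snippet above.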

Also, some of the mako kernels use permutations of the “restrict” keyword.
For the VC compiler, I had to change these to “__restrict”.
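
If editing each template by hand is tedious, the same change could be applied to the generated C source as text before it reaches the compiler. This is only a sketch under that assumption; the function name msvc_restrict is made up here, and a plain textual substitution would also touch the word inside comments or string literals.

```python
import re


def msvc_restrict(src):
    """Replace the C99 keyword `restrict` with MSVC's `__restrict`.

    The word boundaries leave identifiers such as `__restrict` or
    `restricted` untouched.
    """
    return re.sub(r'\brestrict\b', '__restrict', src)


if __name__ == '__main__':
    print(msvc_restrict('const double * restrict b, double * restrict c'))
```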

The next problem is the way VC handles the #pragma omp blocks.

When treating the source code as C++, all is well.

But when treating the source code as C, the compiler rejects the following definition of the loop control variable, ib:

#pragma omp parallel for 
for (int ib = 0; ib < nblocks; ib++)
{ ... }

This is fixed by shifting the definition of ib outside the omp block:

int ib=0;
#pragma omp parallel for 
for (ib = 0; ib < nblocks; ib++)
{ ... }

But wait, there’s more. If we use the C++ compiler, we need to further adjust exported names, e.g.

gcc     line 11: void
VC(C)   line 11: __declspec(dllexport) void
VC(C++) line 11: extern "C" __declspec(dllexport) void  
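
The three variants above could be chosen programmatically when rendering the kernel source. A minimal sketch, assuming a hypothetical helper export_prefix and made-up compiler labels 'gcc' and 'msvc' (neither is part of PyFR):

```python
def export_prefix(compiler, as_cpp=False):
    """Return the text to place before `void` in an exported kernel.

    `compiler` is a label chosen for this sketch: 'gcc' or 'msvc'.
    """
    if compiler != 'msvc':
        # gcc: plain `void`, no prefix
        return ''

    prefix = '__declspec(dllexport) '
    if as_cpp:
        # C++ builds also need C linkage so the name is not mangled
        prefix = 'extern "C" ' + prefix
    return prefix


if __name__ == '__main__':
    for comp, cpp in [('gcc', False), ('msvc', False), ('msvc', True)]:
        print(export_prefix(comp, cpp) + 'void')
```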

And finally, to enable the omp simd directive, we need to add “:experimental” to the /openmp flag:

cmd = [
    ...,                    # Compiler name
    '/openmp:experimental', # Enable omp simd

See “SIMD Extension” on Microsoft Docs.

But wait, yep, there’s more. We need to get the VC tools on the system path… :pleading_face:
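
A quick way to check whether the compiler is visible from the Python environment is shutil.which from the standard library. The sketch below assumes the VC compiler executable is called cl; the helper name find_tool is made up for illustration. Starting Python from the “Developer Command Prompt for VS 2019” is one common way to get the VC tools onto the path.

```python
import shutil


def find_tool(name='cl'):
    """Return the full path of `name` if it is on PATH, else None."""
    return shutil.which(name)


if __name__ == '__main__':
    path = find_tool('cl')
    if path:
        print('cl found at', path)
    else:
        print('cl is not on PATH')
```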

As mentioned, if using a Cygwin or MinGW setup, none of this may be necessary.
But you may still need to adjust the exported function prototypes, e.g.:

// void
__declspec(dllexport) void

Hope this helps you make a start,