ExecError: error invoking 'gcc...' when on cluster

Hello, I am running PyFR on a cluster with a case of 350,000 elements. When I ran the case at third order it worked, but at second order the errors below appear. Parts of the error output are attached. How can I solve this? Thanks a lot.

 File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pytools/prefork.py", line 177, in call_capture_output
    return self._remote_invoke("call_capture_output", cmdline, cwd,
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pytools/prefork.py", line 159, in _remote_invoke
    raise result
pytools.prefork.ExecError: error invoking 'gcc -shared -std=c11 -Ofast -march=native -fopenmp -fPIC -o libtmp.so tmp.c -lm': status 1 invoking 'gcc -shared -std=c11 -Ofast -march=native -fopenmp -fPIC -o libtmp.so tmp.c -lm': /data/software/gcc/12.1.0/libexec/gcc/x86_64-pc-linux-gnu/12.1.0/cc1: error while loading shared libraries: libmpfr.so.6: cannot open shared object file: No such file or directory

[hp038:63038] 6 more processes have sent help message help-mpi-api.txt / mpi-abort

The opening part of the error output goes like this:

Traceback (most recent call last):
  File "/data/home/sup/pyfr2/bin/pyfr", line 8, in <module>
    sys.exit(main())
             ^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/__main__.py", line 124, in main
    args.process(args)
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/__main__.py", line 258, in process_run
    _process_common(
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/__main__.py", line 243, in _process_common
    solver = get_solver(backend, rallocs, mesh, soln, cfg)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/solvers/__init__.py", line 14, in get_solver
    return get_integrator(backend, systemcls, rallocs, mesh, initsoln, cfg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/integrators/__init__.py", line 34, in get_integrator
    return integrator(backend, systemcls, rallocs, mesh, initsoln, cfg)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/integrators/std/controllers.py", line 11, in __init__
    super().__init__(*args, **kwargs)
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/integrators/std/base.py", line 26, in __init__
    self.system.commit()
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/solvers/base/system.py", line 67, in commit
    self._gen_kernels(self.nregs, self.ele_map.values(), self._int_inters,
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/solvers/base/system.py", line 213, in _gen_kernels
    kern = kgetter()
           ^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/solvers/baseadvecdiff/elements.py", line 74, in <lambda>
    kernels['gradcoru_u'] = lambda: slicedk(k() for k in gradcoru_u)
                                    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/solvers/base/elements.py", line 179, in _make_sliced_kernel
    klist = list(kseq)
            ^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/solvers/baseadvecdiff/elements.py", line 74, in <genexpr>
    kernels['gradcoru_u'] = lambda: slicedk(k() for k in gradcoru_u)
                                            ^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/solvers/baseadvecdiff/elements.py", line 59, in <lambda>
    gradcoru_u.append(lambda: kernel(
                              ^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/backends/base/backend.py", line 196, in kernel
    kern = kern_meth(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/backends/base/kernels.py", line 177, in kernel_meth
    fun = self._build_kernel(name, src, list(it.chain(*argt)), argn)
          ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/backends/openmp/provider.py", line 149, in _build_kernel
    lib = self._build_library(src)
          ^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/util.py", line 38, in newmeth
    res = cache[key] = meth(self, *args, **kwargs)
                       ^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/backends/openmp/provider.py", line 141, in _build_library
    return self.backend.compiler.build(src)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/data/home/sup/pyfr2/lib/python3.11/site-packages/pyfr/backends/openmp/compiler.py", line 58, in build

Your compiler is broken. From the error message:

/data/software/gcc/12.1.0/libexec/gcc/x86_64-pc-linux-gnu/12.1.0/cc1: error while loading shared libraries: libmpfr.so.6: cannot open shared object file: No such file or directory

it is very clear that gcc is failing to run on the nodes due to a missing library.
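A quick way to confirm this kind of failure is to check, on a compute node (e.g. from inside a job script), which of cc1's shared libraries fail to resolve, and then make MPFR visible before launching PyFR. The module name and install path below are site-specific assumptions:

    # Locate cc1 and list any shared libraries the loader cannot find
    ldd "$(gcc -print-prog-name=cc1)" | grep 'not found'

    # Option 1: load the matching MPFR environment module (name varies by site)
    module load mpfr

    # Option 2: point the dynamic linker at the MPFR install directly
    export LD_LIBRARY_PATH=/path/to/mpfr/lib:$LD_LIBRARY_PATH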

Regards, Freddie.

Thanks, I have fixed it.

But when I use two nodes, the computation is performed only on the first node. Each node has 64 CPUs (Intel(R) Xeon(R) Gold 6149) across 4 sockets. The mesh is partitioned into 8 pieces and I am using 2 nodes, i.e. 128 CPUs, so I run OMP_NUM_THREADS=16 mpiexec -n 8 pyfr run -b openmp a.pyfrm b.ini. Am I right?

The specifics of this depend on your MPI library. You will need to consult its documentation for how to assign a certain number of cores per rank. Alternatively, just repartition into 128 pieces and use only a single thread (which will bypass the issue entirely).
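For Open MPI specifically, something along these lines should place four ranks on each node with 16 cores bound to each rank; the exact flags depend on your Open MPI version and scheduler, so treat this as a sketch rather than a known-good command for your system:

    # Sketch: 8 ranks across 2 nodes, 4 ranks per node, 16 cores per rank
    mpiexec -n 8 --map-by ppr:4:node:PE=16 --bind-to core \
        -x OMP_NUM_THREADS=16 -x OMP_PLACES=cores \
        pyfr run -b openmp a.pyfrm b.ini

    # The single-thread alternative: one mesh partition per core
    pyfr partition 128 a.pyfrm .
    mpiexec -n 128 -x OMP_NUM_THREADS=1 pyfr run -b openmp a.pyfrm b.ini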

Regards, Freddie.

When I used two nodes, I repartitioned into 128 pieces and used only a single thread, but the error below came out. It used to work, but then it didn't. Do you know why this happened?

WARNING: Open MPI failed to TCP connect to a peer MPI process. This
should not happen.

Your Open MPI job may now hang or fail.

Local host: k0108
PID: 265014
Message: connect() to 192.168.122.1:1046 failed
Error: Operation now in progress (115)

This is clearly an MPI problem. We are the authors of PyFR, not Open MPI. You will need to follow up with your system administrator and/or the Open MPI developers to get this issue resolved.
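That said, the address in the message is a hint worth passing on to your administrator: 192.168.122.1 is the default address of the libvirt virbr0 bridge, a node-local virtual interface that is not routable between nodes. Assuming that is what Open MPI picked here, a common workaround is to exclude such interfaces from its TCP transport (the interface names below are assumptions; check ip addr on the nodes):

    # Exclude loopback and the libvirt bridge from Open MPI's TCP transport
    mpiexec -n 128 --mca btl_tcp_if_exclude lo,virbr0 \
        pyfr run -b openmp a.pyfrm b.ini

    # Or restrict it to the real interconnect interface instead
    mpiexec -n 128 --mca btl_tcp_if_include ib0 \
        pyfr run -b openmp a.pyfrm b.ini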

Regards, Freddie.