Hi Zach,
I was trying to run the same case with the OpenCL backend (I know, I know,
it's a bit more tenuous) and the simulation reached 99.8% before a pyopencl
error killed the MPI rank I had allocated for the GPU:
(venv) [zdavis@Rahvin cubes]$ mpirun -np 5 ./launcher.sh cube_hex24.pyfrm cube.ini
99.8% [==============================> ] 0.10/0.10 ela: 00:07:04 rem: 00:00:00
Traceback (most recent call last):
  File "/Users/zdavis/Applications/PyFR/pyfr/scripts/pyfr", line 38, in <module>
    main()
  File "/Users/zdavis/Applications/PyFR/pyfr/scripts/pyfr", line 32, in main
    args.process(args)
  File "/Users/zdavis/Applications/PyFR/pyfr/scripts/sim.py", line 87, in process_run
    args, read_pyfr_data(args.mesh), None, Inifile.load(args.cfg)
  File "/Users/zdavis/Applications/PyFR/venv/lib/python3.4/site-packages/mpmath/ctx_mp.py", line 1301, in g
    return f(*args, **kwargs)
  File "/Users/zdavis/Applications/PyFR/pyfr/scripts/sim.py", line 59, in _process_common
    solver.run()
  File "/Users/zdavis/Applications/PyFR/pyfr/integrators/base.py", line 112, in run
    solns = self.advance_to(t)
  File "/Users/zdavis/Applications/PyFR/pyfr/integrators/controllers.py", line 79, in advance_to
    idxcurr = self.step(self.tcurr, dt)
  File "/Users/zdavis/Applications/PyFR/pyfr/integrators/steppers.py", line 154, in step
    rhs(t + dt/2.0, r2, r2)
  File "/Users/zdavis/Applications/PyFR/pyfr/solvers/baseadvecdiff/system.py", line 57, in rhs
    runall([q1])
  File "/Users/zdavis/Applications/PyFR/pyfr/backends/base/backend.py", line 183, in runall
    self.queue_cls.runall(sequence)
  File "/Users/zdavis/Applications/PyFR/pyfr/backends/opencl/types.py", line 114, in runall
    q._exec_nowait()
  File "/Users/zdavis/Applications/PyFR/pyfr/backends/base/types.py", line 303, in _exec_nowait
    self._exec_item(*self._items.popleft())
  File "/Users/zdavis/Applications/PyFR/pyfr/backends/base/types.py", line 288, in _exec_item
    item.run(self, *args, **kwargs)
  File "/Users/zdavis/Applications/PyFR/pyfr/backends/opencl/provider.py", line 38, in run
    fun(queue.cl_queue_comp, (dims[-1],), None, *narglst)
  File "/Users/zdavis/Applications/PyFR/venv/lib/python3.4/site-packages/pyopencl/__init__.py", line 509, in kernel_call
    self.set_args(*args)
  File "/Users/zdavis/Applications/PyFR/venv/lib/python3.4/site-packages/pyopencl/__init__.py", line 549, in kernel_set_args
    self.set_arg(i, pack(arg_type_char, arg))
struct.error: required argument is not a float
--------------------------------------------------------------------------
MPI_ABORT was invoked on rank 0 in communicator MPI_COMM_WORLD
with errorcode 1.
NOTE: invoking MPI_ABORT causes Open MPI to kill all MPI
processes.
You may or may not see output from other processes, depending on
exactly when Open MPI kills them.
--------------------------------------------------------------------------
/Users/zdavis/Applications/PyFR/venv/lib/python3.4/site-packages/pytools/prefork.py:74: UserWarning: Prefork server exiting upon apparent death of parent
  warn("%s exiting upon apparent death of %s" % (who, partner))
Any ideas where this might be coming from? Is it a problem within PyFR, or
something wrong with the pyopencl package?
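For what it's worth, the struct.error at the very bottom is raised while
pyopencl packs the scalar kernel arguments with Python's struct module, and
it can be reproduced in isolation. A minimal sketch, nothing PyFR-specific,
just to show the kind of argument that trips it:

import struct
import numpy as np

# A genuine float, or a NumPy scalar (which defines __float__), packs fine.
struct.pack('f', 3.14)
struct.pack('f', np.float32(3.14))

# Anything that cannot be converted to a float gives the same error as above.
try:
    struct.pack('f', None)
except struct.error as err:
    print(err)  # required argument is not a float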
I have my suspicions. Around here:
<https://github.com/vincentlab/PyFR/blob/develop/pyfr/backends/opencl/provider.py#L38>
can you do:
try:
    fun(queue.cl_queue_comp, (dims[-1],), None, *narglst)
except:
    print([a.__class__ for a in narglst])
    raise
The plan is to output the types of all of the arguments being passed to the
kernel; from there we can figure out which "float" is not actually a float.
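If the class names alone don't single anything out, the same try/except can be
made a little more verbose so that the position and value of each argument are
printed before re-raising; just an extended variant of the snippet above:

try:
    fun(queue.cl_queue_comp, (dims[-1],), None, *narglst)
except:
    # Print the index, type and repr of every kernel argument so the one
    # pyopencl fails to pack stands out.
    for i, a in enumerate(narglst):
        print(i, a.__class__, repr(a))
    raise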
Regards, Freddie.