Hi Freddie,
I have one more question about the MPI view matrices, to help my understanding.
In this block of code there are kernels that do the pack, send, recv and unpack:
# Generate second set of view matrices
self._vect_lhs = self._vect_xchg_view(lhs, 'get_vect_fpts_for_inter')
self._vect_rhs = be.xchg_matrix_for_view(self._vect_lhs)

# If we need to send our gradients to the RHS
if self.c['ldg-beta'] != -0.5:
    self.kernels['vect_fpts_pack'] = lambda: be.kernel(
        'pack', self._vect_lhs
    )
    self.mpireqs['vect_fpts_send'] = lambda: self._vect_lhs.sendreq(
        self._rhsrank, vect_fpts_tag
    )

# If we need to recv gradients from the RHS
if self.c['ldg-beta'] != 0.5:
    self.mpireqs['vect_fpts_recv'] = lambda: self._vect_rhs.recvreq(
        self._rhsrank, vect_fpts_tag
    )
    self.kernels['vect_fpts_unpack'] = lambda: be.kernel(
        'unpack', self._vect_rhs
    )
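Just to check that I follow what these four pieces implement together, is the overall sequence roughly like the sketch below? This is only my mental model written with mpi4py and NumPy; none of the names (send_buf, xchg_rhs, view_idx, etc.) come from PyFR, they are just placeholders.

from mpi4py import MPI
import numpy as np

# Purely illustrative: run with exactly two ranks, each being the other's RHS
comm = MPI.COMM_WORLD
rhs_rank = 1 - comm.rank        # stand-in for self._rhsrank
tag = 2718                      # stand-in for vect_fpts_tag

# "pack": gather the scattered flux-point gradients referenced by the LHS
# view into one contiguous send buffer
src = np.random.rand(100)
view_idx = np.arange(0, 100, 2)
send_buf = np.ascontiguousarray(src[view_idx])

# "send"/"recv": non-blocking exchange of the packed buffers
recv_buf = np.empty_like(send_buf)
reqs = [comm.Isend(send_buf, dest=rhs_rank, tag=tag),
        comm.Irecv(recv_buf, source=rhs_rank, tag=tag)]
MPI.Request.Waitall(reqs)

# "unpack": copy the received buffer into the RHS exchange matrix (on a GPU
# backend I assume this is a host-to-device copy)
xchg_rhs = np.empty_like(recv_buf)
xchg_rhs[:] = recv_buf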
And in the user-defined pre-processing graphs:
g2 = self.backend.graph()
g2.add_mpi_reqs(m['vect_fpts_recv'])

# Pack and send these interpolated gradients to our neighbours
g2.add_all(k['mpiint/vect_fpts_pack'], deps=ideps)
for send, pack in zip(m['vect_fpts_send'], k['mpiint/vect_fpts_pack']):
    g2.add_mpi_req(send, deps=[pack])
g2.add_all(k['mpiint/vect_fpts_unpack'])
g2.commit()
Is this sufficient for the view matrix to be passed through to the other rank and then used in the run-time graph? You said before that ‘Unpacking is the process of copying data from the host to the device (but no kernel is run here).’ What does it mean that no kernel is run?
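My guess at the distinction is sketched below with Numba CUDA, purely for illustration (these names are mine, not PyFR's): the unpack would be a single host-to-device copy issued through the runtime's copy API, while the pack needs a real gather kernel because the view points at scattered data. Is that right?

import numpy as np
from numba import cuda

host_buf = np.arange(16, dtype=np.float64)    # stand-in for the received MPI buffer
dev_xchg = cuda.device_array_like(host_buf)   # stand-in for the RHS exchange matrix

# "unpack": one host-to-device copy -- no compute kernel is launched
dev_xchg.copy_to_device(host_buf)

# whereas "pack" (on the send side) would need a gather kernel of some kind,
# since the LHS view points at scattered flux-point data, e.g.
@cuda.jit
def gather(dst, src, idx):
    i = cuda.grid(1)
    if i < dst.size:
        dst[i] = src[idx[i]]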
Also, does packing really copy data from the buffer to the host even in device-aware MPI mode?
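In other words, I am asking whether the send path is the first or the second branch of this sketch (again only illustrative, with CuPy standing in for the device exchange buffer and a hypothetical device_aware flag; I know PyFR does not organise it like this):

from mpi4py import MPI
import numpy as np
import cupy as cp

comm = MPI.COMM_WORLD
partner = 1 - comm.rank            # run with exactly two ranks
device_aware = False               # hypothetical flag, not a real PyFR option

dev_xchg = cp.random.rand(32)      # the packed exchange buffer on the device
recv_buf = np.empty(32)

if device_aware:
    # device-aware MPI: hand the GPU buffer to MPI directly (mpi4py accepts
    # __cuda_array_interface__ buffers), so no staging copy to the host?
    sreq = comm.Isend(dev_xchg, dest=partner, tag=99)
else:
    # plain MPI: stage the packed buffer through host memory first
    host_buf = cp.asnumpy(dev_xchg)     # explicit device -> host copy
    sreq = comm.Isend(host_buf, dest=partner, tag=99)

rreq = comm.Irecv(recv_buf, source=partner, tag=99)
MPI.Request.Waitall([sreq, rreq])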
Regards,
Zhenyang