I want to save the view matrices from the first time step so they can be used in the following computation. I was planning to use a copy kernel for this, but before that I need to allocate the corresponding memory via backend.matrix(shape, tag…). However, the dimensions of the view matrices are confusing to me. Could you tell me how the data are arranged? Thanks.
Swap out the view for an exchange view. Then run a packing kernel to pack the contents of the view into a buffer (the buffer being automatically allocated and managed by the exchange view).
There should be no need to copy the view indices themselves.
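Something along these lines should be all that is needed (a minimal sketch; the names mirror what is already in the interface classes):

self._scal_lhs = self._scal_xchg_view(lhs, 'get_scal_fpts_for_inter')

self.kernels['scal_fpts_pack'] = lambda: be.kernel(
    'pack', self._scal_lhs
)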
Is it something like this piece of code (copied from baseacvec/interp.py)? Am I right in understanding that self._scal_lhs is the left-hand-side view and self._scal_rhs is a copy of it?
# Generate the left hand view matrix and its dual
self._scal_lhs = self._scal_xchg_view(lhs, 'get_scal_fpts_for_inter')
self._scal_rhs = be.xchg_matrix_for_view(self._scal_lhs)
self._pnorm_lhs = self._const_mat(lhs, 'get_pnorms_for_inter')
# Kernels
self.kernels['scal_fpts_pack'] = lambda: be.kernel(
    'pack', self._scal_lhs
)
self.kernels['scal_fpts_unpack'] = lambda: be.kernel(
    'unpack', self._scal_rhs
)
And what do the pack and unpack kernels actually do? What are the functions of these two kernels?
Packing is the process whereby the values pointed to by a view are copied into a buffer (and that buffer is then copied back to the host). This buffer, an exchange matrix, is a member of self._scal_lhs.
Unpacking is the process of copying data from the host to the device (but no kernel is run here).
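As a rough NumPy picture of what these two steps amount to (purely conceptual, not actual backend code):

import numpy as np

# "Device" data referenced by the view, plus the view's row/column indices
dev_soln = np.arange(50.0).reshape(5, 10)
rows = np.array([0, 2, 4])
cols = np.array([1, 5, 9])

# Pack: a kernel gathers the scattered entries into a contiguous exchange
# buffer on the device ...
dev_buf = dev_soln[rows, cols]
# ... which, on a GPU backend without device-aware MPI, is then copied into
# a host buffer so that MPI can send it
host_buf = dev_buf.copy()

# Unpack: the received host buffer is simply copied back onto the device;
# nothing needs to be scattered, so no compute kernel is required
dev_recv = host_buf.copy()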
I recently restarted working on this. Just to clarify what we discussed: I did the following to save the view from t = 0 for the t > 0 calculations:
# Swap out the view and pack it
self._sbase_lhs = self._scal_xchg_view(lhs, 'get_scal_fpts_for_inter')
# Kernels
self.kernels['lhs_fpts_pack'] = lambda: be.kernel(
    'pack', self._sbase_lhs
)
This packing kernel is then run, which copies the view values into the xchgmat matrix that is a member of self._sbase_lhs. In subsequent time steps, this matrix is passed to a pointwise kernel, for example:
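(A simplified sketch only; the kernel name and template arguments here are illustrative rather than my actual code.)

self.kernels['base_diff'] = lambda: be.kernel(
    'basediff', tplargs=tplargs, dims=[self.ninterfpts],
    ulin=self._scal_lhs, ublin=self._sbase_lhs.xchgmat
)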
Am I right about all of these steps? I tried checking the result of ulin[0] - ublin[0] at t = 0 inside this pointwise kernel, but the value is not 0. However, if I pass ublin=self._sbase_lhs into the kernel, the result is 0. Do you have any advice regarding that?
First, check that the arguments ublin and ubrin are specified with the mpi prefix in the kernel definition. Second, are you sure that the packing kernel is running and completing before your kernel runs?
Ah yes, the mpi prefix works. Thanks. Am I right that the view prefix, the mpi prefix, and no prefix each give a different mapping? Regarding the second point, the packing kernel is run right after the eles/disu kernel.
Yes, these prefixes determine how the kernel accesses the data. The mpi prefix is needed for anything that has been packed. It should probably be renamed to xchg so that it matches the name of the underlying data types.
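As a loose NumPy analogy of the three mappings (not what the code generator emits, just the idea):

import numpy as np

soln = np.arange(80.0).reshape(8, 10)   # stand-in for a dense solution matrix
rows = np.array([1, 4, 6])              # stand-in for the view's row indices
cols = np.array([2, 7, 9])              # ... and its column indices

# No prefix: the kernel indexes the dense matrix directly
direct = soln[0, 5]

# view prefix: every access goes indirectly through the stored index arrays
viewed = soln[rows, cols]

# mpi (xchg) prefix: the kernel reads the contiguous buffer that the pack
# kernel filled from those same indices
packed = soln[rows, cols].copy()
first_packed = packed[0]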
One more question on the MPI view matrices, to help my understanding:
In this block of code, there are kernels that do pack, send, recv and unpack:
# Generate second set of view matrices
self._vect_lhs = self._vect_xchg_view(lhs, 'get_vect_fpts_for_inter')
self._vect_rhs = be.xchg_matrix_for_view(self._vect_lhs)
# If we need to send our gradients to the RHS
if self.c['ldg-beta'] != -0.5:
    self.kernels['vect_fpts_pack'] = lambda: be.kernel(
        'pack', self._vect_lhs
    )
    self.mpireqs['vect_fpts_send'] = lambda: self._vect_lhs.sendreq(
        self._rhsrank, vect_fpts_tag
    )

# If we need to recv gradients from the RHS
if self.c['ldg-beta'] != 0.5:
    self.mpireqs['vect_fpts_recv'] = lambda: self._vect_rhs.recvreq(
        self._rhsrank, vect_fpts_tag
    )
    self.kernels['vect_fpts_unpack'] = lambda: be.kernel(
        'unpack', self._vect_rhs
    )
And in the user-defined pre-processing graphs:
g2 = self.backend.graph()
g2.add_mpi_reqs(m['vect_fpts_recv'])
# Pack and send these interpolated gradients to our neighbours
g2.add_all(k['mpiint/vect_fpts_pack'], deps=ideps)
for send, pack in zip(m['vect_fpts_send'], k['mpiint/vect_fpts_pack']):
    g2.add_mpi_req(send, deps=[pack])
g2.add_all(k['mpiint/vect_fpts_unpack'])
g2.commit()
Is this sufficient for the view matrix to be passed through to another rank and then used in the run-time graph? You said before that 'Unpacking is the process of copying data from the host to the device (but no kernel is run here)'. What does it mean that no kernel is run?
Also, does packing really copy data from the buffer to the host, even in device-aware MPI mode?
Packing on the send side always involves executing a kernel to collect the data together into a buffer on the backend. Depending on the backend (and other configuration options, such as CUDA-aware MPI), this buffer may then be copied over to the host. Unpacking never involves a kernel, but may involve a copy operation. This again depends on the backend and how it has been configured.
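If it helps, here is a standalone mpi4py/NumPy analogy of one pack/send/recv/unpack cycle (purely illustrative; it assumes exactly two ranks and ignores the device side entirely):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
nbr = 1 - rank                       # partner rank; assumes a two-rank run

soln = np.arange(20.0) + 100*rank    # stand-in for the solution data
idx = np.array([3, 7, 11, 19])       # stand-in for the view indices

# "pack": gather the view entries into a contiguous send buffer
sendbuf = soln[idx].copy()
recvbuf = np.empty_like(sendbuf)

# "send"/"recv": non-blocking point-to-point exchange of the packed buffers
reqs = [comm.Isend(sendbuf, dest=nbr), comm.Irecv(recvbuf, source=nbr)]
MPI.Request.Waitall(reqs)

# "unpack": nothing to rearrange; the received buffer is already contiguous
# (on a GPU backend this step would just be a host-to-device copy)
print(rank, recvbuf)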