Errors about using mpiexec with the CUDA backend on GPUs

Dear all,

I am trying to run a simulation on Ubuntu 18.04 with 4 Nvidia 2080Ti GPUs. It runs successfully with the CUDA backend on one GPU. But when I partitioned the mesh and ran in parallel with mpiexec -n 4 pyfr run -b cuda -p ..., the following error appeared:

[b44093f3a7f1:03757] Read -1, expected 87040, errno = 1
[b44093f3a7f1:03758] Read -1, expected 87040, errno = 1
[b44093f3a7f1:03757] *** Process received signal ***
[b44093f3a7f1:03758] *** Process received signal ***
[b44093f3a7f1:03758] Signal: Segmentation fault (11)
[b44093f3a7f1:03758] Signal code: Invalid permissions (2)
[b44093f3a7f1:03758] Failing at address: 0x7fef6749fe00
[b44093f3a7f1:03757] Signal: Segmentation fault (11)
[b44093f3a7f1:03757] Signal code: Invalid permissions (2)
[b44093f3a7f1:03757] Failing at address: 0x7f47375c7200
[b44093f3a7f1:03758] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7ff01a8fb8a0]
[b44093f3a7f1:03758] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18ed8f)[0x7ff01a686d8f]
[b44093f3a7f1:03758] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x2ca8)[0x7fefc1c61ca8]
[b44093f3a7f1:03758] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1aa)[0x7fefc0c022fa]
[b44093f3a7f1:03758] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x2af)[0x7fefc0bf9b6f]
[b44093f3a7f1:03758] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7fefc1c6351f]
[b44093f3a7f1:03758] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x482e)[0x7fefc1c6382e]
[b44093f3a7f1:03758] [ 7] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_progress+0x5c)[0x7ff0159439ec]
[b44093f3a7f1:03758] [ 8] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_request_default_wait_all+0x2e5)[0x7ff015e9d3f5]
[b44093f3a7f1:03758] [ 9] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Waitall+0x8f)[0x7ff015ed462f]
[b44093f3a7f1:03758] [10] /root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x75f45)[0x7ff0161d1f45]
[b44093f3a7f1:03758] [11] /root/Desktop/pyfr/pyfr_venv/bin/python(PyCFunction_Call+0x56)[0x557c7dbccf76]
[b44093f3a7f1:03758] [12] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyObject_MakeTpCall+0x22f)[0x557c7db8a85f]
[b44093f3a7f1:03758] [13] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalFrameDefault+0x4596)[0x557c7dc11f56]
[b44093f3a7f1:03758] [14] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [15] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x557c7db4d75e]
[b44093f3a7f1:03758] [16] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [17] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x557c7db4d75e]
[b44093f3a7f1:03758] [18] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [19] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10077f)[0x557c7db4d77f]
[b44093f3a7f1:03758] [20] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bc0b)[0x557c7dbd8c0b]
[b44093f3a7f1:03758] [21] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x557c7db4bb84]
[b44093f3a7f1:03758] [22] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [23] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x557c7db4d75e]
[b44093f3a7f1:03758] [24] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalCodeWithName+0x2d2)[0x557c7dbd7a92]
[b44093f3a7f1:03758] [25] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bd20)[0x557c7dbd8d20]
[b44093f3a7f1:03758] [26] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x557c7db4bb84]
[b44093f3a7f1:03758] [27] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [28] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x557c7db4d75e]
[b44093f3a7f1:03758] [29] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] *** End of error message ***
[b44093f3a7f1:03757] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f47eca598a0]
[b44093f3a7f1:03757] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18ed8f)[0x7f47ec7e4d8f]
[b44093f3a7f1:03757] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x2ca8)[0x7f47ba206ca8]
[b44093f3a7f1:03757] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1aa)[0x7f47b91a72fa]
[b44093f3a7f1:03757] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x2af)[0x7f47b919eb6f]
[b44093f3a7f1:03757] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7f47ba20851f]
[b44093f3a7f1:03757] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x4884)[0x7f47ba208884]
[b44093f3a7f1:03757] [ 7] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_progress+0x5c)[0x7f47e7aa19ec]
[b44093f3a7f1:03757] [ 8] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_request_default_wait_all+0x2e5)[0x7f47e7ffb3f5]
[b44093f3a7f1:03757] [ 9] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Waitall+0x8f)[0x7f47e803262f]
[b44093f3a7f1:03757] [10] /root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x75f45)[0x7f47e832ff45]
[b44093f3a7f1:03757] [11] /root/Desktop/pyfr/pyfr_venv/bin/python(PyCFunction_Call+0x56)[0x560c3eebaf76]
[b44093f3a7f1:03757] [12] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyObject_MakeTpCall+0x22f)[0x560c3ee7885f]
[b44093f3a7f1:03757] [13] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalFrameDefault+0x4596)[0x560c3eefff56]
[b44093f3a7f1:03757] [14] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [15] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x560c3ee3b75e]
[b44093f3a7f1:03757] [16] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [17] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x560c3ee3b75e]
[b44093f3a7f1:03757] [18] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [19] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10077f)[0x560c3ee3b77f]
[b44093f3a7f1:03757] [20] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bc0b)[0x560c3eec6c0b]
[b44093f3a7f1:03757] [21] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x560c3ee39b84]
[b44093f3a7f1:03757] [22] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [23] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x560c3ee3b75e]
[b44093f3a7f1:03757] [24] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalCodeWithName+0x2d2)[0x560c3eec5a92]
[b44093f3a7f1:03757] [25] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bd20)[0x560c3eec6d20]
[b44093f3a7f1:03757] [26] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x560c3ee39b84]
[b44093f3a7f1:03757] [27] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [28] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x560c3ee3b75e]
[b44093f3a7f1:03757] [29] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] *** End of error message ***
[b44093f3a7f1:03759] Read -1, expected 97792, errno = 1
[b44093f3a7f1:03759] Read -1, expected 151552, errno = 1
[b44093f3a7f1:03759] Read -1, expected 99840, errno = 1
[b44093f3a7f1:03756] Read -1, expected 51200, errno = 1
[b44093f3a7f1:03756] Read -1, expected 175104, errno = 1
[b44093f3a7f1:03756] Read -1, expected 99840, errno = 1
[b44093f3a7f1:03756] *** Process received signal ***
[b44093f3a7f1:03756] Signal: Segmentation fault (11)
[b44093f3a7f1:03756] Signal code: Invalid permissions (2)
[b44093f3a7f1:03756] Failing at address: 0x7f52c7c4b000
[b44093f3a7f1:03759] *** Process received signal ***
[b44093f3a7f1:03759] Signal: Segmentation fault (11)
[b44093f3a7f1:03759] Signal code: Invalid permissions (2)
[b44093f3a7f1:03759] Failing at address: 0x7fc283426c00
[b44093f3a7f1:03756] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f53adb4e8a0]
[b44093f3a7f1:03756] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18ed8f)[0x7f53ad8d9d8f]
[b44093f3a7f1:03756] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x2ca8)[0x7f53808f9ca8]
[b44093f3a7f1:03759] [ 0] [b44093f3a7f1:03756] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1aa)[0x/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7fc334b408a0]
[b44093f3a7f1:03759] [ 1] 7f537d6ac2fa]
[b44093f3a7f1:03756] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x2af)[0x7f537d6a3b6f]
[b44093f3a7f1:03756] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7f53808fb51f]
[b44093f3a7f1:03756] [ 6] /lib/x86_64-linux-gnu/libc.so.6(+0x18ed8f)[0x7fc3348cbd8f]
[b44093f3a7f1:03759] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x482e)[0x7f53808fb82e]
[b44093f3a7f1:03756] [ 7] 2ca8)[0x7fc320fb5ca8]
[b44093f3a7f1:03759] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1aa)[0x7fc2d6f9d2fa/usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_progress+0x5c)[0x7f53a8b969ec]
[b44093f3a7f1:03756] [ 8] ]
[b44093f3a7f1:03759] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x2af)[0x7fc2d6f94b6f]
/usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_request_default_wait_all+0x2e5)[0x7f53a90f03f5]
[b44093f3a7f1:03759] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7fc320fb751f]
[b44093f3a7f1:03759] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x482e[b44093f3a7f1:03756] [ 9] )[0x7fc320fb782e]
[b44093f3a7f1:03759] [ 7] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Waitall+0x8f)[0x7f53a912762f]
[b44093f3a7f1:03756] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_progress+0x5c)[0x7fc32fb889ec]
[b44093f3a7f1:03759] [ 8] [10] /root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x75f45)[0x7f53a9424f45]
[b44093f3a7f1:03756] [11] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_request_default_wait_all+0x2e5)[0x7fc3300e23f5]
[b44093f3a7f1:03759] [ 9] /root/Desktop/pyfr/pyfr_venv/bin/python(PyCFunction_Call+0x56)[0x55b44d55bf76]
[b44093f3a7f1:03756] [12] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Waitall+0x8f)[0x7fc33011962f]
[b44093f3a7f1:03759] [10] /root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x75f45)[0x7fc330416f45]
[b44093f3a7f1:03759] [11] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyObject_MakeTpCall+0x22f)[0x55b44d51985f]
[b44093f3a7f1:03756] [13] /root/Desktop/pyfr/pyfr_venv/bin/python(PyCFunction_Call+0x56)[0x56330a557f76]
[b44093f3a7f1:03759] [12] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalFrameDefault+0x4596)[0x55b44d5a0f56]
[b44093f3a7f1:03756] [14] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyObject_MakeTpCall+0x22f)[0x56330a51585f]
[b44093f3a7f1:03759] [13] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] [15] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalFrameDefault+0x4596)[0x56330a59cf56]
[b44093f3a7f1:03759] [14] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x55b44d4dc75e]
[b44093f3a7f1:03756] [16] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] [15] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] [17] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x56330a4d875e]
[b44093f3a7f1:03759] [16] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x55b44d4dc75e]
[b44093f3a7f1:03756] [18] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] [17] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] [19] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x56330a4d875e]
[b44093f3a7f1:03759] [18] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10077f)[0x55b44d4dc77f]
[b44093f3a7f1:03756] [20] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
/root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bc0b)[0x55b44d567c0b]
[b44093f3a7f1:03756] [21] [b44093f3a7f1:03759] [19] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10077f)[0x56330a4d877f]
[b44093f3a7f1:03759] [20] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x55b44d4dab84]
[b44093f3a7f1:03756] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bc0b)[0x56330a563c0b]
[b44093f3a7f1:03759] [22] [21] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] [23] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x56330a4d6b84]
[b44093f3a7f1:03759] [22] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x55b44d4dc75e]
[b44093f3a7f1:03756] [24] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalCodeWithName+0x2d2)[0x55b44d566a92]
[b44093f3a7f1:03756] [25] [23] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x56330a4d875e]
[b44093f3a7f1:03759] [24] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bd20)[0x55b44d567d20]
[b44093f3a7f1:03756] [26] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalCodeWithName+0x2d2)[0x56330a562a92]
[b44093f3a7f1:03759] [25] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x55b44d4dab84]
[b44093f3a7f1:03756] [27] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bd20)[0x56330a563d20]
[b44093f3a7f1:03759] [26] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
/root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x56330a4d6b84]
[b44093f3a7f1:03759] [27] [b44093f3a7f1:03756] [28] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] [28] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x55b44d4dc75e]
[b44093f3a7f1:03756] [29] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x56330a4d875e]
[b44093f3a7f1:03759] [29] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] *** End of error message ***
/root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] *** End of error message ***
/root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/pytools/prefork.py:94: UserWarning: Prefork server exiting upon apparent death of parent
  warn(f"{who} exiting upon apparent death of {partner}")
/root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/pytools/prefork.py:94: UserWarning: Prefork server exiting upon apparent death of parent
  warn(f"{who} exiting upon apparent death of {partner}")
/root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/pytools/prefork.py:94: UserWarning: Prefork server exiting upon apparent death of parent
  warn(f"{who} exiting upon apparent death of {partner}")
/root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/pytools/prefork.py:94: UserWarning: Prefork server exiting upon apparent death of parent
  warn(f"{who} exiting upon apparent death of {partner}")
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node b44093f3a7f1 exited on signal 11 (Segmentation fault).

It seems something is wrong with the allocation of the GPUs. I set the following lines in the config file to make sure that each GPU was assigned a rank.

[backend-cuda]
device-id = local-rank
mpi-type = cuda-aware
block-1d = 64
block-2d = 128
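
For context, device-id = local-rank asks each MPI process to choose a GPU according to its rank on the node. A minimal sketch of that idea follows. It is an illustration only, not PyFR's actual implementation: it assumes an Open MPI launcher, which exports OMPI_COMM_WORLD_LOCAL_RANK to every process it starts, and the name local_rank_device is hypothetical.

```python
import os


def local_rank_device(n_devices, env=None):
    """Return the GPU index for this process from its node-local rank.

    Hypothetical helper for illustration only. Open MPI's launcher sets
    OMPI_COMM_WORLD_LOCAL_RANK for every process it starts.
    """
    env = os.environ if env is None else env
    if 'OMPI_COMM_WORLD_LOCAL_RANK' not in env:
        raise RuntimeError('local rank not available; was this started with mpiexec?')

    local_rank = int(env['OMPI_COMM_WORLD_LOCAL_RANK'])
    if local_rank >= n_devices:
        raise RuntimeError(f'local rank {local_rank} has no GPU (only {n_devices} visible)')

    return local_rank


# Simulate the four processes of mpiexec -n 4 on a node with 4 GPUs
print([local_rank_device(4, {'OMPI_COMM_WORLD_LOCAL_RANK': str(r)}) for r in range(4)])
```

With four ranks on one node and four visible GPUs, each rank ends up on a distinct device, which is the behaviour the config above is meant to achieve.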

I hope someone can offer some hints. BTW, I’ve already added the directories containing libmetis, libcuda, etc. to the PATH.

Regards, Thatcher

Can you confirm that your MPI library is indeed CUDA aware, and what your rationale is for changing the default value of mpi-type?

Regards, Freddie.
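
For an Open MPI build, one way to answer the first question is to inspect the output of ompi_info, which reports an mpi_built_with_cuda_support parameter. The sketch below wraps that check. The function names are hypothetical, and other MPI libraries need a different check.

```python
import shutil
import subprocess


def parse_cuda_support(ompi_info_output):
    """Return True/False from `ompi_info --parsable --all` text, or None if absent."""
    for line in ompi_info_output.splitlines():
        # The relevant line looks like:
        #   mca:mpi:base:param:mpi_built_with_cuda_support:value:true
        if 'mpi_built_with_cuda_support:value:' in line:
            return line.rsplit(':', 1)[1].strip() == 'true'
    return None


def ompi_is_cuda_aware():
    """Run ompi_info if it is on PATH; return None when it cannot be queried."""
    exe = shutil.which('ompi_info')
    if exe is None:
        return None
    out = subprocess.run([exe, '--parsable', '--all'],
                         capture_output=True, text=True).stdout
    return parse_cuda_support(out)


print(ompi_is_cuda_aware())
```

Only a result of True would justify setting mpi-type = cuda-aware. With False or None, the default value is the safer choice.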

Thanks for the prompt reply. I had not noticed that mpi-type had been changed from its default value. :joy: I changed the value back to standard, and I can tell from nvidia-smi that all the GPUs are now working. But then another error appeared, as follows, and kept scrolling.

......
[c84b168a83d9:07273] Read -1, expected 930816, errno = 1
[c84b168a83d9:07274] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 930816, errno = 1
[c84b168a83d9:07274] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 930816, errno = 1
[c84b168a83d9:07274] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 930816, errno = 1
   2.0% [>                            ] 5.20/15.00 ela: 00:00:18 rem: 248:46:16[c84b168a83d9:07274] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 930816, errno = 1
[c84b168a83d9:07274] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 930816, errno = 1
[c84b168a83d9:07274] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 310272, errno = 1
[c84b168a83d9:07273] Read -1, expected 930816, errno = 1
......

Regards, Thatcher
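
The repeated errno = 1 in these messages can be decoded with Python's standard errno module. This only translates the number and does not by itself identify the cause.

```python
import errno
import os

code = 1  # the value reported in "Read -1, expected ..., errno = 1"

# Symbolic name of the error number
print(errno.errorcode[code])

# Human-readable description supplied by the C library
print(os.strerror(code))
```

On Linux the symbolic name is EPERM, a permission failure. The earlier backtrace passes through mca_btl_vader.so, Open MPI's shared-memory transport, so one possibility is that the processes are not permitted to read each other's memory, as can happen inside a container. This is an assumption and is not confirmed anywhere in this thread. Open MPI provides an MCA parameter, btl_vader_single_copy_mechanism, which can be set to none (for example through the environment variable OMPI_MCA_btl_vader_single_copy_mechanism=none) to test whether that is the case.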

This looks like an MPI library issue. Please try with a different MPI library (OpenMPI or MPICH built from source usually work best.) Once compiled and installed you’ll want to be sure to recompile mpi4py against this new library, otherwise you may encounter strange issues.

Regards, Freddie.
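
After rebuilding mpi4py, it is worth confirming which MPI library it actually loads. A minimal sketch, assuming it is run in the same virtual environment used for PyFR; the helper name linked_mpi_library is hypothetical.

```python
def linked_mpi_library():
    """Return the first line of the MPI library version string mpi4py reports.

    Returns None if mpi4py (or the MPI library it was built against)
    cannot be imported.
    """
    try:
        from mpi4py import MPI
    except ImportError:
        return None

    # Get_library_version() wraps MPI_Get_library_version
    version = MPI.Get_library_version()
    return version.strip().split('\n', 1)[0]


if __name__ == '__main__':
    print(linked_mpi_library())
```

If the reported library is still the old system install, mpi4py was not rebuilt against the new one. Reinstalling with pip install --no-cache-dir --no-binary=mpi4py mpi4py, with the new library's mpicc first on the PATH, typically forces a fresh compile against it.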