Dear all,
I am trying to run a simulation on Ubuntu 18.04 with four NVIDIA 2080 Ti GPUs. The run succeeds with the CUDA backend on a single GPU. However, when I partitioned the case and ran it in parallel with mpiexec -n 4 pyfr run -b cuda -p ..., the following error appeared:
[b44093f3a7f1:03757] Read -1, expected 87040, errno = 1
[b44093f3a7f1:03758] Read -1, expected 87040, errno = 1
[b44093f3a7f1:03757] *** Process received signal ***
[b44093f3a7f1:03758] *** Process received signal ***
[b44093f3a7f1:03758] Signal: Segmentation fault (11)
[b44093f3a7f1:03758] Signal code: Invalid permissions (2)
[b44093f3a7f1:03758] Failing at address: 0x7fef6749fe00
[b44093f3a7f1:03757] Signal: Segmentation fault (11)
[b44093f3a7f1:03757] Signal code: Invalid permissions (2)
[b44093f3a7f1:03757] Failing at address: 0x7f47375c7200
[b44093f3a7f1:03758] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7ff01a8fb8a0]
[b44093f3a7f1:03758] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18ed8f)[0x7ff01a686d8f]
[b44093f3a7f1:03758] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x2ca8)[0x7fefc1c61ca8]
[b44093f3a7f1:03758] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1aa)[0x7fefc0c022fa]
[b44093f3a7f1:03758] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x2af)[0x7fefc0bf9b6f]
[b44093f3a7f1:03758] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7fefc1c6351f]
[b44093f3a7f1:03758] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x482e)[0x7fefc1c6382e]
[b44093f3a7f1:03758] [ 7] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_progress+0x5c)[0x7ff0159439ec]
[b44093f3a7f1:03758] [ 8] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_request_default_wait_all+0x2e5)[0x7ff015e9d3f5]
[b44093f3a7f1:03758] [ 9] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Waitall+0x8f)[0x7ff015ed462f]
[b44093f3a7f1:03758] [10] /root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x75f45)[0x7ff0161d1f45]
[b44093f3a7f1:03758] [11] /root/Desktop/pyfr/pyfr_venv/bin/python(PyCFunction_Call+0x56)[0x557c7dbccf76]
[b44093f3a7f1:03758] [12] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyObject_MakeTpCall+0x22f)[0x557c7db8a85f]
[b44093f3a7f1:03758] [13] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalFrameDefault+0x4596)[0x557c7dc11f56]
[b44093f3a7f1:03758] [14] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [15] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x557c7db4d75e]
[b44093f3a7f1:03758] [16] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [17] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x557c7db4d75e]
[b44093f3a7f1:03758] [18] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [19] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10077f)[0x557c7db4d77f]
[b44093f3a7f1:03758] [20] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bc0b)[0x557c7dbd8c0b]
[b44093f3a7f1:03758] [21] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x557c7db4bb84]
[b44093f3a7f1:03758] [22] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [23] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x557c7db4d75e]
[b44093f3a7f1:03758] [24] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalCodeWithName+0x2d2)[0x557c7dbd7a92]
[b44093f3a7f1:03758] [25] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bd20)[0x557c7dbd8d20]
[b44093f3a7f1:03758] [26] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x557c7db4bb84]
[b44093f3a7f1:03758] [27] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] [28] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x557c7db4d75e]
[b44093f3a7f1:03758] [29] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x557c7dbd886b]
[b44093f3a7f1:03758] *** End of error message ***
[b44093f3a7f1:03757] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f47eca598a0]
[b44093f3a7f1:03757] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18ed8f)[0x7f47ec7e4d8f]
[b44093f3a7f1:03757] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x2ca8)[0x7f47ba206ca8]
[b44093f3a7f1:03757] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1aa)[0x7f47b91a72fa]
[b44093f3a7f1:03757] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x2af)[0x7f47b919eb6f]
[b44093f3a7f1:03757] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7f47ba20851f]
[b44093f3a7f1:03757] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x4884)[0x7f47ba208884]
[b44093f3a7f1:03757] [ 7] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_progress+0x5c)[0x7f47e7aa19ec]
[b44093f3a7f1:03757] [ 8] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_request_default_wait_all+0x2e5)[0x7f47e7ffb3f5]
[b44093f3a7f1:03757] [ 9] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Waitall+0x8f)[0x7f47e803262f]
[b44093f3a7f1:03757] [10] /root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x75f45)[0x7f47e832ff45]
[b44093f3a7f1:03757] [11] /root/Desktop/pyfr/pyfr_venv/bin/python(PyCFunction_Call+0x56)[0x560c3eebaf76]
[b44093f3a7f1:03757] [12] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyObject_MakeTpCall+0x22f)[0x560c3ee7885f]
[b44093f3a7f1:03757] [13] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalFrameDefault+0x4596)[0x560c3eefff56]
[b44093f3a7f1:03757] [14] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [15] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x560c3ee3b75e]
[b44093f3a7f1:03757] [16] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [17] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x560c3ee3b75e]
[b44093f3a7f1:03757] [18] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [19] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10077f)[0x560c3ee3b77f]
[b44093f3a7f1:03757] [20] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bc0b)[0x560c3eec6c0b]
[b44093f3a7f1:03757] [21] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x560c3ee39b84]
[b44093f3a7f1:03757] [22] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [23] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x560c3ee3b75e]
[b44093f3a7f1:03757] [24] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalCodeWithName+0x2d2)[0x560c3eec5a92]
[b44093f3a7f1:03757] [25] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bd20)[0x560c3eec6d20]
[b44093f3a7f1:03757] [26] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x560c3ee39b84]
[b44093f3a7f1:03757] [27] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] [28] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x560c3ee3b75e]
[b44093f3a7f1:03757] [29] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x560c3eec686b]
[b44093f3a7f1:03757] *** End of error message ***
[b44093f3a7f1:03759] Read -1, expected 97792, errno = 1
[b44093f3a7f1:03759] Read -1, expected 151552, errno = 1
[b44093f3a7f1:03759] Read -1, expected 99840, errno = 1
[b44093f3a7f1:03756] Read -1, expected 51200, errno = 1
[b44093f3a7f1:03756] Read -1, expected 175104, errno = 1
[b44093f3a7f1:03756] Read -1, expected 99840, errno = 1
[b44093f3a7f1:03756] *** Process received signal ***
[b44093f3a7f1:03756] Signal: Segmentation fault (11)
[b44093f3a7f1:03756] Signal code: Invalid permissions (2)
[b44093f3a7f1:03756] Failing at address: 0x7f52c7c4b000
[b44093f3a7f1:03759] *** Process received signal ***
[b44093f3a7f1:03759] Signal: Segmentation fault (11)
[b44093f3a7f1:03759] Signal code: Invalid permissions (2)
[b44093f3a7f1:03759] Failing at address: 0x7fc283426c00
[b44093f3a7f1:03756] [ 0] /lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7f53adb4e8a0]
[b44093f3a7f1:03756] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x18ed8f)[0x7f53ad8d9d8f]
[b44093f3a7f1:03756] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x2ca8)[0x7f53808f9ca8]
[b44093f3a7f1:03759] [ 0] [b44093f3a7f1:03756] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1aa)[0x/lib/x86_64-linux-gnu/libpthread.so.0(+0x128a0)[0x7fc334b408a0]
[b44093f3a7f1:03759] [ 1] 7f537d6ac2fa]
[b44093f3a7f1:03756] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x2af)[0x7f537d6a3b6f]
[b44093f3a7f1:03756] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7f53808fb51f]
[b44093f3a7f1:03756] [ 6] /lib/x86_64-linux-gnu/libc.so.6(+0x18ed8f)[0x7fc3348cbd8f]
[b44093f3a7f1:03759] [ 2] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x/usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x482e)[0x7f53808fb82e]
[b44093f3a7f1:03756] [ 7] 2ca8)[0x7fc320fb5ca8]
[b44093f3a7f1:03759] [ 3] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_send_request_schedule_once+0x1aa)[0x7fc2d6f9d2fa/usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_progress+0x5c)[0x7f53a8b969ec]
[b44093f3a7f1:03756] [ 8] ]
[b44093f3a7f1:03759] [ 4] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_recv_frag_callback_ack+0x2af)[0x7fc2d6f94b6f]
/usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_request_default_wait_all+0x2e5)[0x7f53a90f03f5]
[b44093f3a7f1:03759] [ 5] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(mca_btl_vader_poll_handle_frag+0x8f)[0x7fc320fb751f]
[b44093f3a7f1:03759] [ 6] /usr/lib/x86_64-linux-gnu/openmpi/lib/openmpi/mca_btl_vader.so(+0x482e[b44093f3a7f1:03756] [ 9] )[0x7fc320fb782e]
[b44093f3a7f1:03759] [ 7] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Waitall+0x8f)[0x7f53a912762f]
[b44093f3a7f1:03756] /usr/lib/x86_64-linux-gnu/libopen-pal.so.20(opal_progress+0x5c)[0x7fc32fb889ec]
[b44093f3a7f1:03759] [ 8] [10] /root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x75f45)[0x7f53a9424f45]
[b44093f3a7f1:03756] [11] /usr/lib/x86_64-linux-gnu/libmpi.so.20(ompi_request_default_wait_all+0x2e5)[0x7fc3300e23f5]
[b44093f3a7f1:03759] [ 9] /root/Desktop/pyfr/pyfr_venv/bin/python(PyCFunction_Call+0x56)[0x55b44d55bf76]
[b44093f3a7f1:03756] [12] /usr/lib/x86_64-linux-gnu/libmpi.so.20(PMPI_Waitall+0x8f)[0x7fc33011962f]
[b44093f3a7f1:03759] [10] /root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/mpi4py/MPI.cpython-38-x86_64-linux-gnu.so(+0x75f45)[0x7fc330416f45]
[b44093f3a7f1:03759] [11] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyObject_MakeTpCall+0x22f)[0x55b44d51985f]
[b44093f3a7f1:03756] [13] /root/Desktop/pyfr/pyfr_venv/bin/python(PyCFunction_Call+0x56)[0x56330a557f76]
[b44093f3a7f1:03759] [12] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalFrameDefault+0x4596)[0x55b44d5a0f56]
[b44093f3a7f1:03756] [14] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyObject_MakeTpCall+0x22f)[0x56330a51585f]
[b44093f3a7f1:03759] [13] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] [15] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalFrameDefault+0x4596)[0x56330a59cf56]
[b44093f3a7f1:03759] [14] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x55b44d4dc75e]
[b44093f3a7f1:03756] [16] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] [15] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] [17] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x56330a4d875e]
[b44093f3a7f1:03759] [16] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x55b44d4dc75e]
[b44093f3a7f1:03756] [18] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] [17] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] [19] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x56330a4d875e]
[b44093f3a7f1:03759] [18] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10077f)[0x55b44d4dc77f]
[b44093f3a7f1:03756] [20] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
/root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bc0b)[0x55b44d567c0b]
[b44093f3a7f1:03756] [21] [b44093f3a7f1:03759] [19] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10077f)[0x56330a4d877f]
[b44093f3a7f1:03759] [20] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x55b44d4dab84]
[b44093f3a7f1:03756] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bc0b)[0x56330a563c0b]
[b44093f3a7f1:03759] [22] [21] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] [23] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x56330a4d6b84]
[b44093f3a7f1:03759] [22] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x55b44d4dc75e]
[b44093f3a7f1:03756] [24] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalCodeWithName+0x2d2)[0x55b44d566a92]
[b44093f3a7f1:03756] [25] [23] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x56330a4d875e]
[b44093f3a7f1:03759] [24] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bd20)[0x55b44d567d20]
[b44093f3a7f1:03756] [26] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyEval_EvalCodeWithName+0x2d2)[0x56330a562a92]
[b44093f3a7f1:03759] [25] /root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x55b44d4dab84]
[b44093f3a7f1:03756] [27] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x18bd20)[0x56330a563d20]
[b44093f3a7f1:03759] [26] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
/root/Desktop/pyfr/pyfr_venv/bin/python(+0xfeb84)[0x56330a4d6b84]
[b44093f3a7f1:03759] [27] [b44093f3a7f1:03756] [28] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] [28] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x55b44d4dc75e]
[b44093f3a7f1:03756] [29] /root/Desktop/pyfr/pyfr_venv/bin/python(+0x10075e)[0x56330a4d875e]
[b44093f3a7f1:03759] [29] /root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x55b44d56786b]
[b44093f3a7f1:03756] *** End of error message ***
/root/Desktop/pyfr/pyfr_venv/bin/python(_PyFunction_Vectorcall+0x10b)[0x56330a56386b]
[b44093f3a7f1:03759] *** End of error message ***
/root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/pytools/prefork.py:94: UserWarning: Prefork server exiting upon apparent death of parent
warn(f"{who} exiting upon apparent death of {partner}")
/root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/pytools/prefork.py:94: UserWarning: Prefork server exiting upon apparent death of parent
warn(f"{who} exiting upon apparent death of {partner}")
/root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/pytools/prefork.py:94: UserWarning: Prefork server exiting upon apparent death of parent
warn(f"{who} exiting upon apparent death of {partner}")
/root/Desktop/pyfr/pyfr_venv/lib/python3.8/site-packages/pytools/prefork.py:94: UserWarning: Prefork server exiting upon apparent death of parent
warn(f"{who} exiting upon apparent death of {partner}")
--------------------------------------------------------------------------
mpiexec noticed that process rank 1 with PID 0 on node b44093f3a7f1 exited on signal 11 (Segmentation fault).
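To make the "Read -1, expected ..., errno = 1" lines easier to interpret, here is a minimal sketch that pulls them out of the log and decodes the errno with the Python standard library. It assumes a Linux host (where errno 1 is EPERM); the variable names are my own, and only three of the lines above are copied in. It only names the error code and does not by itself explain why the read inside mca_btl_vader was refused.

```python
import errno
import os
import re

# Three of the "Read -1" lines copied verbatim from the output above.
log = """\
[b44093f3a7f1:03757] Read -1, expected 87040, errno = 1
[b44093f3a7f1:03759] Read -1, expected 97792, errno = 1
[b44093f3a7f1:03756] Read -1, expected 51200, errno = 1
"""

# Matches "[host:pid] Read <got>, expected <want>, errno = <err>".
pattern = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\] "
    r"Read (?P<got>-?\d+), expected (?P<want>\d+), errno = (?P<err>\d+)"
)

failures = [m.groupdict() for m in pattern.finditer(log)]

for f in failures:
    code = int(f["err"])
    # errno.errorcode maps the number to its symbolic name on this platform.
    name = errno.errorcode.get(code, "unknown")
    print(f"pid {f['pid']}: wanted {f['want']} bytes, {name} ({os.strerror(code)})")
```

On Linux this reports EPERM for every failed read, i.e. a permission problem rather than a short read.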
It seems that something is wrong with the allocation of the GPUs. I set the following lines in the config file to make sure that each GPU is assigned one rank:
[backend-cuda]
device-id = local-rank
mpi-type = cuda-aware
block-1d = 64
block-2d = 128
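As a sanity check that the section above is read the way I intend, here is a small sketch that parses it with Python's standard configparser. This is not PyFR's own config loader; it only assumes the file is ordinary INI syntax, and the names cfg_text and cuda are hypothetical.

```python
from configparser import ConfigParser

# The [backend-cuda] section exactly as it appears in my config file.
cfg_text = """
[backend-cuda]
device-id = local-rank
mpi-type = cuda-aware
block-1d = 64
block-2d = 128
"""

cfg = ConfigParser()
cfg.read_string(cfg_text)

# Collect the section into a plain dict; all values are read as strings.
cuda = dict(cfg["backend-cuda"])

for key, value in cuda.items():
    print(f"{key} = {value}")
```

The values come back as device-id = local-rank and mpi-type = cuda-aware, so the section itself at least parses as written.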
I hope someone can offer some hints. By the way, I have already added the directories containing libmetis, libcuda, etc. to the PATH.
Regards, Thatcher