NVLink - CUDA-aware MPI - single-node performance

Hi,
We have 4 NVIDIA SXM2 V100s on a single node. I noticed that our system-provided MPI library (Open MPI 3.1.0) was not compiled with CUDA support, and I am worried that we may be losing performance. This version of the V100 should support extremely fast GPU-to-GPU communication (through NVLink), but I am unsure whether we are using this capability effectively without an MPI build that has CUDA support.

Do you know if we can expect any performance gains by compiling Open MPI with CUDA support?

Thank you,

Hi Michael,