How I installed PyFR 1.6.0 with CUDA backend (and dependencies) on Windows 10

Hello PyFR community,

I recently met Brian Vermere at a conference, who told me about PyFR and all its sleek features. I’m having some issues running large ANSYS CFX simulations on a SHARCNET computing cluster (memory allocation problems and poor CPU usage), and I wanted to take PyFR for a test drive as a possible substitute. The key here is that I want to test out a bunch of stuff on my local machine before I decide to submit a formal request to have it installed on an HPC cluster.

I very quickly got the impression when browsing the website that PyFR is Linux focused (for good reason too!). So I tried installing PyFR (and dependencies) on Ubuntu (Xenial) running as a guest OS in VirtualBox hosted by Windows 7. Since my work PC is equipped with a GTX 460, it made sense to try the CUDA backend. Everything went smoothly until the part where I had to install CUDA. Since the guest OS doesn’t have direct access to the underlying hardware, a workaround is needed: PCI-passthrough. There’s not a lot of clear information on how this can be applied to the CUDA library, and I didn’t see anyone else who has done this with PyFR, so I decided to try Windows instead.

I decided to try it on my home PC (because why not :wink: ), equipped with AMD Ryzen 1600X and Nvidia GeForce GTX 1060 SC running Windows 10.

Installation Steps
Below is a summary of the steps that, in my experience, are necessary to install PyFR on Windows 10:

KEEP IN MIND:
-install/build 64-bit applications and libraries wherever possible.
-test software / module installations as you go.
-After doing this, I discovered Microsoft now supports a package manager called vcpkg, which can be used like apt-get on Ubuntu. I tested it with box2d, Lua, and METIS all of which downloaded and built without issue! Also on the list of packages is MS-MPI and CUDA (all latest versions).

  1. Download and install Visual Studio 2015 / 2017 (I already had 2017 installed, community editions will probably suffice).

  2. Download and install the Visual Studio 2015 Redistributable packages (I don’t think this is necessary if you installed VS2015). They are needed because we have to install a 64-bit version of Python (see the Python installation step below), and Python 3.5 and 3.6 are the first two versions which are distributed in 64-bit flavors on Windows.* They are also needed for CUDA (and MS-MPI I think).

  3. Download and install Microsoft MPI**. You’ll need both the library and the executables. Here is the link to version 8.1

  4. Add the directory where the executables were installed to your PATH.

  5. Download and install CUDA (I installed version 8.0.61). Make sure that you let CUDA install its own graphics drivers using the express install (lest you run into the issue in this thread). No need for Visual Studio Integration.

  6. Navigate to %CUDA_PATH%\bin\ and make a copy of the file cublas64_80.dll, name the copy cublas.dll (PyFR looks for cublas.dll, and we don’t want to disappoint :slight_smile: )

  7. Download and install a 64-bit version of Python 3.5+ (I got 3.6.1 from here).

  8. Add <PYTHON_ROOT>\Scripts to your PATH.

  9. Install the following modules using pip, letting it find and install dependencies as necessary:

  10. Install numpy from here (I also want to use scipy for other projects, but scipy depends on the Intel Math Kernel Library dependent functions in numpy).

  11. Install mpi4py (allow pip to find online). I didn’t have any problems on my home PC, but I had to edit the configuration file (C:\Program Files\Python36\Lib\site-packages\mpi4py\mpi.cfg) to point to my MS-MPI library and executable directories on my work pc.

  12. Install pycuda from here (I initially tried to install using pip, but there is a strange issue where import pycuda.autoinit causes Python to crash).

  13. Install pyfr (allow pip to find online).

      Then test the couette_flow_2d example in pyfr! The example cases aren’t included in the installed pyfr module, so just download the version from the PyFR website, and follow the instructions at the bottom of the User Guide page. Hopefully it works!

  14. Build METIS:

  15. Download and install cmake (I got version 3.9.0-rc5)

  16. Download and unpack METIS (I got version 5.1.0)

  17. Follow the cmake-gui option in the BUILD-Windows instructions, and tick the SHARED checkbox before hitting generate (PyFR needs the shared library or .dll file, not the static .lib).

      Then create a new environment variable called PYFR_METIS_LIBRARY_PATH, and set the value to the fully qualified path of the METIS .dll (e.g. C:/Program Files/METIS/metis.dll). PyFR looks using this environment variable before searching anywhere else.

  18. This step involves editing the PyFR installed source (hopefully it will be obsolete soon). For Python >= 3.5 the ctypes module is unable to find the Visual Studio C Runtime Library using the find_msvcrt() function (see this bug report), and it looks like the method for accessing those standard libraries in Windows has changed substantially. I just messed around with the ctypes module until I was able to access the required function (fflush). The result is a tweaked constructor for the Silence object in util.py (see below).

  19. Run the euler_vortex_2d and inc_cylinder_2d examples. Visualize them in Paraview if desired.
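In the spirit of "test as you go", the checks implied by the steps above can be sketched as a small script. This is only a rough sketch: the function name `check_install` and the result keys are made up for illustration, and it only looks for files and modules rather than actually exercising CUDA, MPI or METIS.

```python
import importlib.util
import os
import shutil
import struct


def check_install(modules=("numpy", "mpi4py", "pycuda", "pyfr")):
    """Return a dict of simple pass/fail checks (hypothetical helper)."""
    results = {}

    # A 64-bit interpreter uses 8-byte pointers
    results["python_64bit"] = struct.calcsize("P") * 8 == 64

    # mpiexec is only found if the MS-MPI executable directory is on PATH
    results["mpiexec_on_path"] = shutil.which("mpiexec") is not None

    # The renamed cublas.dll copy should sit in the bin directory under CUDA_PATH
    cuda_path = os.environ.get("CUDA_PATH")
    results["cublas_copy"] = bool(cuda_path) and os.path.isfile(
        os.path.join(cuda_path, "bin", "cublas.dll"))

    # PYFR_METIS_LIBRARY_PATH should name an existing shared library
    metis = os.environ.get("PYFR_METIS_LIBRARY_PATH")
    results["metis_dll"] = bool(metis) and os.path.isfile(metis)

    # Locate each module without importing it
    for name in modules:
        results["module_" + name] = importlib.util.find_spec(name) is not None

    return results


if __name__ == "__main__":
    for check, ok in check_install().items():
        print("{:<20} {}".format(check, "ok" if ok else "MISSING"))
```

A failed check only says where to look first; a module that is found can still fail at import time (as pycuda did for me when installed through pip).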

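```python
# os, CDLL, c_void_p and find_libc are assumed to be imported at the top of
# util.py already, as the unmodified constructor uses them too
def __init__(self, stdout=os.devnull, stderr=os.devnull):
    import sys

    self.outfiles = stdout, stderr
    self.combine = (stdout == stderr)

    # Acquire a handle to fflush from the C runtime
    if sys.platform == 'win32':
        # find_msvcrt() fails on Python >= 3.5, so load msvcrt directly
        import ctypes
        self.libc_fflush = ctypes.windll.msvcrt.fflush
    else:
        self.libc_fflush = CDLL(find_libc()).fflush

    self.libc_fflush.argtypes = [c_void_p]
```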

Hopefully the creators can implement this change a little more elegantly than I can.
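To check that fflush is reachable before touching util.py, the same lookup can be tried on its own. This is a minimal sketch, not PyFR code: `get_fflush` is a made-up name, and `ctypes.util.find_library` stands in for PyFR's own `find_libc()` on non-Windows platforms.

```python
import ctypes
import ctypes.util
import sys


def get_fflush():
    """Return the C runtime's fflush as a ctypes function (hypothetical helper)."""
    if sys.platform == "win32":
        # Same lookup as in the tweaked constructor
        fflush = ctypes.windll.msvcrt.fflush
    else:
        # find_library may return None; CDLL(None) then searches the running process
        fflush = ctypes.CDLL(ctypes.util.find_library("c")).fflush

    fflush.argtypes = [ctypes.c_void_p]
    fflush.restype = ctypes.c_int
    return fflush


# fflush(NULL) flushes every open output stream and returns 0 on success
status = get_fflush()(None)
print("fflush(NULL) returned", status)
```

If this raises on Windows, the problem lies in loading msvcrt rather than in PyFR itself.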

Notes:

* The PyFR website currently says that a 64-bit version of Python is required because of a bug in numpy. I’m not sure what that bug is. Even if that bug is fixed, we still need the 64-bit version of Python. This is because, as of CUDA version 7.0, the 32-bit version of cublas is no longer supported on Windows (it’s even deprecated on Linux!), and to work with the 64-bit cublas dll in PyFR, the Python installation needs to be 64-bit. I guess everything could work with 32-bit if you used CUDA <= 6.X?

** I’ve looked into several versions of MPI for Windows:

  • OpenMPI: Hasn’t supported Windows since version 1.6.5, which means it doesn’t meet PyFR’s requirements. I even downloaded it and tried to see if I could build it myself, but it has a heavy dependence on make+unix commands
  • IntelMPI: Starts from $499… Nope nope nope. I got a free trial which I’m going to test out anyway (on my work PC which has an i7)
  • IBM Platform MPI: the poor man’s IBM Spectrum MPI. Spectrum MPI is CUDA-aware, and I’ll find out soon enough if Platform MPI is as well.
  • MS-MPI: easy to install, free, MPI 2.? standard according to Wikipedia.

Questions
Anything missing from the instructions or other hangups people encountered?

Has anyone tried the PCI-Passthrough for their VM?

Next Steps
Next I’m going to try and fix the weird command-line printout while running (see below).

Also, I’d like to do some informal benchmarking on my system. All I can say right now is that the first two examples take a few minutes and the last one takes ~20 minutes or so? I’ll probably create a separate thread for that.

Hi Nolan,

Thank you for providing your experience and guide for installing PyFR on Windows.

One thing to be aware of, the GTX 1060 has low double precision compute capability (as do nearly all of the GeForce series cards). If you have access to a Tesla GPU you will likely see a very significant decrease in runtime for the example cases.

You could also try switching to single precision in the .ini file, which should run much faster on your system for trying PyFR initially (as the 1060 has good single precision compute). However, this comes at the cost of losing double precision accuracy, so I don’t recommend it for production simulations.
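For anyone who wants to script that switch rather than edit the file by hand, here is a minimal sketch using Python's configparser. It assumes the precision setting lives under a `[backend]` section as `precision = single` or `precision = double`; the helper name `set_precision` and the sample config are made up for illustration. Note that configparser drops comments when it rewrites a file, so work on a copy.

```python
import configparser
import io


def set_precision(ini_text, precision="single"):
    """Return ini_text with [backend] precision set (hypothetical helper)."""
    if precision not in ("single", "double"):
        raise ValueError("precision must be 'single' or 'double'")

    cp = configparser.ConfigParser(interpolation=None)
    # Keep option names case-sensitive (configparser lowercases them by default)
    cp.optionxform = str
    cp.read_string(ini_text)

    if not cp.has_section("backend"):
        cp.add_section("backend")
    cp.set("backend", "precision", precision)

    out = io.StringIO()
    cp.write(out)
    return out.getvalue()


# Made-up config fragment, only for demonstration
example = """\
[backend]
precision = double

[constants]
gamma = 1.4
"""

print(set_precision(example, "single"))
```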

Cheers,

Hi Nolan Dyck - thanks for the helpful setup instructions!
Re: “weird command-line printout…”, I run Visual Studio 2015 on Win7, and saw that sort of output when stepping under Visual Studio debugger.
For normal runs, I get a neat console-mode “progress bar” by adding the following block near the end of pyfr/progress_bar.py:

    # Write the progress bar and pad the remaining columns
    if sys.platform == 'win32':
        # NN: for windows console
        sys.stdout.write('\b' * 80)
        sys.stdout.write('\r')
        sys.stdout.write(s)
        sys.stdout.flush()
    else:
        sys.stderr.write('\x1b[2K\x1b[G')
        sys.stderr.write(s)
        sys.stderr.flush()

    # Update the last render time
    self._last_wallt = wallt

Nigel

Brian,

Thanks for the info! I changed the precision to ‘single’ in the .ini file, and the Couette flow took slightly longer than the double precision case (5:39 vs 5:43)! I don’t really know what’s going on right now so I’m going to start by testing my CUDA installation, pycuda, and pyfr, respectively to see what the issue is. Do you have any ideas off the top of your head?

Considering that you could trade 4 Nvidia Tesla P100 cards for a Tesla Model 3 at current market values, I don’t think there will be any just lying around at school. However, if I can get some of my cases working, the new graham cluster is equipped with a bunch of heterogeneous computing nodes with Tesla P100s.

Nolan

Nigel,

Thanks for the code snippet. I pasted it in, and it didn’t work right away. There’s an extra space printed at the beginning of each line for some reason, and I couldn’t figure out where it was being printed. I ended up just shortening the expected column width by 1 by adding the next couple of lines to the initialization function.

```python
self._ncol = shutil.get_terminal_size()[0] or 80

# Leave one column spare on Windows, where an extra leading space is printed
if sys.platform == 'win32':
    self._ncol -= 1
```

Also, I didn’t need the sys.stdout.write('\b' * 80) line.
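Put together, the carriage-return approach with the narrower Windows width looks roughly like this outside of PyFR. This is a standalone sketch, not the contents of progress_bar.py: `usable_columns`, `render_bar` and `draw` are made-up names, and the bar layout is only illustrative.

```python
import shutil
import sys


def usable_columns(platform=None, fallback=80):
    """Terminal width, minus one column on Windows."""
    platform = sys.platform if platform is None else platform
    ncol = shutil.get_terminal_size()[0] or fallback
    return ncol - 1 if platform == "win32" else ncol


def render_bar(fraction, ncol):
    """Build a bar string that is exactly ncol characters wide."""
    fraction = min(max(fraction, 0.0), 1.0)
    label = " {:3.0f}%".format(100 * fraction)
    inner = max(ncol - len(label) - 2, 0)
    filled = int(round(inner * fraction))
    bar = "[" + "=" * filled + " " * (inner - filled) + "]" + label
    return bar[:ncol].ljust(ncol)


def draw(fraction, stream=sys.stdout):
    # '\r' returns to the start of the line so the bar overwrites itself
    stream.write("\r" + render_bar(fraction, usable_columns()))
    stream.flush()


for step in range(11):
    draw(step / 10)
sys.stdout.write("\n")
```

Padding the string to the full width means a shorter bar never leaves stale characters behind, which is why the run of backspaces is not needed.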

Nolan

Hi Nolan,

Hi all,

The progress bar requires a console which understands VT100 escape
sequences, for these are the only way to get the bar to redraw reliably
(everything else is somewhat fragile and liable to break). Windows has
traditionally not supported such sequences, although it does appear as
if the most recent builds of Windows 10 do. However, in order for them
to function one needs to call into the Win32 API to enable them. See:

<Microsoft Learn: Build skills that open doors in your career;

I'd be happy to take a patch that takes care of this (but it is
important that this does not cause issues for those who are not yet on
Windows 10 or running PyFR via other means).
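As a starting point for such a patch, here is a minimal sketch of enabling the sequences through ctypes. The function name `enable_vt_mode` is made up and this is not PyFR code; the constants are the documented Win32 values for the standard error handle and the virtual terminal processing flag. On consoles that do not support the flag the call simply fails and the function returns False, so a caller can fall back to a plainer output path.

```python
import sys

# Win32 constants
ENABLE_VIRTUAL_TERMINAL_PROCESSING = 0x0004
STD_ERROR_HANDLE = -12


def enable_vt_mode():
    """Return True if VT100 escapes can be used on stderr (hypothetical helper)."""
    if sys.platform != "win32":
        # Assumption: non-Windows terminals already understand the escapes
        return True

    import ctypes
    from ctypes import wintypes

    kernel32 = ctypes.windll.kernel32
    kernel32.GetStdHandle.argtypes = [wintypes.DWORD]
    kernel32.GetStdHandle.restype = wintypes.HANDLE
    kernel32.GetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.LPDWORD]
    kernel32.SetConsoleMode.argtypes = [wintypes.HANDLE, wintypes.DWORD]

    handle = kernel32.GetStdHandle(STD_ERROR_HANDLE)
    mode = wintypes.DWORD()

    # Fails if stderr is redirected or is not attached to a console
    if not kernel32.GetConsoleMode(handle, ctypes.byref(mode)):
        return False

    # Fails on consoles that do not know the flag (older Windows builds)
    new_mode = mode.value | ENABLE_VIRTUAL_TERMINAL_PROCESSING
    return bool(kernel32.SetConsoleMode(handle, new_mode))


if __name__ == "__main__":
    print("VT100 sequences available:", enable_vt_mode())
```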

Regards, Freddie.