Nvidia eGPU for Mac

[Updated on 2018.11.14] I finally got my GTX 1070 working with my MBP for PyTorch and fast.ai. The steps are below:

Environment

The unlock NVIDIA parameter tells the script to make the Mac compatible with NVIDIA eGPUs. This is only required for macOS 10.13.4 and macOS 10.13.5. This might cause issues/crashes with AMD graphics cards (external).

MacBook Pro (15-inch, 2016) with touch bar
OSX version: 10.13.6 (Mojave may not work yet)
eGPU: Razer Core X + GTX 1070 (MSI)

Apple's macOS Mojave is a great software update for most users. But, it isn't for those who are using an Nvidia graphics card in their Mac Pro or inside of an external GPU enclosure, so let's talk.

Step 1: Install Nvidia Web Driver

10.13.6 + 17G65

Follow this if your system is 10.13.6 17G65 (you can check this number by clicking 'version 10.13.6' in 'About this Mac'). If it is 17G3025 or later, jump to the next section, '10.13.6 + 17G3025'.

Use the great tool macOS-eGPU to install the Nvidia web driver. Just follow its guide and install by running '> macos-egpu'.

Although it also offers to install CUDA for you, DO NOT use that option: it automatically installs the latest version, which does not seem to work with PyTorch yet. Just install the NVIDIA web driver.

After the installation, my web driver version is: 387.10.10.10.40.105. Make sure you have this version if your OSX version is 10.13.6 + 17G65.

10.13.6 + 17G3025 [Added on 2018.11.14]

A new security patch arrived at the beginning of November and moved High Sierra 10.13.6 to build 17G3025 (you can check this number by clicking 'version 10.13.6' in 'About this Mac'). macOS-eGPU has not been updated for this new build yet (as of today, 2018.11.14), so I suggest using another tool instead: purge-wrangler. It is as good or better (personal opinion); just follow its guide and select 'Enable NVIDIA eGPUs'.

After the installation, the web driver version will be: 387.10.10.10.40.108.

10.13.6 + 17G5019 [Added on 2019.02.27]

Same as before: just use purge-wrangler to apply the patch. If you already patched the system with purge-wrangler, simply upgrade the Nvidia web driver. After the restart, purge-wrangler will prompt you to re-patch the system. Easy as pie.

After the installation, the web driver version will be: 387.10.10.10.40.118.
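The three build-specific sections above boil down to a small lookup table. Here is a minimal Python sketch of it; the driver versions and tool names are copied from this guide, while the helper names (`lookup_build`, `current_build`) are made up for illustration. It assumes `sw_vers -buildVersion` is available, which is only the case on macOS.

```python
import platform
import subprocess

# Build -> (web driver version, install tool), copied from this guide.
# Builds not listed here are simply unknown to this sketch.
KNOWN_BUILDS = {
    "17G65": ("387.10.10.10.40.105", "macOS-eGPU"),
    "17G3025": ("387.10.10.10.40.108", "purge-wrangler"),
    "17G5019": ("387.10.10.10.40.118", "purge-wrangler"),
}


def lookup_build(build):
    """Return (driver_version, tool) for a known build, else None."""
    return KNOWN_BUILDS.get(build.strip())


def current_build():
    """Return the macOS build string, or None when not running on macOS."""
    if platform.system() != "Darwin":
        return None
    result = subprocess.run(
        ["sw_vers", "-buildVersion"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


if __name__ == "__main__":
    build = current_build()
    match = lookup_build(build) if build else None
    if match is None:
        print(f"Build {build!r} is not covered by this guide.")
    else:
        driver, tool = match
        print(f"Build {build}: expect web driver {driver}, installed via {tool}.")
```

If your build is not in the table, treat that as a sign to stop and check whether a matching web driver exists before patching anything.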

Verify

If it is installed successfully, once you plug in your eGPU you should see your GTX 1070 in 'About This Mac -> System Report... -> Graphics/Displays' and in 'Activity Monitor -> Window -> GPU History'. Or you can simply plug an external monitor into the eGPU to see if it works.

NOTE: Hot unplugging the eGPU is not supported yet, so the suggestion is to 'reboot and unplug the moment the eGPU power shuts down'. (If this is not done properly, a kernel panic will happen.) However, my Razer Core X does not cut power during the restart: its fan keeps spinning, and the fan on the GTX 1070 doesn't spin at all, probably because the GPU temperature is low. So for me, there is no way to tell the right moment from the eGPU itself. After some experimenting, I found it seems safe to unplug at the moment the keyboard backlight turns off during the restart.
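The same check can be done from a terminal instead of the System Report window. The sketch below assumes the text output of `system_profiler SPDisplaysDataType` lists each GPU on a 'Chipset Model:' line; the function names are hypothetical and the parsing is deliberately loose.

```python
import re
import subprocess


def gpu_models(profiler_text):
    """Extract GPU names from `system_profiler SPDisplaysDataType` text."""
    pattern = r"^[ \t]*Chipset Model:[ \t]*(.+)$"
    return [m.strip() for m in re.findall(pattern, profiler_text, flags=re.MULTILINE)]


def has_gpu(profiler_text, name="GTX 1070"):
    """True if any listed GPU name contains `name`."""
    return any(name in model for model in gpu_models(profiler_text))


def read_profiler():
    """Run system_profiler (macOS only) and return its text output."""
    result = subprocess.run(
        ["system_profiler", "SPDisplaysDataType"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```

On the Mac, `has_gpu(read_profiler())` should be True while the eGPU is plugged in and recognized.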

Step 2: Install CUDA driver, toolkit

PyTorch works with CUDA 9.2; it doesn't support the latest CUDA 10.0 yet. So I downloaded the CUDA 9.2 installation image from Nvidia. It includes the CUDA driver, toolkit and samples. Just install all of them; we will need the samples later on. CUDA Toolkit 9.2 also has a patch, so install that as well. You can download the patch from the same place as the installation image.

Follow the installation guide here

Make sure the deviceQuery and bandwidthTest from samples work after installation.

After the installation, my CUDA driver version is: 396.148. You can get this information with the command '> macos-egpu -C'.
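Since the toolkit release matters here (9.2 works, 10.0 does not), it can be worth checking which release `nvcc` actually reports. This is a small sketch that parses the text of `nvcc --version`; it assumes the output contains a 'release X.Y' fragment, and the helper names are illustrative only.

```python
import re

# The release this guide was tested with.
SUPPORTED_RELEASE = (9, 2)


def cuda_release(nvcc_output):
    """Return (major, minor) parsed from `nvcc --version` text, or None."""
    match = re.search(r"release (\d+)\.(\d+)", nvcc_output)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def is_supported(nvcc_output):
    """True only when the reported release is exactly 9.2."""
    return cuda_release(nvcc_output) == SUPPORTED_RELEASE
```

Feed it the captured output of `nvcc --version`, for example via `subprocess.run(["nvcc", "--version"], capture_output=True, text=True).stdout`.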

Step 3: Install cuDNN

Go to this page to download the installation image (registration required): 'https://developer.nvidia.com/rdp/cudnn-archive' -> click 'cuDNN v7.1.4 Library for OSX'.

Make sure to use cuDNN v7.1.4.

Follow this guide for the installation.

Step 4: Compile and install Pytorch

I followed this guide. It was mostly correct for me, but not entirely, so I would like to write down the steps that worked for me.

Create conda environment

With ptc active (> source activate ptc)

After the above step, unset CMAKE_PREFIX_PATH or simply open a new terminal and activate ptc. This is very important: we are going to compile PyTorch, and having CMAKE_PREFIX_PATH set will cause problems (it did for me).
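
If you drive the build from a Python script rather than an interactive shell, one way to make sure the variable never reaches the build is to hand the subprocess a cleaned copy of the environment. This is only a sketch of that idea; `build_env` is a hypothetical helper, not part of any tool mentioned here.

```python
import os


def build_env(base=None):
    """Copy an environment mapping and drop CMAKE_PREFIX_PATH from it."""
    env = dict(os.environ if base is None else base)
    # pop() with a default is a no-op when the variable is already unset.
    env.pop("CMAKE_PREFIX_PATH", None)
    return env


if __name__ == "__main__":
    cleaned = build_env()
    print("CMAKE_PREFIX_PATH" in cleaned)  # always False after cleaning
```

Pass the result as `env=build_env()` to `subprocess.run` for whatever build command you use; the current shell's environment is left untouched.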

Get the PyTorch source

Switch to v0.4.1 and initialize submodules

Before we start compiling, make sure everything is set up correctly:
- The following are my environment variables in ~/.bash_profile. CUDA_HOME and CUDA_NVCC_EXECUTABLE may not be needed; they are there because I tried to compile TensorFlow previously. The last PATH entry (PATH=/usr/local/cuda/bin:$PATH) may be removable as well. But to be safe, you can keep the same settings as mine.
- The clang version: it should look something like the output below after step 2.

Build and install PyTorch. It will take a while (30 minutes to an hour). Just be patient.

After it is done, verify it with '> pip list'. You should see 'torch' with version '0.5.0a0+a24163a'. With the eGPU connected, you can also check from Python that torch.cuda.is_available() returns True.
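A minimal sketch of that Python check follows. It uses only `torch.cuda.is_available()` and `torch.cuda.get_device_name()`; the wrapper function `cuda_status` is my own naming, and it returns a placeholder result instead of crashing if torch is not importable.

```python
def cuda_status():
    """Report the torch version and whether a CUDA device is usable."""
    try:
        import torch
    except ImportError:
        # torch is not installed in this environment.
        return {"torch": None, "cuda_available": False, "device": None}
    available = bool(torch.cuda.is_available())
    device = torch.cuda.get_device_name(0) if available else None
    return {"torch": torch.__version__, "cuda_available": available, "device": device}


if __name__ == "__main__":
    print(cuda_status())
```

With the eGPU plugged in, 'cuda_available' should be True and 'device' should name the GTX 1070.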

Step 5: Install fastai

This will install torchvision-nightly for you, which is needed by fastai.

Double-check pip list and make sure Pillow is installed correctly and that 'Pillow' and 'pillow' are not both listed.
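Spotting the duplicate by eye in a long pip list is easy to get wrong, so here is a rough sketch that looks for installed package names differing only in letter case. It assumes Python 3.8+ for `importlib.metadata`; the function names are hypothetical.

```python
from collections import defaultdict
from importlib import metadata


def case_collisions(names):
    """Group names that are equal ignoring case; keep groups with 2+ spellings."""
    groups = defaultdict(set)
    for name in names:
        groups[name.lower()].add(name)
    return {key: sorted(variants) for key, variants in groups.items() if len(variants) > 1}


def installed_names():
    """Names of all distributions visible to the current interpreter."""
    names = []
    for dist in metadata.distributions():
        name = dist.metadata["Name"]
        if name:  # skip broken installs without a Name field
            names.append(name)
    return names


if __name__ == "__main__":
    print(case_collisions(installed_names()))
```

An empty result means no such duplicates were found; an entry under 'pillow' means both spellings are installed and one of them should be removed.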

With torchvision-nightly installed, we can verify that PyTorch is installed correctly. Download the PyTorch examples and compare the time required with and without CUDA.

There are a couple of packages that should be installed for fastai as well.

There may be more, and maybe the best way is to install them from the fastai repo. But these work for me.
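If you want a quicker sanity check than a full example run, a tiny matrix-multiply timing gives a rough CPU vs CUDA comparison. Note that this is a stand-in micro-benchmark of my own, not the PyTorch examples themselves, and the sizes are arbitrary.

```python
import time


def time_matmul(device, size=1024, repeats=10):
    """Time `repeats` square matrix multiplications on the given device."""
    import torch  # imported here so the module loads even without torch

    a = torch.randn(size, size, device=device)
    b = torch.randn(size, size, device=device)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure setup work has finished
    start = time.perf_counter()
    for _ in range(repeats):
        a @ b
    if device == "cuda":
        torch.cuda.synchronize()  # wait for queued GPU kernels to complete
    return time.perf_counter() - start


if __name__ == "__main__":
    import torch

    print("cpu :", time_matmul("cpu"))
    if torch.cuda.is_available():
        print("cuda:", time_matmul("cuda"))
```

The synchronize calls matter because CUDA kernels are launched asynchronously; without them the GPU timing would mostly measure launch overhead.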

Let's test.

In Jupyter notebook, open courses/dl1/lesson1.ipynb. Run the first few code blocks, especially the imports, and see if there is any error. If there are errors about missing packages, just pip install them.

Then you can go ahead with the lesson, test and enjoy your eGPU!

BTW, You can open 'GPU History' from 'Activity Monitor' and monitor your eGPU's load while testing.

If this gist helped you, please leave a star ;-) I will be very happy to see that it helped.

Final Note

Be VERY VERY CAREFUL about installing OSX security patches / updates.

I installed the latest update for High Sierra, which updated the macOS build to 17G3025 (still version 10.13.6). macos-egpu didn't support this new build, and the system wouldn't recognize the eGPU anymore. Luckily I found purge-wrangler, which saved my life. I also like the way it is designed and explained.

Every macOS update rewrites kernel extensions (including security updates). This means that all patches installed using purge-wrangler.sh are reset. With V5.0.0 or later, the system will notify you if this has happened, and allow you to re-patch immediately.

I recommend making a Time Machine backup before every system update, because there always seems to be a gap between the release of an OSX system update and the release of the corresponding Nvidia web driver. If you apply the system update before the new web driver is available, you will end up stuck... If you still want to use the system, you have to either roll back with Time Machine or use Web-Driver-Toolkit as suggested here to patch NVDAStartup (I only read this post and didn't try it myself).