

Alternating a dedicated GPU between DRM/offloading and passthrough (IOMMU)

PCI-E passthrough1)2)3) is an interesting software contraption. However, for all the good it does, it is still a cumbersome solution. Thus, I want to be able to use my GPU both on the host and on the guest systems. Not at the same time, obviously, though that should be possible for some models, using completely different technologies.

This article is an extension of the following thread at r/VFIO: Dynamically binding/unbinding an Nvidia card from a VM without restarting Xorg?.

The requirements

  • A discrete GPU running on a different driver (or an integrated GPU);
  • A CPU (and, perhaps, a motherboard) which supports IOMMU.

Note on UEFI: albeit a good practice, it's not that important for the host, but it's very important that the guest boots from (U)EFI 4).

My environment

My computer has the following specs:

  • Asus P8H61-M LX2 R2.0
  • Intel Xeon E3-1245 v2
  • 12 GB DDR3 @ 1333 MT/s
  • MSI GeForce GTX 760
  • Intel HD Graphics P4000
  • Debian Bullseye

No special BIOS settings were needed, besides enabling virtualization support and setting the iGPU as the primary video card. This particular ROM doesn't mention anything about IOMMU (VT-d), but it works anyway.

For my GRUB CMDLINE, I have the following:

GRUB_CMDLINE_LINUX_DEFAULT="pcie_aspm=off iommu=pt intel_iommu=on"

I do not know whether 'pcie_aspm=off' is still a required flag, but it seemed to solve a problem I had with my GPU “falling off the bus”. You could probably do away with it…
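After rebooting, it's worth checking that the kernel actually enabled the IOMMU. One simple sign is whether /sys/kernel/iommu_groups got populated. A minimal sketch of that check (the directory parameter exists only so the logic can be exercised against a fake directory; on a real system just call it with no arguments):

```shell
#!/bin/bash
# Succeeds (exit 0) if at least one IOMMU group exists under the given
# sysfs directory (default: the real one). An empty or missing directory
# means the IOMMU is off or the kernel parameters didn't take effect.
iommu_active() {
    local dir="${1:-/sys/kernel/iommu_groups}"
    [ -d "$dir" ] && [ -n "$(ls -A "$dir" 2>/dev/null)" ]
}

if iommu_active; then
    echo "IOMMU is up"
else
    echo "IOMMU seems disabled; check your GRUB cmdline"
fi
```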

Getting the system off the dGPU train

Normally, Linux modesets the video card and the X server binds to it. X, old-ass technology that it is, can't let go of cards unless you stop it. So, we need it to bind to the iGPU only, which is done by means of the following lines in the '/etc/X11/xorg.conf' file 5)6)7):

# Use iGPU for X Server
Section "Device"
    Identifier      "intel"
    Driver          "intel"
    BusId           "PCI:0:2:0"
    Option      "DRI"    "3"
EndSection

The “DRI” setting seems to be needed for Vulkan to work8) properly.

Then, after making sure your /etc/default/grub contains the magic words (i.e. iommu=pt [amd|intel]_iommu=on) and updating GRUB (update-grub on Debian-based distros), just restart your machine.

Also, it is, perhaps, a good idea to add the following lines to /etc/modules:

vfio-pci
pci-stub

Getting offload to work

When not using VMs, you would ideally be able to use your card in a fairly normal fashion9). But for that to be possible, you need your 3D stuff to render on your expensive card and to be drawn by your integrated GPU. That's what's called “GPU offloading”. And to be able to pull it off, the key piece of software is Bumblebee10). It's neither the newest nor the most efficient offloading agent out there, but, for several reasons11), it is, currently, the only one which works for us.

It's quite trivial to install (Debian):

apt install bumblebee-nvidia

You can also install primus-vk-nvidia, which provides the primusrun command. A slightly different animal than optirun, but, in fact, doing pretty much the same stuff.

Then, you just run your 3D application like this:

$ optirun glxgears
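To confirm offloading is actually happening, compare the renderer string reported by glxinfo with and without optirun. A tiny filter (the helper name is mine, purely illustrative) that picks that line out of glxinfo's output:

```shell
#!/bin/bash
# Print only the renderer line from glxinfo-style output on stdin.
renderer_line() {
    grep -m1 "OpenGL renderer string"
}

# On a live system:
#   glxinfo | renderer_line            # should name the Intel iGPU
#   optirun glxinfo | renderer_line    # should name the GeForce
```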

Passing through the GPU

First step

Before trying to pass the GPU to the virtual machine, the kernel driver needs to be unloaded, or the card needs to be “unbound”. I've put together a crude script which does more or less that. Here it is (stop-nv):

#!/bin/bash

/etc/init.d/bumblebeed stop
/etc/init.d/nvidia-persistenced stop

echo Taking five... 
sleep 5

# Check rmmod's status directly; checking $? after sleep would always
# see sleep's (successful) exit code and never catch a failure
rmmod nvidia_drm || { echo "DRM's busy. Check it!"; exit 1; }
sleep 1
rmmod nvidia_modeset || { echo "Modeset's busy. Check it!"; exit 1; }
sleep 1
rmmod nvidia || { echo "Nvidia driver's busy. Check it!"; exit 1; }
sleep 1

# CHANGE THOSE VALUES ACCORDINGLY
# ID: the two groups of four hex digits identifying the particular model, e.g. 10de 0aff
# BDF: the bus/device/function address where it sits
# To find those guys, just run “lspci -nn”
ID="IDID IDID"
BDF="0000:0x:00.0"
# Unbinds it from the current driver
[ ! -e /sys/bus/pci/devices/$BDF/driver/unbind ] || \
        echo -n $BDF > /sys/bus/pci/devices/$BDF/driver/unbind
# Binds it to the VFIO driver
echo -n $ID > /sys/bus/pci/drivers/vfio-pci/new_id
echo -n $BDF > /sys/bus/pci/drivers/vfio-pci/bind
# Integrated audio
ID="IDID IDID"
BDF="0000:0x:00.1"
[ ! -e /sys/bus/pci/devices/$BDF/driver/unbind ] || \
        echo -n $BDF > /sys/bus/pci/devices/$BDF/driver/unbind
echo -n $ID > /sys/bus/pci/drivers/vfio-pci/new_id
echo -n $BDF > /sys/bus/pci/drivers/vfio-pci/bind

It is not perfect, and, more often than not, some of those commands end up being reentered manually.
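Filling in ID and BDF by hand is error-prone, since both values live on the same “lspci -nn” line. A sketch of scraping them automatically, assuming the usual lspci output format (the helper names are mine, not part of the script above):

```shell
#!/bin/bash
# Given one "lspci -nn" line on stdin, print the vendor/device ID pair
# in the "VVVV DDDD" form that vfio-pci's new_id file expects. The
# greedy .* makes sed match the last [xxxx:xxxx] bracket on the line,
# which is the vendor:device pair (the class code [0300] has no colon).
lspci_id() {
    sed -n 's/.*\[\([0-9a-f]\{4\}\):\([0-9a-f]\{4\}\)\].*/\1 \2/p'
}

# Given the same line, print the full BDF with the 0000 domain prefix.
lspci_bdf() {
    awk '{print "0000:" $1}'
}
```

Usage on a live system would be something like `lspci -nn | grep -i geforce | lspci_id`.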

Second step

The dedicated card should now be available. In order to avoid potential headaches, make sure its IOMMU group doesn't contain things like storage controllers or any other device you might be using. There's a small shell script at the ArchWiki that lists the IOMMU groups and their devices:

#!/bin/bash
shopt -s nullglob
for g in `find /sys/kernel/iommu_groups/* -maxdepth 0 -type d | sort -V`; do
    echo "IOMMU Group ${g##*/}:"
    for d in $g/devices/*; do
        echo -e "\t$(lspci -nns ${d##*/})"
    done;
done;

Example:

[...]
IOMMU Group 1:
	00:01.0 PCI bridge [0604]: Intel Corporation Xeon E3-1200 v2/3rd Gen Core processor PCI Express Root Port [8086:0151] (rev 09)
	01:00.0 VGA compatible controller [0300]: NVIDIA Corporation GK104 [GeForce GTX 760] [10de:1187] (rev a1)
	01:00.1 Audio device [0403]: NVIDIA Corporation GK104 HDMI Audio Controller [10de:0e0a] (rev a1)
[...]

In that example, there's only the root port and the card (plus its HDMI audio device). Passing it through should not break anything.
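Before starting the VM, it's also worth double-checking that both functions really moved over to vfio-pci. `lspci -k -s 01:00.0` prints a “Kernel driver in use” line; the small filter below (my own helper, not from the scripts above) extracts just the driver name from that output:

```shell
#!/bin/bash
# Read "lspci -k"-style output on stdin and print the bound driver name.
driver_in_use() {
    sed -n 's/^[[:space:]]*Kernel driver in use: //p'
}

# On a live system, after running stop-nv:
#   lspci -k -s 01:00.0 | driver_in_use   # expect "vfio-pci"
```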

Third step

At this point, you can just follow the first tutorial cited in this article.

Unbinding and returning to the “normal” life

After making sure your VM is unloaded, you can run the following script:

ID="10de IDID"
BDF="0000:0x:00.0"
# Unbind a PCI function from its driver as necessary
[ ! -e /sys/bus/pci/devices/$BDF/driver/unbind ] || \
        echo -n $BDF > /sys/bus/pci/devices/$BDF/driver/unbind

modprobe nvidia
modprobe nvidia_modeset
modprobe nvidia_drm

/etc/init.d/nvidia-persistenced start
/etc/init.d/bumblebeed start

It is important to note that this is also a crude implementation and, as such, doesn't deal all that much with the integrated audio device (as it's not of much importance to me). Perhaps you should edit your scripts to include it.
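If you do want the audio function back on the host, the return trip is the mirror image of stop-nv: unbind the function from vfio-pci, then ask the kernel to re-probe it, which lets snd_hda_intel reclaim it. A sketch; the sysfs root is parameterized purely so the logic can be dry-run against a fake directory tree, and drivers_probe is the standard sysfs file for re-probing an unbound device:

```shell
#!/bin/bash
# Return a PCI function (e.g. the GPU's HDMI audio at 0000:01:00.1)
# from vfio-pci to whatever host driver claims it.
rebind_to_host() {
    local bdf="$1"
    local sysfs="${2:-/sys/bus/pci}"
    # Release it from vfio-pci, if currently bound there
    if [ -e "$sysfs/drivers/vfio-pci/$bdf" ]; then
        echo -n "$bdf" > "$sysfs/drivers/vfio-pci/unbind"
    fi
    # Ask the kernel to probe the now-driverless function
    echo -n "$bdf" > "$sysfs/drivers_probe"
}

# On a live system:
#   rebind_to_host 0000:01:00.1
```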

A note to myself

FIXME improve that article!

render-offloading-and-passthrough.1629871248.txt.gz · Last modified: 2021/08/25 06:00 by prppedro