Setup for VM
If your node will serve virtual machines, you need to install and configure Libvirt and QEMU.
To allow Libvirt to use a GPU in pass-through mode, you have to make sure the NVIDIA driver is not installed or not in use by the system.
0. Check NVIDIA Driver Installation
You can check if the NVIDIA driver is loaded and used by the system using the following command:
lsmod | grep nvidia
# or check if driver is used by your GPU:
lspci -k | grep -EA3 'VGA|3D|Display'
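For convenience, here is a minimal sketch that combines both checks; it assumes an NVIDIA GPU and only reports whether the driver is loaded:
# report whether any nvidia* kernel module is currently loaded
if lsmod | grep -q '^nvidia'; then
    echo "NVIDIA kernel driver is loaded; remove it before enabling passthrough."
else
    echo "No NVIDIA kernel driver loaded."
fi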
1. Remove NVIDIA Driver (if installed)
If the NVIDIA driver is installed, you can remove it using the following commands:
sudo apt-get remove --purge '^nvidia-.*'
sudo apt autoremove
echo 'nouveau' | sudo tee -a /etc/modules
sudo rm /etc/X11/xorg.conf
If you've installed the driver using the RUN file, remove it using:
sudo /usr/bin/nvidia-uninstall
Remove leftover configuration files, if any:
sudo rm -rf /etc/X11/xorg.conf
sudo rm -rf /etc/modprobe.d/nvidia*.conf
sudo rm -rf /lib/modprobe.d/nvidia*.conf
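To confirm the packages are gone, the following check (assuming a Debian/Ubuntu system with dpkg) should print nothing:
dpkg -l | grep -i '^ii.*nvidia'   # lists any NVIDIA packages still installed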
Restart your machine to apply the changes.
After the reboot, verify that the driver is uninstalled. The following command should return nothing:
lsmod | grep nvidia
2. Install System Dependencies
Install additional packages required by the service.
sudo apt update
sudo apt install qemu-kvm libvirt-daemon-system genisoimage whois
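After installation, you can optionally verify that the libvirt daemon is running (these commands assume a systemd-based distribution):
sudo systemctl is-active libvirtd   # should print "active"
sudo virsh version                  # prints libvirt and hypervisor versions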
3. Configure Libvirt
The Rift system service runs as the root user, so you need to configure libvirt to run under root as well.
Edit /etc/libvirt/qemu.conf and modify the user and group settings:
user = "root"
group = "root"
Then restart the libvirtd service:
sudo systemctl restart libvirtd
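As a quick sanity check, you can confirm the new settings are in place and the daemon restarted cleanly:
grep -E '^(user|group)' /etc/libvirt/qemu.conf   # both lines should be uncommented and set to "root"
sudo systemctl status libvirtd --no-pager        # should report active (running)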
You can use client_setup.sh to install libvirt and rift:
sudo client_setup.sh --only=vm
sudo client_setup.sh --only=rift
4. Check IOMMU Groups
Run the following script on the host system.
#!/bin/bash
for g in /sys/kernel/iommu_groups/*; do
    group_num=$(basename "$g")
    gpu_found=false
    device_lines=""
    for d in "$g"/devices/*; do
        pci_addr=$(basename "$d")
        line=$(lspci -nn -s "$pci_addr")
        device_lines+="$line"$'\n'
        if echo "$line" | grep -qE 'VGA compatible controller|3D controller'; then
            gpu_found=true
        fi
    done
    if $gpu_found; then
        echo "IOMMU Group $group_num:"
        echo "$device_lines"
    fi
done
Typically, you'll see just the GPU, a GPU + audio device, or in some cases GPU + audio + PCI bridges. All of these are normal.
IOMMU Group 112:
a0:03.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
a0:03.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
a1:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2b85] (rev a1)
a1:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22e8] (rev a1)
IOMMU Group 13:
40:01.0 Host bridge [0600]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse PCIe Dummy Host Bridge [1022:1482]
40:01.1 PCI bridge [0604]: Advanced Micro Devices, Inc. [AMD] Starship/Matisse GPP Bridge [1022:1483]
41:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2b85] (rev a1)
41:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22e8] (rev a1)
If you see multiple GPUs in the same group or too many devices, your motherboard might not support all the features necessary for virtualization, and it may be tricky to work around this.
5. (Optional) Enable Other Virtualization Options
The iommu=pt and pci=realloc kernel parameters in GRUB are used to optimize I/O operations and improve performance, particularly in virtualized environments or when dealing with GPUs with large amounts of memory. The iommu=pt option enables IOMMU pass-through mode, bypassing DMA translation for better performance, while pci=realloc allows the kernel to reallocate PCI resources if the BIOS's initial allocation is insufficient.
Add the iommu=pt pci=realloc pcie_aspm=off options, together with intel_iommu=on or amd_iommu=on depending on your system, to the following line in /etc/default/grub:
GRUB_CMDLINE_LINUX_DEFAULT="... iommu=pt pci=realloc pcie_aspm=off amd_iommu=on (or intel_iommu=on)"
Here is a short summary of these options:
iommu=pt: IOMMU pass-through mode tells the kernel to enable the IOMMU but use pass-through mode for DMA mappings by default. For VFIO GPU passthrough, this allows the device to access physical memory directly with minimal performance penalty.
pci=realloc: Forces the kernel to reassign PCI bus resources (MMIO/IO BARs) from scratch, ignoring what the firmware/BIOS assigned. It helps avoid issues when the BIOS didn't allocate enough space for devices (common with large GPUs or multiple devices). Fixes "BAR can't be assigned" or "resource busy" errors.
pcie_aspm=off: Disables PCIe Active State Power Management, a power-saving feature that reduces PCIe link power in idle states. Some PCIe devices (especially GPUs) have trouble retraining links or waking from ASPM low-power states, leading to hangs or "device inaccessible" errors. Disabling ASPM can improve stability.
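Before regenerating GRUB, you can double-check the edited line:
grep GRUB_CMDLINE_LINUX_DEFAULT /etc/default/grub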
Then run the following commands (or defer this until after the next section):
sudo update-initramfs -u -k all
sudo update-grub
sudo reboot
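After the reboot, you can verify that the kernel picked up the new parameters and that the IOMMU is enabled; the exact dmesg wording varies by platform:
cat /proc/cmdline                      # should contain the options added above
sudo dmesg | grep -iE 'iommu|dmar'     # look for lines indicating the IOMMU is enabled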
6. Bind to VFIO Early
For greater stability, it is best to ensure that the VFIO driver is always bound and that no driver changes happen at runtime. To do so, blacklist all other drivers early and disable video mode setting so that the GPU is never claimed by the host system.
First, figure out the PCI vendor and device IDs of your GPUs. To do so, run lspci -nnk | grep NVIDIA. You'll see output like this (RTX 5090):
$ lspci -nnk | grep NVIDIA
01:00.0 VGA compatible controller [0300]: NVIDIA Corporation Device [10de:2b85] (rev a1)
01:00.1 Audio device [0403]: NVIDIA Corporation Device [10de:22e8] (rev a1)
Subsystem: NVIDIA Corporation Device [10de:0000]
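If you prefer to extract the vendor:device pairs programmatically, a small sketch like this (assuming GNU grep and NVIDIA's vendor ID 10de) produces a comma-separated list suitable for vfio-pci.ids:
lspci -nn -d 10de: | grep -oP '(?<=\[)10de:[0-9a-f]{4}(?=\])' | sort -u | paste -sd,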
Then add the following options to GRUB_CMDLINE_LINUX_DEFAULT in /etc/default/grub, changing the PCI vendor and device IDs appropriately. Keep the other options if needed.
GRUB_CMDLINE_LINUX_DEFAULT="nomodeset video=efifb:off modprobe.blacklist=nouveau,nvidia,nvidiafb,snd_hda_intel vfio-pci.ids=10de:2b85,10de:22e8 ..."
We add the driver binding options to GRUB because vfio is a built-in kernel module in many modern distributions. Many online manuals suggest configuring the vfio modules via /etc/modprobe.d/vfio.conf, but options placed under /etc/modprobe.d/ are ignored for built-in modules, because modprobe does not control them.
We also need to ensure that /etc/initramfs-tools/modules contains these lines:
vfio
vfio_iommu_type1
vfio_pci
vfio_virqfd
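If any of these lines is missing, a small idempotent sketch like the following appends it (assuming the file exists at the path above):
for m in vfio vfio_iommu_type1 vfio_pci vfio_virqfd; do
    grep -qxF "$m" /etc/initramfs-tools/modules || echo "$m" | sudo tee -a /etc/initramfs-tools/modules
done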
Also, it is a good idea to disable power management for VFIO devices to improve stability.
echo "options vfio-pci disable_idle_d3=1" | sudo tee /etc/modprobe.d/vfio.conf
Finally, invoke:
sudo update-initramfs -u -k all
sudo update-grub
sudo reboot
After the reboot, check that the VFIO driver is in use. Run lspci -k | grep -A 2 -i nvidia; you should see vfio-pci listed as the kernel driver in use:
81:00.0 VGA compatible controller: NVIDIA Corporation Device 2b85 (rev a1)
Subsystem: Gigabyte Technology Co., Ltd Device 416f
Kernel driver in use: vfio-pci
Kernel modules: nvidiafb, nouveau
81:00.1 Audio device: NVIDIA Corporation Device 22e8 (rev a1)
Subsystem: NVIDIA Corporation Device 0000
Kernel driver in use: vfio-pci
Kernel modules: snd_hda_intel
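If you have several GPUs, a short sketch like this lists the driver bound to every NVIDIA PCI function (assuming vendor ID 10de); each should report vfio-pci:
for d in $(lspci -Dn -d 10de: | awk '{print $1}'); do
    if [ -e "/sys/bus/pci/devices/$d/driver" ]; then
        echo "$d -> $(basename "$(readlink "/sys/bus/pci/devices/$d/driver")")"
    else
        echo "$d -> no driver bound"
    fi
done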
Next Steps
Proceed to the Disk Setup guide.