Datacenter & Infrastructure Issues
GPU & Driver Issues
nvidia-smi returns "No devices were found"
After deploying a VM, the GPU hardware is not detected.
Solutions:
- Reboot the instance — This resolves most transient GPU detection issues.
- If the issue persists after reboot, the GPU may have a hardware problem on the host. Try renting from a different provider or datacenter.
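Before rebooting or switching providers, it can help to tell a GPU that is not attached to the VM apart from a driver that failed to load. A minimal sketch, assuming a Linux VM with lspci (pciutils) installed; the function names are hypothetical helpers, not CloudRift tooling.

```shell
# Sketch: narrow down why nvidia-smi reports "No devices were found".
# Assumes lspci (pciutils) is installed; function names are illustrative.

# Succeeds if an NVIDIA device is present on the PCI bus.
has_nvidia_pci() {
  lspci 2>/dev/null | grep -qi nvidia
}

# Succeeds if the nvidia kernel module is currently loaded.
nvidia_module_loaded() {
  grep -q '^nvidia ' /proc/modules 2>/dev/null
}

# Prints a suggested next step based on the two checks above.
diagnose_no_devices() {
  if ! has_nvidia_pci; then
    echo "No NVIDIA device on the PCI bus: the GPU is not attached to this VM; if a reboot does not help, try another provider or datacenter."
  elif ! nvidia_module_loaded; then
    echo "GPU is attached but the nvidia module is not loaded: reboot the instance."
  else
    echo "GPU is attached and the driver is loaded: check 'sudo dmesg | grep -i nvrm' for driver errors."
  fi
}
```

Source the file (for example . ./gpu-check.sh) and run diagnose_no_devices.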
"Failed to initialize NVML: Driver/library version mismatch"
Ubuntu may auto-update GPU drivers, causing a mismatch between the loaded kernel module and the userspace library.
Solution: Reboot the instance to load the matching driver version:
sudo reboot
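To confirm that a pending reboot is the cause, compare the driver version currently loaded in the kernel with the one installed on disk; an unattended upgrade typically leaves these different. This sketch assumes the driver is a kernel module named nvidia, which exposes its loaded version in /sys/module/nvidia/version, while modinfo reports the installed module for the running kernel. The function names are illustrative.

```shell
# Sketch: detect a loaded vs. installed NVIDIA driver mismatch.
# Assumes the kernel module is named "nvidia"; function names are illustrative.

# Version of the module currently loaded in the kernel (empty if not loaded).
loaded_driver_version() {
  cat /sys/module/nvidia/version 2>/dev/null
}

# Version of the module installed on disk for the running kernel.
installed_driver_version() {
  modinfo -F version nvidia 2>/dev/null
}

check_driver_mismatch() {
  loaded=$(loaded_driver_version)
  installed=$(installed_driver_version)
  if [ -z "$loaded" ]; then
    echo "The nvidia kernel module is not loaded."
    return 1
  elif [ "$loaded" != "$installed" ]; then
    echo "Loaded driver $loaded differs from installed $installed: reboot to load $installed."
    return 1
  fi
  echo "Loaded driver $loaded matches the installed module."
}
```

If check_driver_mismatch reports a difference, the reboot above should resolve the NVML error.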
CUDA initialization errors in PyTorch
If PyTorch shows "CUDA initialization: CUDA unknown error" or cannot detect the GPU:
- Verify the GPU is visible:
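nvidia-smi
- If nvidia-smi works but PyTorch doesn't detect the GPU, check that the CUDA version your PyTorch build targets is supported by the driver. The "CUDA Version" in the nvidia-smi header is the newest CUDA version the driver supports.
- Reboot the instance and try again.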
- If the issue persists, try a different instance — the GPU on the current host may have a hardware issue.
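The CUDA version check can be scripted. This sketch reads the newest CUDA version the driver supports from the nvidia-smi header and the CUDA version PyTorch was built for from torch.version.cuda (None for CPU-only builds). It assumes python3 is the interpreter with PyTorch installed and compares only major.minor; because of CUDA minor-version compatibility, a build that fails this conservative check may still run. The function names are illustrative.

```shell
# Sketch: check that the driver supports the CUDA version PyTorch was built for.
# Assumes python3 has PyTorch installed; function names are illustrative.

# Succeeds if version $1 <= version $2, comparing major.minor numerically.
version_le() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    split(a, x, "."); split(b, y, ".")
    if (x[1] + 0 != y[1] + 0) exit ((x[1] + 0 < y[1] + 0) ? 0 : 1)
    exit ((x[2] + 0 <= y[2] + 0) ? 0 : 1)
  }'
}

# Newest CUDA version the driver supports, from the nvidia-smi header.
driver_cuda_version() {
  nvidia-smi 2>/dev/null | sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p'
}

# CUDA version PyTorch was built with ("None" for CPU-only builds).
torch_cuda_version() {
  python3 -c 'import torch; print(torch.version.cuda)' 2>/dev/null
}

check_torch_cuda() {
  drv=$(driver_cuda_version)
  tch=$(torch_cuda_version)
  if [ -z "$drv" ]; then
    echo "nvidia-smi did not report a CUDA version: fix GPU visibility first."
    return 1
  elif [ -z "$tch" ]; then
    echo "python3 could not import torch."
    return 1
  elif [ "$tch" = "None" ]; then
    echo "This PyTorch build is CPU-only: install a CUDA-enabled build."
    return 1
  elif ! version_le "$tch" "$drv"; then
    echo "PyTorch targets CUDA $tch but the driver supports up to $drv."
    return 1
  fi
  echo "OK: PyTorch targets CUDA $tch; the driver supports up to $drv."
}
```

A CPU-only PyTorch wheel is a common reason for torch.cuda.is_available() returning False even when nvidia-smi works, so that case is reported separately.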
High VRAM usage on a fresh container with no processes
In container mode, the GPU is shared between users on the same node. Other users' processes may be consuming VRAM.
Solution: Use VM mode instead of container mode for dedicated GPU access. In VM mode, the GPU is passed through exclusively to your instance.
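To see how much VRAM is in use and whether any of it belongs to processes you can see, nvidia-smi's query mode gives machine-readable output. A sketch, assuming nvidia-smi is on the PATH; a small amount of usage on an idle GPU can be normal driver overhead, and inside a container nvidia-smi generally cannot list other containers' processes. The function names are illustrative.

```shell
# Sketch: report VRAM usage per GPU and list the GPU processes visible to you.
# Assumes nvidia-smi is on the PATH; function names are illustrative.

# Whole-number percentage of VRAM in use, given used and total MiB.
vram_percent() {
  echo $(( $1 * 100 / $2 ))
}

check_vram() {
  # One line per GPU, e.g. "0, 1234, 81920" (index, used MiB, total MiB).
  nvidia-smi --query-gpu=index,memory.used,memory.total --format=csv,noheader,nounits |
    while IFS=', ' read -r idx used total; do
      echo "GPU $idx: $used / $total MiB ($(vram_percent "$used" "$total")%)"
    done
  # Processes this container can see; other users' processes are usually hidden.
  apps=$(nvidia-smi --query-compute-apps=pid,used_memory --format=csv,noheader)
  if [ -z "$apps" ]; then
    echo "No GPU processes are visible here: VRAM in use likely belongs to other users on the node."
  else
    echo "Visible GPU processes (pid, used_memory):"
    echo "$apps"
  fi
}
```

Substantial VRAM usage with no visible processes is the container-mode symptom described above; VM mode avoids it.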
Networking Issues
Cannot access services between VMs
If a service running on one VM (e.g., port 8080) is not reachable from another VM:
- Check the binding address: make sure the service binds to 0.0.0.0, not 127.0.0.1 or localhost.
- Same network: VMs on different subnets may not be able to communicate directly. Contact support to ensure your VMs are deployed on the same network.
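On the VM running the service, ss shows which address a port is actually bound to. A minimal sketch, assuming iproute2's ss is installed (it is on Ubuntu); the function names are hypothetical helpers.

```shell
# Sketch: show which addresses a TCP port is bound to on this VM.
# Assumes iproute2's ss is installed; function names are illustrative.

# Prints the local address:port of each listening socket on TCP port $1.
listen_addresses() {
  ss -tln | awk -v port="$1" 'NR > 1 { n = split($4, a, ":"); if (a[n] == port) print $4 }'
}

# Succeeds if an address:port string is bound to loopback only.
is_loopback() {
  case "$1" in
    127.*|"[::1]":*|localhost:*) return 0 ;;
    *) return 1 ;;
  esac
}

check_bind() {
  if [ -z "$(listen_addresses "$1")" ]; then
    echo "Nothing is listening on TCP port $1."
    return 1
  fi
  listen_addresses "$1" | while IFS= read -r addr; do
    if is_loopback "$addr"; then
      echo "$addr: loopback only, not reachable from other VMs"
    else
      echo "$addr: reachable from other hosts if no firewall blocks it"
    fi
  done
}
```

For example, check_bind 8080 printing "127.0.0.1:8080: loopback only" means the service must be restarted with a 0.0.0.0 bind address.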
Cannot access a web service from your browser
If you start a web service (e.g., ComfyUI, Jupyter) but can't reach it from your local browser:
- Bind to all interfaces: many services listen only on localhost by default. Launch with the service's --listen or --host flag (Jupyter uses --ip instead):
# Example for ComfyUI
python main.py --listen 0.0.0.0 --port 8188
- Access the service at http://<vm_public_ip>:<port>.
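From your local machine, curl can confirm whether the port answers at all, which separates network problems from browser issues. A sketch, assuming curl is installed locally (it ships with macOS and most Linux distributions); the host and port are placeholders and the function name is illustrative.

```shell
# Sketch: check from your local machine that a VM port answers HTTP.
# Assumes curl is installed; host and port are placeholders you supply.
check_web() {
  # curl exits 0 for any HTTP response (even 404), non-zero if it cannot connect.
  if code=$(curl -sS -o /dev/null -m 5 -w '%{http_code}' "http://$1:$2/"); then
    echo "HTTP $code from http://$1:$2/"
  else
    echo "No response from $1:$2: check the service's bind address and any firewall rules on the VM."
    return 1
  fi
}
```

Usage: check_web <vm_public_ip> 8188. Any HTTP status code means the port is reachable and the remaining problem is in the service itself.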
Port availability
All ports are open by default on VMs — there are no firewall restrictions from CloudRift's side. You can configure your own firewall rules using ufw or iptables on the VM.
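If you do add a firewall with ufw, allow SSH before enabling it so you do not lock yourself out of the VM. The sketch below only prints ufw commands for review (ufw ships with Ubuntu); the function name and the example ports are illustrative.

```shell
# Sketch: print ufw commands that keep SSH open and allow the given TCP ports.
# Review the output, then apply it as root. The function name is illustrative.
ufw_commands() {
  echo "ufw allow 22/tcp"        # keep SSH reachable before enabling the firewall
  for port in "$@"; do
    echo "ufw allow ${port}/tcp"
  done
  echo "ufw --force enable"      # --force skips the interactive confirmation
}
```

For example, ufw_commands 8188 8888 | sudo sh applies the rules, and sudo ufw status verbose shows the result.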
Docker Container Mode Issues
Wrong container spins up
If selecting "No Container" still launches a pre-configured container:
- Log out and log back into the CloudRift Console.
- Retry the deployment.
Cannot connect to container IP
Container mode is less mature than VM mode. If you cannot connect to a container instance:
- Verify the instance status is "Ready" in the console.
- Try VM mode instead, which provides a full Linux environment with SSH access.
System Service Issues
Service not starting
Check the service logs:
sudo journalctl -u rift
sudo systemctl status rift
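When the service keeps failing, it helps to capture its status and recent logs in one place, for example to send to support. A sketch using only standard systemd tools; the function name and default line count are arbitrary choices, and it should run as root so journalctl can read all entries.

```shell
# Sketch: collect rift service status and recent logs for troubleshooting.
# Run as root so journalctl can read all entries. The function name is illustrative.
rift_diagnostics() {
  lines=${1:-200}                      # how many journal lines to include
  echo "== systemctl status rift =="
  systemctl status rift --no-pager
  echo "== journalctl -u rift (current boot, last $lines lines) =="
  journalctl -u rift -b --no-pager -n "$lines"
}
```

Saved as rift-diag.sh, it can be run with sudo sh -c '. ./rift-diag.sh && rift_diagnostics 500' > rift-diag.txt.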
Docker permission errors
Ensure the rift service is running as root and Docker is accessible:
sudo systemctl status docker
docker info
If Docker is not running:
sudo systemctl start docker
sudo systemctl enable docker
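Both conditions can be checked without root. A sketch, assuming systemd manages units named rift and docker as in the commands above; an empty User= property means the unit runs as root. The function names are illustrative.

```shell
# Sketch: check that rift runs as root and that Docker is active.
# Assumes systemd units named "rift" and "docker"; function names are illustrative.

# Prints the User= setting of the rift unit (empty means root).
rift_user() {
  systemctl show -p User --value rift
}

check_docker_access() {
  user=$(rift_user)
  if [ -z "$user" ] || [ "$user" = "root" ]; then
    echo "rift runs as root."
  else
    echo "rift runs as '$user', not root: Docker access may be denied."
  fi
  if systemctl is-active --quiet docker; then
    echo "docker is active."
  else
    echo "docker is not active: start it with 'sudo systemctl enable --now docker'."
  fi
}
```

systemctl enable --now combines the start and enable steps above into one command.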
GPU not detected by the service
- Verify the GPU is visible to the host:
nvidia-smi
- Ensure you rebooted after installing or removing drivers.
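- If running in VM mode, check that the GPU is bound to VFIO; the output should include "Kernel driver in use: vfio-pci" (a GPU bound to vfio-pci is not listed by nvidia-smi on the host):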
lspci -k | grep -A 2 -i nvidia
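With several GPUs, the lspci -k output can be condensed into one line per NVIDIA PCI function. A sketch; on a host passing GPUs through to VMs, each GPU (and usually its audio function) would be expected to show vfio-pci, while nvidia means the host driver still owns it. The function name is illustrative.

```shell
# Sketch: print "<pci slot> <driver in use>" for each NVIDIA PCI function.
# Reads lspci -k output on stdin; prints "none" when no driver is bound.
gpu_drivers() {
  awk '
    /^[0-9a-fA-F]/ {                       # a new device line starts here
      if (slot != "") print slot, "none"   # previous NVIDIA device had no driver line
      slot = (toupper($0) ~ /NVIDIA/) ? $1 : ""
      next
    }
    slot != "" && /Kernel driver in use:/ { print slot, $NF; slot = "" }
    END { if (slot != "") print slot, "none" }
  '
}
```

Usage: lspci -k | gpu_drivers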