+++
author = "FlintyLemming"
title = "Proxmox VE 8.1 vGPU Configuration (A6000)"
slug = "d29bb28b14984443b232263348b946ba"
date = "2023-12-13"
description = "A big new toy has arrived"
categories = ["Consumer", "Linux"]
tags = ["pve", "Nvidia"]
image = "https://img.mitsea.com/blog/posts/2023/12/Proxmox%20VE%208.1%20vGPU%20%E9%85%8D%E7%BD%AE%20%EF%BC%88A6000%EF%BC%89/jigar-panchal-TVyPnkS5k5w-unsplash.jpg?x-oss-process=style/ImageCompress"
+++
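Environment
The Dell R750xa is configured as follows
Device configuration
Make sure virtualization and SR-IOV are enabled
Proxmox VE host environment setup
Configure the package sources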
- Remove the enterprise and Ceph repositories:
    rm /etc/apt/sources.list.d/pve-enterprise.list
    rm /etc/apt/sources.list.d/ceph.list
- Switch the package sources to mirrors in mainland China:
    nano /etc/apt/sources.list
    # Replace the contents with the following
    deb https://mirrors.aliyun.com/debian/ bookworm main contrib non-free
    deb-src https://mirrors.aliyun.com/debian/ bookworm main contrib non-free
    deb https://mirrors.aliyun.com/debian/ bookworm-updates main contrib non-free
    deb-src https://mirrors.aliyun.com/debian/ bookworm-updates main contrib non-free
    deb https://mirrors.aliyun.com/debian/ bookworm-backports main contrib non-free
    deb-src https://mirrors.aliyun.com/debian/ bookworm-backports main contrib non-free
    deb https://mirrors.ustc.edu.cn/debian-security/ bookworm-security main contrib non-free
    deb-src https://mirrors.ustc.edu.cn/debian-security/ bookworm-security main contrib non-free
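With the enterprise repository removed, the host has no Proxmox package source unless another one is configured. The original steps do not cover this, so treat the following as an assumption rather than part of the author's procedure: a minimal sketch that writes the public `pve-no-subscription` entry for bookworm. The helper name `write_pve_repo` and the file name `pve-no-subscription.list` are illustrative choices.

```shell
# Sketch: write a pve-no-subscription entry for Debian bookworm (PVE 8.x).
# write_pve_repo is a hypothetical helper. The directory argument defaults
# to the real APT location but can point elsewhere for a dry run.
write_pve_repo() {
    target_dir="${1:-/etc/apt/sources.list.d}"
    printf '%s\n' \
        'deb http://download.proxmox.com/debian/pve bookworm pve-no-subscription' \
        > "$target_dir/pve-no-subscription.list"
}

# Usage (as root):
#   write_pve_repo && apt update
```

Passing a scratch directory as the first argument lets you inspect the generated file before touching the real APT configuration.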
Other system configuration
- Enable the IOMMU:
    nano /etc/default/grub
    # Find
    GRUB_CMDLINE_LINUX_DEFAULT="quiet"
    # and change it to:
    GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt"
    # Regenerate the GRUB configuration
    update-grub
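- Load the vfio modules:
    echo vfio >> /etc/modules
    echo vfio_iommu_type1 >> /etc/modules
    echo vfio_pci >> /etc/modules
    # On kernel 6.2 and later (which PVE 8 ships) vfio_virqfd is built into vfio,
    # so this last entry is not needed there
    echo vfio_virqfd >> /etc/modules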
- Blacklist the existing open-source driver, then reboot:
    echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf
    echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf
    echo "blacklist nvidiafb" >> /etc/modprobe.d/blacklist.conf
    # Rebuild the initramfs for all installed kernels
    update-initramfs -k all -u
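After the reboot, one way to confirm that the IOMMU settings took effect is to check whether the kernel created any IOMMU groups under `/sys/kernel/iommu_groups`. The sketch below is not from the original post; `iommu_group_count` is a hypothetical helper name.

```shell
# Sketch: count the IOMMU groups the kernel has created.
# iommu_group_count is a hypothetical helper. The optional argument lets
# you point it at a directory other than the default sysfs path.
iommu_group_count() {
    groups_dir="${1:-/sys/kernel/iommu_groups}"
    count=0
    for entry in "$groups_dir"/*; do
        # Count only directories; an unmatched glob stays literal and is skipped.
        if [ -d "$entry" ]; then
            count=$((count + 1))
        fi
    done
    echo "$count"
}

# Usage after the reboot:
#   iommu_group_count
```

A result of 0 suggests the IOMMU is still disabled, either in the firmware or on the kernel command line.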
Change the GPU display mode
- If the GPU has display outputs, its display mode has to be changed. Check with the command below; if a GPU is listed as VGA compatible controller, it needs the change.
    lspci | grep NVIDIA
    # Output
    17:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
    17:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
    65:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
    65:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
    ca:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
    ca:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
    e3:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
    e3:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1)
- Download the NVIDIA Display Mode Selector Utility. You can get it from here, but the link is not guaranteed to stay valid.
- List the installed GPUs to get their indices:
    chmod +x displaymodeselector
    ./displaymodeselector --list
- Change the display mode of each GPU:
    ./displaymodeselector --gpumode physical_display_disabled -i 0
    ./displaymodeselector --gpumode physical_display_disabled -i 1
    ./displaymodeselector --gpumode physical_display_disabled -i 2
    ./displaymodeselector --gpumode physical_display_disabled -i 3
- Reboot the server. Afterwards the GPUs should show up as 3D controller:
    17:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
    65:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
    ca:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
    e3:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1)
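The four per-index commands and the check after the reboot can be wrapped in two small helpers. This is only a sketch: `disable_physical_display`, `count_nvidia_vga` and the `DMS` variable are hypothetical names, and the only flags used are the ones shown in the steps above.

```shell
# Sketch: apply the mode change to several GPU indices, then check lspci output.
# DMS is an assumed variable holding the path to the utility.
DMS="${DMS:-./displaymodeselector}"

# Run the mode change once for every index passed as an argument.
# Stops at the first failure so a bad index does not go unnoticed.
disable_physical_display() {
    for index in "$@"; do
        "$DMS" --gpumode physical_display_disabled -i "$index" || return 1
    done
}

# Read lspci output on stdin and count the NVIDIA functions that still
# present themselves as VGA controllers.
count_nvidia_vga() {
    # grep -c exits non-zero on a count of 0, so mask the status.
    grep NVIDIA | grep -c 'VGA compatible controller' || true
}

# Usage:
#   disable_physical_display 0 1 2 3
#   reboot, then:
#   lspci | count_nvidia_vga
```

A count of 0 after the reboot means every card now reports itself as a 3D controller.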
Install the driver
- Install the dependencies the NVIDIA driver installer needs:
    apt update
    apt install build-essential dkms mdevctl pve-headers-$(uname -r)
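- Install the driver. The downloaded package contains several drivers; install the host driver. The package can be downloaded from here, but the link is not guaranteed to stay valid. Copy the driver to the server, make it executable, and run it.
    chmod +x NVIDIA-Linux-x86_64-535.104.06-vgpu-kvm.run
    ./NVIDIA-Linux-x86_64-535.104.06-vgpu-kvm.run --dkms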
- Run nvidia-smi; if it reports no errors, the driver is working.
Set up the vGPU licensing server
Oscar Krause / FastAPI-DLS · GitLab
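Just follow the repository README. The main catch is that HTTPS is mandatory, so a local deployment needs a self-signed certificate. Outlaws who expose the server on the public internet can skip that part; a properly configured nginx certificate is enough. A few things to watch when it is exposed publicly: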
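- In the docker command, replace the `hostname -i` substitution in DLS_URL with the domain your reverse proxy will serve, for example DLS_URL=xxx.xxx.com
- Leave DLS_PORT=443 alone and change only the published port, for example -p 4433:443. The reverse proxy then forwards to port 4433 on the Docker host.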
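Put together, the two notes above lead to a command along the following lines. This is a sketch under assumptions: the image name `collinwebdesigns/fastapi-dls` and the `/app/cert` and `/app/database` mount points are recalled from the project README and should be checked against it; `dls_docker_cmd` and the certificate directory are hypothetical. The helper only prints the command so it can be reviewed first.

```shell
# Sketch: build the docker run command for a reverse-proxied FastAPI-DLS.
# Assumptions (verify against the project README): the image name and the
# /app/cert and /app/database mount points. dls_docker_cmd is hypothetical.
dls_docker_cmd() {
    domain="$1"      # public name the reverse proxy serves, e.g. xxx.xxx.com
    host_port="$2"   # port published on the Docker host, e.g. 4433
    cert_dir="$3"    # directory holding the certificate and key files
    # DLS_PORT stays at 443 because that is the port clients reach
    # through the reverse proxy; only the published port changes.
    printf 'docker run -e DLS_URL=%s -e DLS_PORT=443 -p %s:443 -v %s:/app/cert -v dls-db:/app/database %s\n' \
        "$domain" "$host_port" "$cert_dir" 'collinwebdesigns/fastapi-dls:latest'
}

# Print the command for review instead of running it directly.
dls_docker_cmd xxx.xxx.com 4433 /opt/docker/fastapi-dls/cert
```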
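Add the device to a virtual machine
The SR-IOV devices have to be enabled after the host boots, and again on every boot. You can wrap the command in a service so it runs once automatically at startup.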
/usr/lib/nvidia/sriov-manage -e ALL
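One possible shape for such a service is a oneshot systemd unit. The sketch below is an assumption, not the author's setup: the unit name `nvidia-sriov.service`, the helper `write_sriov_unit`, and the ordering after the NVIDIA vGPU services and before `pve-guests.service` are illustrative choices.

```shell
# Sketch: write a oneshot systemd unit that runs sriov-manage at boot.
# The unit name and the ordering against nvidia-vgpud, nvidia-vgpu-mgr and
# pve-guests are assumptions; write_sriov_unit is a hypothetical helper.
write_sriov_unit() {
    unit_dir="${1:-/etc/systemd/system}"
    cat > "$unit_dir/nvidia-sriov.service" <<'EOF'
[Unit]
Description=Enable NVIDIA SR-IOV virtual functions
After=nvidia-vgpud.service nvidia-vgpu-mgr.service
Before=pve-guests.service

[Service]
Type=oneshot
ExecStart=/usr/lib/nvidia/sriov-manage -e ALL

[Install]
WantedBy=multi-user.target
EOF
}

# Usage (as root):
#   write_sriov_unit
#   systemctl daemon-reload
#   systemctl enable nvidia-sriov.service
```

Ordering the unit before `pve-guests.service` is meant to have the virtual functions in place before Proxmox starts any guests configured to start at boot.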
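For Raw Device, pick a device that is not the .0 function; MDev Type then lets you choose a vGPU profile. Even if you want to use the whole card, do not pass through the .0 device: reportedly that tends to make PVE blow up and drop off the network. Choose a profile that uses all of the video memory instead.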
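The same assignment can be made from the command line through the `mdev` property of a `hostpci` entry. The sketch below only prints the `qm set` command; `vgpu_attach_cmd` is a hypothetical helper, the VM ID, address and profile name in the usage line are placeholders, and the guard assumes the physical GPUs sit at `xx:00.0` as in the lspci output earlier.

```shell
# Sketch: print the qm command that attaches one virtual function with a
# vGPU profile. vgpu_attach_cmd is a hypothetical helper; the values in
# the usage line below are placeholders, not real identifiers.
vgpu_attach_cmd() {
    vmid="$1"
    vf_addr="$2"
    profile="$3"
    case "$vf_addr" in
        # Assumption: the physical function is the one ending in :00.0.
        *:00.0)
            echo "refusing $vf_addr: this looks like the physical GPU" >&2
            return 1
            ;;
    esac
    printf 'qm set %s -hostpci0 %s,mdev=%s\n' "$vmid" "$vf_addr" "$profile"
}

# Usage (placeholders):
#   vgpu_attach_cmd 100 0000:17:00.4 nvidia-XXX
```

The profile names available on a given host can be listed with `mdevctl types` once the virtual functions are enabled.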
Activate the vGPU license
See the Setup Client section of the licensing server README
Oscar Krause / FastAPI-DLS · GitLab
Windows
- Once inside Windows, first install the Windows guest driver from the same driver package (the guest driver, not the host driver)
- Download the configuration file from https://<your DLS server>/-/client-token and place it in C:\Program Files\NVIDIA Corporation\vGPU Licensing\ClientConfigToken
- Reboot the machine; it should then acquire a license and activate successfully
Linux
Run the following commands
curl --insecure -L -X GET https://<dls-hostname-or-ip>/-/client-token -o /etc/nvidia/ClientConfigToken/client_configuration_token_$(date '+%d-%m-%Y-%H-%M-%S').tok
service nvidia-gridd restart
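Once the service has restarted, the license state can be read from the output of `nvidia-smi -q`. The helper below is a small sketch, not part of the original steps; `license_lines` is a hypothetical name.

```shell
# Sketch: keep only the license-related lines from `nvidia-smi -q` output
# read on stdin, or print a fallback message when there are none.
license_lines() {
    grep -i 'license' || echo 'no license information found'
}

# Usage inside the guest:
#   nvidia-smi -q | license_lines
```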
Photo by Jigar Panchal on Unsplash