From e1673e6454a83fabd627221b966c5ed40f573e06 Mon Sep 17 00:00:00 2001 From: FlintyLemming Date: Mon, 6 Nov 2023 19:05:19 +0800 Subject: [PATCH] update post --- .../index.zh-cn.md | 195 ++++++++++++++++++ 1 file changed, 195 insertions(+) create mode 100644 content/post/d29bb28b14984443b232263348b946ba/index.zh-cn.md diff --git a/content/post/d29bb28b14984443b232263348b946ba/index.zh-cn.md b/content/post/d29bb28b14984443b232263348b946ba/index.zh-cn.md new file mode 100644 index 0000000..d08b80c --- /dev/null +++ b/content/post/d29bb28b14984443b232263348b946ba/index.zh-cn.md @@ -0,0 +1,195 @@ ++++ +author = "FlintyLemming" +title = "Proxmox VE 8.1 vGPU 配置 (A6000)" +slug = "d29bb28b14984443b232263348b946ba" +date = "2023-12-13" +description = "" +categories = ["Consumer", "Linux"] +tags = ["pve", "Nvidia"] +image = "https://img.mitsea.com/blog/posts/2023/12/Proxmox%20VE%208.1%20vGPU%20%E9%85%8D%E7%BD%AE%20%EF%BC%88A6000%EF%BC%89/jigar-panchal-TVyPnkS5k5w-unsplash.jpg?x-oss-process=style/ImageCompress" ++++ + +## 操作环境 + +Dell R750xa 配置如下 + +![](https://img.mitsea.com/blog/posts/2023/12/Proxmox%20VE%208.1%20vGPU%20%E9%85%8D%E7%BD%AE%20%EF%BC%88A6000%EF%BC%89/Untitled.png?x-oss-process=style/ImageCompress) + +## 设备配置 + +确保开启虚拟化和 SR-IOV + +![](https://img.mitsea.com/blog/posts/2023/12/Proxmox%20VE%208.1%20vGPU%20%E9%85%8D%E7%BD%AE%20%EF%BC%88A6000%EF%BC%89/Untitled%201.png?x-oss-process=style/ImageCompress) + +## Proxmox VM host 环境配置 + +### 配置软件源 + +1. 删除企业源和 Ceph 源 + + ```bash + rm /etc/apt/sources.list.d/pve-enterprise.list + rm /etc/apt/sources.list.d/ceph.list + ``` + +2. 修改软件源为国内源 + + ```bash + nano /etc/apt/sources.list + # 内容修改为如下内容 + deb https://mirrors.aliyun.com/debian/ bookworm main contrib non-free + deb-src https://mirrors.aliyun.com/debian/ bookworm main contrib non-free + deb https://mirrors.aliyun.com/debian/ bookworm-updates main contrib non-free + deb-src https://mirrors.aliyun.com/debian/ bookworm-updates main contrib non-free + deb https://mirrors.aliyun.com/debian/ bookworm-backports main contrib non-free + deb-src https://mirrors.aliyun.com/debian/ bookworm-backports main contrib non-free + deb https://mirrors.ustc.edu.cn/debian-security/ stable-security main contrib non-free + deb-src https://mirrors.ustc.edu.cn/debian-security/ stable-security main contrib non-free + ``` + +### 其他系统配置 + +1. 开启 iommu + + ```bash + nano /etc/default/grub + # 找到 + GRUB_CMDLINE_LINUX_DEFAULT="quiet" + # 改为: + GRUB_CMDLINE_LINUX_DEFAULT="quiet intel_iommu=on iommu=pt" + # 更新 grub + update-grub + ``` + +2. 加载 vfio 模块 + + ```bash + echo vfio >> /etc/modules + echo vfio_iommu_type1 >> /etc/modules + echo vfio_pci >> /etc/modules + echo vfio_virqfd >> /etc/modules + ``` + +3. 屏蔽现有开源驱动,然后重启 + + ```bash + echo "blacklist nouveau" >> /etc/modprobe.d/blacklist.conf + echo "blacklist nvidia" >> /etc/modprobe.d/blacklist.conf + echo "blacklist nvidiafb" >> /etc/modprobe.d/blacklist.conf + # 更新内核参数 + update-initramfs -k all -u + ``` + +### 修改显卡模式 + +1. 如果 GPU 带显示接口,需要修改显卡模式。使用下面的命令检查,如果结果中显示为 VGA compatible controller 就需要修改。 + + ```bash + lspci | grep NVIDIA + # 执行结果 + 17:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1) + 17:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1) + 65:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1) + 65:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1) + ca:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1) + ca:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1) + e3:00.0 VGA compatible controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1) + e3:00.1 Audio device: NVIDIA Corporation GA102 High Definition Audio Controller (rev a1) + ``` + +2. 下载 NVIDIA Display Mode Selector Utility,可以从[这里](https://index.mitsea.com/%E8%BD%AF%E4%BB%B6/%E5%BA%94%E7%94%A8%E7%A8%8B%E5%BA%8F/Display_Mode-1.61.0.zip)下但是不保证链接有效性 +3. 检查当前显卡,获得序号 + + ```bash + chmod +x displaymodeselector + ./displaymodeselector --list + ``` + +4. 修改显卡模式 + + ```bash + ./displaymodeselector --gpumode physical_display_disabled -i 0 + ./displaymodeselector --gpumode physical_display_disabled -i 1 + ./displaymodeselector --gpumode physical_display_disabled -i 2 + ./displaymodeselector --gpumode physical_display_disabled -i 3 + ``` + + ![](https://img.mitsea.com/blog/posts/2023/12/Proxmox%20VE%208.1%20vGPU%20%E9%85%8D%E7%BD%AE%20%EF%BC%88A6000%EF%BC%89/Untitled%202.png?x-oss-process=style/ImageCompress) + +5. 重启服务器,重启后应该显示为 3D Controller + + ```bash + 17:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1) + 65:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1) + ca:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1) + e3:00.0 3D controller: NVIDIA Corporation GA102GL [RTX A6000] (rev a1) + ``` + +### 安装驱动 + +1. 安装 NVIDIA Driver 安装时需要的依赖 + + ```bash + apt update + apt install build-essential dkms mdevctl pve-headers-$(uname -r) + ``` + +2. 安装驱动,下载的驱动包有好几个驱动,安装 host 驱动。驱动可以从[这里](https://index.mitsea.com/%E8%BD%AF%E4%BB%B6/%E9%A9%B1%E5%8A%A8%E5%92%8C%E5%85%B6%E4%BB%96%E9%95%9C%E5%83%8F/NVIDIA-GRID-Linux-KVM-535.104.06-535.104.05-537.13.zip)下,但是不保证链接有效性。把驱动传到服务器上后,设置执行权限后运行。 + + ```bash + chmod +x NVIDIA-Linux-x86_64-535.104.06-vgpu-kvm.run + ./NVIDIA-Linux-x86_64-535.104.06-vgpu-kvm.run --dkms + ``` + +3. 执行 `nvidia-smi` 后无误即可 + + ![](https://img.mitsea.com/blog/posts/2023/12/Proxmox%20VE%208.1%20vGPU%20%E9%85%8D%E7%BD%AE%20%EF%BC%88A6000%EF%BC%89/Untitled%203.png?x-oss-process=style/ImageCompress) + +## 搭建 vGPU 授权服务器 + +[Oscar Krause / FastAPI-DLS · GitLab](https://git.collinwebdesigns.de/oscar.krause/fastapi-dls) + +按照仓库 Readme 搭建就行了,主要就是强制 https,本地的话需要生成一个自签名证书。法外狂徒挂公网可以无视,nginx 证书配好就行。对于挂在公网上有几个注意点: + +1. docker 命令中的 `DLS_URL=`hostname -i`` 填你反代时要使用的域名例如`DLS_URL=`xxx.xxx.com`` +2. `DLS_PORT=443` 不要动,只改 port 映射出去的端口,比如改成 `-p 4433:443` 这样反代那边就反代容器 IP:4433 + +## 虚拟机添加设备 + +开机后需要启用 SR-IOV 设备,每次开机都要执行,可以写成一个服务开机自动执行一次 + +```jsx +/usr/lib/nvidia/sriov-manage -e ALL +``` + +Raw Device 选择一个不是 .0 的设备后,MDev Type 就可以选 vGPU Profile 了。如果想要用整张显卡,也不要通 .0 的设备,据说会容易导致 pve 爆炸失联,建议还是选择一个用完所有显存的 Profile。 + +![](https://img.mitsea.com/blog/posts/2023/12/Proxmox%20VE%208.1%20vGPU%20%E9%85%8D%E7%BD%AE%20%EF%BC%88A6000%EF%BC%89/Untitled%204.png?x-oss-process=style/ImageCompress) + +![](https://img.mitsea.com/blog/posts/2023/12/Proxmox%20VE%208.1%20vGPU%20%E9%85%8D%E7%BD%AE%20%EF%BC%88A6000%EF%BC%89/Untitled%205.png?x-oss-process=style/ImageCompress) + +## 激活 vGPU 授权 + +参考激活服务器 Readme 中 Setup Client 一节 + +[Oscar Krause / FastAPI-DLS · GitLab](https://git.collinwebdesigns.de/oscar.krause/fastapi-dls#setup-client) + +### Windows + +1. 进入 Windows 后先安装之前那个驱动包里的 host 驱动 +2. 从 https://<你的dls服务器>/-/client-token 上下载配置文件,然后放到 C:\Program Files\NVIDIA Corporation\vGPU Licensing\ClientConfigToken 下 +3. 重启电脑,然后就能看到正在获取许可证并激活成功 + + ![CleanShot 2023-12-13 at 22.17.13@2x.png](Proxmox%20VE%208%201%20vGPU%20%E9%85%8D%E7%BD%AE%20%EF%BC%88A6000%EF%BC%89%20d29bb28b14984443b232263348b946ba/CleanShot_2023-12-13_at_22.17.132x.png?x-oss-process=style/ImageCompress) + +### Linux + +执行下面的命令 + +```bash +curl --insecure -L -X GET https:///-/client-token -o /etc/nvidia/ClientConfigToken/client_configuration_token_$(date '+%d-%m-%Y-%H-%M-%S').tok +service nvidia-gridd restart +``` + +> Photo by [Jigar Panchal](https://unsplash.com/@brave4_heart?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash) on [Unsplash](https://unsplash.com/photos/a-very-colorful-abstract-background-with-a-lot-of-blocks-TVyPnkS5k5w?utm_content=creditCopyText&utm_medium=referral&utm_source=unsplash) + \ No newline at end of file