I bought an NVIDIA Tesla P4. It was unused and came with a 3D-printed cooler and fan. I tested it in my AI/ML server, where it worked fine, and then decided to move it to my other server, which runs 24/7.
The reason is simple: I run jump hosts and VMs on that server, and I use them to record my home lab activities with OBS. I wanted OBS to use a hardware video encoder. CPU-based software encoders work, but during long recordings the VM sometimes slows down, so offloading encoding to the GPU is the better option. Having the GPU attached to the VM also makes the VM itself run much faster.
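My motherboard has the following PCIe 3.0 slots:
Slot 6: PCIe 3.0 x16
Slot 4: PCIe 3.0 x16 (x16 || x8)
Slot 3: PCIe 3.0 x8 (x0 || x8)
Slot 2: PCIe 3.0 x8
Slot 1: PCIe 3.0 x4 (in an x8 slot)
Slots 4 and 3 share lanes: Slot 4 runs at x16 when Slot 3 is empty and drops to x8 when Slot 3 is in use.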
Slot 6 holds a HYPER M.2 X16 CARD V2 (4x M.2 NVMe), Slot 4 a 100 GbE NIC, Slot 2 a KIOXIA CD6 U.3 NVMe SSD, and Slot 1 an E1.S D5-P4326 EDSFF PCIe ruler SSD. Slot 3 was free, so I planned to use it for the GPU and ordered a PCIe x8 to x16 adapter.
I installed the GPU in Slot 3, but it wasn’t detected. I tried different BIOS settings, but nothing worked. I also tried another device in that slot and even removed the NIC from Slot 4, still with no result. It turned out that Slot 3 couldn’t be used for anything. I later checked my other two servers, which have similar but slightly newer motherboards, and found the same issue: Slot 3 was unusable despite having PCIe lanes available.
Since Slot 3 wasn’t an option, I moved to Slot 2. This meant removing the 7.68 TB KIOXIA CD6 U.3 2.5″ NVMe SSD (KCD61LUL7T68). At first everything seemed fine, and the server recognized the GPU. But when I started installing the GPU drivers on ESXi, I kept getting errors. I suspected the new driver version, but an older version failed the same way.
Finally, I swapped things around. I put the GPU in Slot 4, where the NIC had been, and moved the NIC to Slot 2 using an adapter. The NIC worked there without any issues. I then installed the GPU drivers on ESXi, and the installation succeeded on the first try. After a restart, everything worked correctly.
What seemed like a simple, quick task turned into hours of troubleshooting. With this combination of motherboard, adapter, and GPU, I couldn’t use all of my devices at the same time, and the KIOXIA SSD had to come out. By contrast, the GPU worked without any issues in my AI/ML server, which has a newer motherboard with PCIe 5.0.