I have been tuning my Home Lab network during my vacation. Private AI/ML, vGPU, and VDI require a fast Home Lab setup. I tweaked my switch (Dell EMC S4112T-ON), UDM Pro, and ESXi hosts, and also revisited the subnetting logic. I went over my servers' BIOS CPU and RAM settings and found that the defaults left many newer-generation features disabled. Enabling them gave a significant performance boost and also reduced the servers' electricity consumption.
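As a side note, here is a generic sketch (not my exact BIOS checklist) of how you can check from a Linux host whether power-saving defaults are holding the CPU back before you even start looking at the network:

# Show the CPU model, clock speed and feature flags
lscpu | grep -E 'Model name|MHz|Flags'

# Scaling governor per core; "powersave" usually means reduced clocks
cat /sys/devices/system/cpu/cpu*/cpufreq/scaling_governor | sort | uniq -c

# On Intel P-state systems, 1 means Turbo Boost is disabled
cat /sys/devices/system/cpu/intel_pstate/no_turbo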
You might wonder how this relates to network speed, but everything is actually interconnected: the motherboard and CPU architecture, the number of CPU cores, CPU frequency, RAM size and speed, the motherboard's PCIe speed, and of course the NIC's features and offloading capabilities, as well as the protocols and applications in use. This is also why DPUs such as NVIDIA BlueField and AMD Pensando are growing in popularity: they shorten that chain and take load off the CPU. Google, for instance, used DPUs to train its AI language model, and AWS and Azure have made DPUs a core part of their data centers. But I will talk about my DPU experimentation later.
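If you want to see which of those offloads your NIC is actually using, a quick check on a Linux box looks like this (eth0 is just a placeholder for your interface name):

# Confirm the negotiated link speed
ethtool eth0 | grep Speed

# List the main offload features and whether they are enabled
ethtool -k eth0 | grep -E 'tcp-segmentation-offload|generic-receive-offload|rx-checksumming|tx-checksumming'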
You might think that simply buying a 100 GbE switch and a 100 GbE NIC will immediately give you that speed. However, without analyzing and tuning every component individually, you will not get there. Details matter.
Many thanks to Daniel Krieger and Nicholas Schmidt, whose advice and deep knowledge have been crucial in tuning my home lab network. Without it, I have no idea how long it would have taken to achieve the same results.
iperf3 Parameters and Their Meanings (a full server/client example follows this list):
-c 10.1.0.21: Specifies client mode and sets the server address to 10.1.0.21. The client connects to this IP address to perform the test.
-P 9: Sets the number of parallel streams to 9. This configuration allows the test to run with nine simultaneous TCP streams, which is useful for testing aggregate bandwidth or handling multiple connections concurrently.
-w 12M: Configures the TCP window size to 12 megabytes. This setting adjusts the size of the buffer used for TCP communication, which can potentially enhance throughput in networks with high bandwidth and high latency.
-l 1M: Sets the buffer length to 1 megabyte. This parameter controls the size of data chunks sent in each transmission, where larger buffer sizes can improve efficiency in high-bandwidth scenarios.
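Putting it all together, a typical run looks like this. As a rough rule of thumb, the window should cover the bandwidth-delay product: at 100 Gbit/s with 1 ms of round-trip time that is 100 Gbit/s × 0.001 s ≈ 12.5 MB, so a window in the 10–12 MB range is a sensible starting point (on a low-latency LAN the strict minimum is smaller). The address below is from my lab:

# On the server side (10.1.0.21 here), start iperf3 in listen mode
iperf3 -s

# On the client side, run 9 parallel streams with a 12 MB window and 1 MB buffers
iperf3 -c 10.1.0.21 -P 9 -w 12M -l 1M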
Here are my results:
Between ESXi hosts: 110 GBytes transferred, 94 Gbits/sec (a PCIe 3.0 x16 slot tops out at roughly 126 Gbit/s: 8 GT/s per lane with 128b/130b encoding is about 7.9 Gbit/s, times 16 lanes)
./iperf3 -c 10.1.0.21 -P 9 -w 12M -l 1M
Between VMs on separate ESXi hosts: 44.3 GBytes transferred, 38 Gbits/sec (vmxnet3 Ethernet Adapter)
iperf3 -c 10.6.0.5 -P 1 -w 10M -l 1M
Between VMs on the same ESXi host: 45.8 GBytes transferred, 39.3 Gbits/sec (vmxnet3 Ethernet Adapter)
iperf3 -c 10.2.1.4 -P 3 -w 10M -l 1M
Bare-metal PC to ESXi host: 115 GBytes transferred, 98.9 Gbits/sec
iperf3 -c 10.1.0.21 -P 4 -w 4M -l 1M
Bare-metal PC to a VM on an ESXi host: 43.6 GBytes transferred, 37.4 Gbits/sec (vmxnet3 Ethernet Adapter)
iperf3 -c 10.1.0.21 -P 4 -w 4M -l 1M