Bad Performance for VM on AMD Gen2 EPYC
We recently deployed some new HPE ProLiant DL325 Gen10 Plus servers with second-generation AMD EPYC 7542 CPUs and saw some very strange CPU performance problems: even with a 1:1 mapping of physical to virtual CPUs, the VMs showed very high CPU ready times of up to 30%.
There were some scheduler fixes in 7.0.2, but the problem still persisted.
There is a very good tuning guide, https://www.vmware.com/content/dam/digitalmarketing/vmware/en/pdf/techpaper/performance/vsphere70u2-cpu-sched-amd-epyc.pdf, that helped me find the issue.
The problem seems to be that the HPE BIOS default for NUMA memory domains per socket (NPS) is "Auto", which doesn't seem to work well with ESXi. After setting it to NPS-1, performance is as expected and no ready times are seen. I hope HPE will explain more about the "Auto" setting.
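For anyone who wants to compare their own numbers with the ready times mentioned above: the vSphere performance charts report CPU ready as a summation in milliseconds per sample interval, not as a percentage. Here is a minimal sketch of the usual conversion. It assumes the realtime chart interval of 20 seconds, and the function name and parameters are my own, not part of any VMware tool.

```python
def cpu_ready_percent(ready_ms, interval_s=20, vcpus=1):
    """Convert a CPU ready summation (ms) into a per-vCPU percentage.

    ready_ms   -- ready summation reported for one sample interval
    interval_s -- chart sample interval in seconds (assumed: 20 for realtime)
    vcpus      -- number of vCPUs, if ready_ms is the VM-level aggregate
    """
    if interval_s <= 0 or vcpus <= 0:
        raise ValueError("interval_s and vcpus must be positive")
    # Total milliseconds the vCPUs could have run during the interval.
    available_ms = interval_s * 1000 * vcpus
    return 100.0 * ready_ms / available_ms


if __name__ == "__main__":
    # 6000 ms of ready time in a 20 s realtime sample on one vCPU
    print(cpu_ready_percent(6000))  # 30.0
```

If you read the value from the VM-level object of a multi-vCPU VM, pass the vCPU count so the result is an average per vCPU and not the sum across all of them.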