Our application is slow, let’s throw more CPUs at it!

If you work in IT, either as a developer or a sysadmin, you might’ve heard that phrase before. Something is slow, so you throw hardware at it. Well, in a virtual environment (and what isn’t a VM nowadays?), that isn’t always the right fix.

It’s not just overcommitting

The obvious reason is that those CPUs need to come from somewhere, and that hardware might already be overloaded. If you only have 16 cores in your physical machine, it doesn’t make much sense to run 4 VMs with 8 cores each, all consuming 100% CPU.

Simple math tells you that 4 VMs × 8 CPUs = a demand for 32 CPUs, while your hardware can only supply 16 CPUs ⁽¹⁾.
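
The arithmetic is simple enough to sketch in a few lines of Python. The function names below are made up for illustration, and utilisation is modelled as a single average fraction per VM, which is a simplification of how real workloads behave:

```python
def cpu_demand(vm_count: int, vcpus_per_vm: int, utilisation: float = 1.0) -> float:
    """Physical CPUs needed if every VM uses `utilisation` of its vCPUs."""
    return vm_count * vcpus_per_vm * utilisation


def overcommit_ratio(vm_count: int, vcpus_per_vm: int, physical_cpus: int,
                     utilisation: float = 1.0) -> float:
    """Demand divided by supply; anything above 1.0 means VMs must queue."""
    return cpu_demand(vm_count, vcpus_per_vm, utilisation) / physical_cpus


if __name__ == "__main__":
    demand = cpu_demand(4, 8)           # 4 VMs x 8 vCPUs at 100%
    ratio = overcommit_ratio(4, 8, 16)  # against a 16-core host
    print(f"demand={demand:.0f} CPUs, ratio={ratio:.1f}x")
```

With `utilisation=0.5`, `cpu_demand(4, 8, 0.5)` drops to 16.0 and the ratio to 1.0: on paper, the host can just about keep up.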

On the other hand, if each of those VMs only consumed 50% of its assigned CPUs, demand would drop to exactly the 16 CPUs the host can supply, so everything’s fine – right?

Well, not so much.

⁽¹⁾ Yes, hyperthreading, I know.

Synchronising CPU clock cycles

In the VMware world there’s a fancy term for this: %CSTP, or Co-Stop. The same logic applies to Xen, KVM and any other virtualisation technology.

It boils down to this.

The %CSTP value represents the amount of time a virtual machine with multiple virtual CPUs spends waiting to be scheduled on multiple cores of the physical host. The higher the value, the longer it waits and the worse its performance.

[Image: Determining if multiple virtual CPUs are causing performance issues]

Or phrased differently: the Co-Stop percentage is the time a virtual machine was ready to run but had to wait for the hypervisor to schedule all the CPUs it demanded on the physical hardware.
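
If you’re looking at vCenter’s performance charts rather than esxtop, co-stop is typically shown as a millisecond total per sampling interval instead of a percentage. The sketch below converts one into the other. It assumes a 20-second sampling interval (what vCenter’s real-time charts use) and that the VM-level counter is summed across all of the VM’s vCPUs, the way the CPU Ready counter is; `costop_percent` is a hypothetical helper, not a VMware API.

```python
def costop_percent(costop_ms: float, interval_s: float = 20.0, vcpus: int = 1) -> float:
    """Convert a co-stop summation (milliseconds per sample) into a percentage.

    Assumption: the counter is a millisecond total accumulated over one
    sampling interval (20 s for vCenter real-time charts) and summed across
    the VM's vCPUs. Pass vcpus to get an average per vCPU; leave vcpus=1 to
    keep the VM-wide total, which can exceed 100% on a multi-vCPU VM.
    """
    if interval_s <= 0 or vcpus < 1:
        raise ValueError("interval_s must be positive and vcpus at least 1")
    # Fraction of the interval spent co-stopped, as a percentage, per vCPU.
    return costop_ms / (interval_s * 1000.0) * 100.0 / vcpus
```

For example, 2,000 ms of co-stop in a 20-second sample works out to 10% for the VM as a whole, or 1.25% per vCPU on an 8-vCPU VM.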

It’s a CPU scheduling issue.

Imagine you have a VM with 8 CPUs assigned to it. If a new process wants to run in that VM, it requests CPU time from the guest kernel. The hypervisor underneath then has to translate those virtual CPUs to physical sockets & cores.

But the hypervisor has to keep the other VMs in mind too, since they might also be requesting CPU time.

And here’s the catch: when a VM is ready to be scheduled, the hypervisor needs to schedule all of its allocated CPUs at (nearly) the same time. When your VM has 8 CPUs but only really needs 2, your hypervisor will still try to schedule all 8 at once. That might take longer due to congestion on the hypervisor, since other VMs can be asking for time on those very same CPUs.
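
To get a feel for why the vCPU count matters, here’s a deliberately simplified model of strict co-scheduling: every physical core is independently busy with other VMs’ work with some probability in each time slice, and a VM can only run in a slice where enough cores are idle to place all of its vCPUs at once. This is an assumption for illustration, not how any particular hypervisor’s scheduler works; ESXi, for one, has used a relaxed form of co-scheduling for years, so real-world differences are less extreme.

```python
from math import comb


def p_schedulable(vcpus: int, host_cores: int, core_busy: float) -> float:
    """Probability that at least `vcpus` cores are idle in the same time slice.

    Simplified model (assumption): each physical core is independently busy
    with probability `core_busy`, and the VM may only run when all of its
    vCPUs can be placed at once (strict co-scheduling).
    """
    p_idle = 1.0 - core_busy
    # Binomial tail: sum over every possible count of idle cores >= vcpus.
    return sum(
        comb(host_cores, k) * p_idle**k * core_busy**(host_cores - k)
        for k in range(vcpus, host_cores + 1)
    )


def expected_wait_slices(vcpus: int, host_cores: int, core_busy: float) -> float:
    """Mean number of slices spent waiting before the first slice that fits."""
    p = p_schedulable(vcpus, host_cores, core_busy)
    if p == 0.0:
        return float("inf")  # the VM can never be placed on this host
    # Geometric distribution: expected failures before the first success.
    return (1.0 - p) / p


if __name__ == "__main__":
    for vcpus in (2, 8):
        wait = expected_wait_slices(vcpus, host_cores=16, core_busy=0.6)
        print(f"{vcpus} vCPUs: ~{wait:.2f} slices of waiting on average")
```

With 16 host cores that are each busy 60% of the time, the 8-vCPU VM waits far longer on average than the 2-vCPU VM, even though both might be running the exact same workload.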

But why wait for 8 CPUs to be scheduled when your VM only needed 2? The time spent waiting for the hypervisor to free up 8 CPUs at once is time wasted.

This is especially apparent in cloud environments, where it’s easy to either overcommit (as the hosting provider) or over-reserve (as the cloud tenant).

In general, it’s a very good rule of thumb to assign only the resources you need in a virtual environment: not more, not less. Assigning more might hurt performance due to CPU scheduling congestion; assigning too few will leave your VM running at 100% capacity all the time.

Don’t overcommit. Don’t underprovision. Assess your application and provision the right resources.
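
One way to put “assess your application” into practice is to size from measured usage rather than guesswork. The heuristic below is hypothetical, not taken from any vendor’s guidance: it takes samples of how many cores’ worth of CPU a VM actually used, picks a high percentile so rare spikes don’t dictate the size, and adds some headroom. The 95th percentile and 25% headroom are arbitrary defaults you’d want to tune for your own workload.

```python
import math


def recommend_vcpus(busy_cores_samples: list[float], percentile: float = 95.0,
                    headroom: float = 0.25) -> int:
    """Suggest a vCPU count from observed usage (hypothetical heuristic).

    busy_cores_samples: how many cores' worth of CPU the VM used in each
    sample, e.g. 1.6 means 160% in top-style notation.
    """
    if not busy_cores_samples:
        raise ValueError("need at least one sample")
    if not 0 < percentile <= 100:
        raise ValueError("percentile must be in (0, 100]")
    ordered = sorted(busy_cores_samples)
    # Nearest-rank percentile (1-indexed), so rare spikes are ignored.
    rank = math.ceil(percentile * len(ordered) / 100)
    typical_peak = ordered[rank - 1]
    # Add headroom, round up to whole vCPUs, and never go below one.
    return max(1, math.ceil(typical_peak * (1 + headroom)))
```

For a VM that sat at 1.5 cores for 19 samples and spiked to 6 cores once, `recommend_vcpus([1.5] * 19 + [6.0])` suggests 2 vCPUs. Using a percentile instead of the maximum is the design choice that matters here: sizing for that single spike (`percentile=100`) would land you right back at 8 vCPUs and the co-stop problem above.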

Want to subscribe to the cron.weekly newsletter?

I write a weekly-ish newsletter on Linux, open source & web development called cron.weekly.

It features the latest news, guides & tutorials and new open source projects. You can sign up via email below.

No spam. Just some good, practical Linux & open source content.

In virtual environments, less (CPU’s) is more
