In virtual environments, less (CPUs) is more

Mattias Geniar, Monday, December 19, 2016

Our application is slow, let's throw more CPUs at it!

If you work in IT, either as a developer or a sysadmin, you might've heard that phrase before. Something is slow, you throw hardware at it. Well, in a virtual environment (and what isn't a VM nowadays?), that isn't always the right fix.

It's not just overcommitting

The obvious reason is that those CPUs need to come from somewhere, and that hardware might be overloaded. If you only have 16 cores in your physical machine, it doesn't make much sense to have 4x VMs running with 8 cores each, all consuming 100% CPU.

Simple math tells you that 4x VMs x 8 CPUs = a demand for 32 CPUs, where your hardware can only supply 16 CPUs (1).
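The arithmetic above fits in a few lines of Python. This is just a sketch: the function name and the `utilisation` parameter are made up for illustration, and hyperthreading is ignored entirely.

```python
def cpu_demand(vm_count: int, vcpus_per_vm: int, utilisation: float = 1.0) -> float:
    """Total CPU demand, in cores, if every VM uses `utilisation` of its vCPUs."""
    return vm_count * vcpus_per_vm * utilisation


physical_cores = 16  # what the hardware can supply
demand = cpu_demand(vm_count=4, vcpus_per_vm=8)  # every VM at 100% CPU

print(f"demand: {demand:.0f} cores, supply: {physical_cores} cores")
print(f"overcommit ratio: {demand / physical_cores:.1f}x")
```

With the numbers from the example, that reports a demand of 32 cores against a supply of 16, or a 2.0x overcommit.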

On the other hand, if each of those VMs only consumed 50% of its assigned CPUs, everything would be fine -- right?

Well, not so much.

(1) Yes, hyperthreading, I know.

Synchronising CPU clock cycles

In the VMware world there's a fancy term called %CSTP or Co-Stop. The same logic applies to Xen, KVM and any other virtualisation technology.

It boils down to this.

The %CSTP value represents the amount of time a virtual machine with multiple virtual CPUs is waiting to be scheduled on multiple cores on the physical host. The higher the value, the longer it waits and the worse its performance.
(Source: "Determining if multiple virtual CPUs are causing performance issues")

Or phrased differently: the Co-Stop percentage is the time a virtual machine was ready to run, but had to wait for the hypervisor to schedule the CPUs it demanded on the physical hardware.
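To make "percentage of time" concrete, here's a minimal sketch. It assumes you already have the co-stop wait time in milliseconds for a known sampling interval; the function and parameter names are hypothetical, and how your monitoring tool actually reports or normalises this value may differ.

```python
def costop_percent(wait_ms: float, interval_ms: float) -> float:
    """Share of the sampling interval the VM spent ready but waiting, in percent."""
    if interval_ms <= 0:
        raise ValueError("interval_ms must be positive")
    return 100.0 * wait_ms / interval_ms


# Example: the VM spent 1.5 seconds waiting during a 20-second sample.
print(f"{costop_percent(wait_ms=1500, interval_ms=20000):.1f}%")
```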

It's a CPU scheduling issue.

Imagine you have a VM with 8 CPUs assigned to it. If a new process wants to run in that VM, it requests CPU time from the guest kernel. The hypervisor underneath then has to map those virtual CPUs onto physical sockets and cores.

But it has to keep the other VMs in mind too, since those might also be requesting CPU time.

And here's the catch: when a VM is ready to be scheduled, the hypervisor needs to schedule all of its allocated CPUs at (nearly) the same time. When your VM has 8 CPUs but only really needs 2, your hypervisor will still try to schedule all 8 CPUs at the same time. That might take longer due to congestion on the hypervisor, since other VMs can be asking for time on those very same CPUs.

But why wait for 8 CPUs to be scheduled when in reality your VM only needed 2? That time spent waiting for the hypervisor to have 8 CPUs free to be assigned is time wasted.

This is especially apparent in cloud environments, where it's easy to either over-commit (as the hoster) or over-reserve (as the cloud tenant).
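A toy model shows why the wait grows with the vCPU count. This is a deliberately simplified sketch, not how any particular hypervisor schedules: it assumes strict co-scheduling, a 16-core host, and that each core is independently busy with other VMs 60% of the time in any given scheduling tick. All names and numbers are invented for illustration.

```python
from math import comb


def prob_slot_available(total_cores: int, busy_prob: float, vcpus: int) -> float:
    """Probability that at least `vcpus` of `total_cores` are free in one tick,
    assuming each core is independently busy with probability `busy_prob`."""
    free_prob = 1.0 - busy_prob
    return sum(
        comb(total_cores, free) * free_prob**free * busy_prob**(total_cores - free)
        for free in range(vcpus, total_cores + 1)
    )


def expected_wait_ticks(total_cores: int, busy_prob: float, vcpus: int) -> float:
    """Expected number of ticks spent waiting until all vCPUs can run together."""
    p = prob_slot_available(total_cores, busy_prob, vcpus)
    return 1.0 / p - 1.0  # mean number of failed attempts before the first success


for vcpus in (2, 8):
    wait = expected_wait_ticks(total_cores=16, busy_prob=0.6, vcpus=vcpus)
    print(f"{vcpus} vCPUs: ~{wait:.2f} ticks waiting")
```

In this model the 2-vCPU VM almost never waits, while the 8-vCPU VM spends several ticks waiting for every tick it gets to run, even though both might be doing the same amount of real work.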

In general, it's a very good rule of thumb to only assign the resources you need in a virtual environment: not more, not less. Assigning more might hurt performance due to CPU scheduling congestion, while assigning too few resources will cause your VM to run at 100% capacity all the time.

Don't overcommit. Don't underprovision. Assess your application and provision the right resources.
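As a rough starting point for that assessment, you could derive a vCPU count from the peak usage you actually observe. The sketch below is one possible heuristic, not an established formula: the function name, the 70% target utilisation and the sample numbers are all assumptions.

```python
import math


def suggest_vcpus(current_vcpus: int, peak_utilisation: float,
                  target_utilisation: float = 0.7) -> int:
    """Suggest a vCPU count that puts the observed peak at `target_utilisation`."""
    if not 0 < target_utilisation <= 1:
        raise ValueError("target_utilisation must be in (0, 1]")
    cores_used = current_vcpus * peak_utilisation  # cores' worth of work at peak
    needed = cores_used / target_utilisation
    # Round first so float noise (e.g. 4.000000000000001) doesn't add a vCPU.
    return max(1, math.ceil(round(needed, 6)))


# Over-provisioned: 8 vCPUs that peak at 20% only need 3.
print(suggest_vcpus(current_vcpus=8, peak_utilisation=0.20))
# Under-provisioned: 2 vCPUs pegged at 100% would be better off with 3.
print(suggest_vcpus(current_vcpus=2, peak_utilisation=1.00))
```

Treat the result as a first guess to validate against real workload measurements, not as a final answer.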



Hi! My name is Mattias Geniar. I'm a Support Manager at Nucleus Hosting in Belgium, a general web geek, public speaker and podcaster. Currently working on DNS Spy. Follow me on Twitter as @mattiasgeniar.

Comments

daboule Tuesday, December 27, 2016 at 23:06

I don't agree with you about this sentence:

In the VMware world there's a fancy term called %CSTP or Co-Stop. The same logic applies to Xen, KVM and any other virtualisation technology.

IBM's PowerVM technology, together with AIX, lets you over-allocate CPU resources to an LPAR/VM without the problems you described. This great feature is called CPU Folding.

AIX makes hypervisor calls when it needs more or less CPU. It is capable of ceding CPU cycles.

https://www.ibm.com/developerworks/

FYI: another major aspect of PowerVM technology is that the POWER processor works with three ring levels: Hypervisor / Supervisor / User.

And AIX is paravirtualized.

That was just a friendly comment to remind your readers that x86/VMware/etc. is not the only technology that rules the world :-)
