
Thursday, 1 September 2016

Phases of the SAPS Benchmark

The SAPS benchmark fits very closely the model analyzed in the post "Phases of the Response Time". In essence, the benchmark is performed by progressively increasing the customer population and monitoring the response time. When the response time reaches 1 second, the measured throughput, expressed in dialog steps per minute, is the SAPS value. The think time remains constant at 10 s.




SAPS is a performance metric measuring throughput: 1 SAPS is 1 dialog step (a unit of work defined by SAP) per minute. This basically means 1 customer service per minute.


Model Calibration

Let us consider, without loss of generality, the benchmark number 2014034, corresponding to an IBM POWER E870.




According to the benchmark certificate these are the measured values:
  • m = 640, the number of concurrent HW threads,
  • X = 26166000 ds/h = 436100 ds/min (the SAPS value),
  • N = 79750, the number of users,
  • R = 0.97 s, the average response time.


The service time is derived from those values:
  • S = m/X = 0.088 s/ds.
As R >> S, the system must be well above the saturation point. The saturation point is N* = m(1 + Z/S) = 73323, and N, as stated in the benchmark certificate, is 79750. Clearly N > N*, confirming the hypothesis that the system is operating well above the saturation point.


At this point our simple model is well calibrated. To double-check it, note that the response time predicted by the model for a population of N = 79750, the benchmark population, is 0.969 s, while the certificate reports 0.97 s: a very close agreement!
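As a sketch, the calibration arithmetic above can be reproduced in a few lines of Python (variable names are mine; the figures are the certificate values quoted above; small differences in the last digit come from rounding S):

```python
# Calibrating the asymptotic queueing model with the figures quoted above
# from benchmark certificate 2014034 (variable names are mine).
m = 640                        # concurrent HW threads (servers)
X = 26166000 / 3600            # throughput, dialog steps per second
Z = 10.0                       # think time, s
N = 79750                      # benchmark user population

S = m / X                      # service time per dialog step, s
N_star = m * (1 + Z / S)       # saturation population N*
R_pred = N * S / m - Z         # predicted response time above saturation, s

print(f"S = {S:.4f} s/ds, N* = {N_star:.0f}, R({N}) = {R_pred:.3f} s")
```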

Prognosis (Prediction)



The key benefit of having a simple model for studying the performance behavior of more complex systems is the prediction ability it gives us. We ask and the model responds.

Question.- If the response time limit in the SAP benchmark were 2 s instead of 1 s, what would the system throughput (SAPS) be?
Answer.- Contrary to what some may think... it would remain almost the same! Above the saturation point the model predicts the same throughput, equal to the maximum throughput (the system bandwidth). The delta in SAPS from this change would be very slight and not significant.

Question.- If the response time limit in the SAP benchmark were 0.5 s instead of 1 s, what would the system throughput (SAPS) be?
Answer.- The answer is the same as before: almost the same. A response time limit of 0.5 s is also above the saturation point, which implies the system throughput will be the same. In the real world there would be an insignificant (negative) delta.


We've varied the response time limit over a wide margin, from -50% (this case) to +100% (the previous case), and the SAPS remain almost the same... I bet you wouldn't have said so before reading this article.

Question.- But... what changes, then, between the current 1 s and the new 0.5 s?
Answer.- The number of benchmark users needed to reach the response time limit: fewer in the 0.5 s case than in the 1 s case.


Question.- With the current benchmark definition and 90000 users, what would the expected response time be?
Answer.- Looking at the (number of users, response time) graph above, you can conclude that the response time would be around 2.4 s.
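Sticking with the model calibrated on benchmark 2014034, a short Python sketch answers these questions numerically (the function names are mine, for illustration only):

```python
# Above the saturation point the asymptotic model gives R(N) = N*S/m - Z,
# with throughput pinned at the bandwidth m/S, whatever the response-time
# cut-off. Figures from benchmark 2014034 as quoted above.
m, Z = 640, 10.0                  # servers (HW threads), think time in s
S = m * 3600 / 26166000           # service time per dialog step, s (~0.088)

def response_time(N):
    """Asymptotic response time (s) for a population N above saturation."""
    return N * S / m - Z

def population_for(R_limit):
    """Users needed to drive the response time up to R_limit seconds."""
    return (R_limit + Z) * m / S

print(f"R(90000) = {response_time(90000):.2f} s")        # around 2.4 s
for R_limit in (0.5, 1.0, 2.0):
    print(f"R = {R_limit} s -> N = {population_for(R_limit):.0f} users")
```

Note how the response-time limit only moves the required user population, not the throughput, exactly as argued above.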

Friday, 29 April 2016

Introducing the Response Time

The response time is the loser among all performance metrics. It's systematically ignored in almost every sizing, it's neither measured nor accounted for in most cases, and it's only marginally referenced in certain benchmark definitions. Clearly, throughput is the winner: SAPS, IOPS, and transactions per second are throughput metrics, not response time ones.

But from the end user's point of view, the response time is king. Any service center performs very well or very poorly according to the response time perceived by its users. When the response time is bad, it doesn't matter to the customers whether the system administrator proclaims they are working with a many-oooops-per-second system. The system is slow and, consequently, bad. Period.

The Service Center

Look at the following picture, an idealization of any service center. A service center is a place, logical or physical, where customers or users enter, are given a certain service by a worker (or server), and exit. There's a line (queue) where arriving customers wait if all the workers are busy.


Figure: Service center with server(s), queue, and arriving and departing customers.


You interact with service centers everywhere, as it is a very general concept. For example:

Service Center         | Service Given                          | Customer                                       | Worker
-----------------------|----------------------------------------|------------------------------------------------|-----------------------
Hairdresser's          | A haircut                              | People needing a haircut                       | Hairdresser
Our Company department | A step in an opportunity forward path  | Company customers                              | Yourself
OLTP IT Server         | CPU time                               | Users entering transactions at their terminals | CPU cores
Supermarket checkout   | The checkout payment                   | Supermarket customers                          | Cashier
Bus Transport          | A transport from place A to place B    | Passengers                                     | The driver and the bus
Public office desk     | A request to the public administration | Citizens                                       | Public servant
Machine Repair Center  | Fixing the malfunctioning machine      | Machines                                       | Repairman

A service center is fully characterized by the following magnitudes:
  • The service time (S): the elapsed time a service takes,
  • The number of servers (m):  the number of customers that can receive service simultaneously.

The Response Time

The Response Time (R) is the time elapsed from the arrival of a customer to the service center to its departure after receiving service.


The response time has two contributions: the service time (S)  and the wait time (W), resulting in this fundamental relationship:

R = W + S

To illustrate this, look at the following scenario, where we have depicted the flow of several customers through a very simple service center (m = 1, S constant).



  • Customer #1 arrives at t=0, finds the worker/server idle, spends no time waiting, is given a service taking S units of time, and exits the service center at t=S. The wait time is 0. The response time is S.
  • Customer #2 arrives at t=S, spends no time waiting, is given a service taking S units of time, and exits the service center at t=2S. The wait time is 0. The response time is S.
  • Customer #3 arrives at t=2S, spends no time waiting, is given a service taking S units of time, and exits the service center at t=3S. The wait time is 0. The response time is S.

In this scenario none of the customers must wait for service, so the response time is equal to the service time. The wait time contribution is zero.

Now look at the following scenario, which differs from the previous one in the way customers arrive.

  • Customer #1 arrives at t=0, spends no time at all in the queue, is given a service taking S units of time, and exits the service center at t=S. The wait time is 0. The response time is S.
  • Customer #2 arrives at t=0, has to wait in the queue S units of time, is given a service taking S units of time, and exits at t=2S. The wait time is S. The response time is 2S.
  • Customer #3 arrives at t=0, has to wait in the queue 2S units of time, is given a service taking S units of time, and exits at t=3S. The wait time is 2S. The response time is 3S.

Some of the customers must wait for service and, consequently, the response time has both contributions: the service time (grey) and the wait time (blue).
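The two scenarios can be replayed with a minimal FIFO single-server sketch (illustrative code, not part of the original post):

```python
# Minimal walk-through of the two scenarios above: a single FIFO server
# (m=1) with constant service time S and deterministic arrival instants.
def response_times(arrivals, S):
    """Return (wait, response) per customer for a single FIFO server."""
    results = []
    free_at = 0.0                      # instant the server becomes idle
    for a in arrivals:
        start = max(a, free_at)        # wait only if the server is busy
        wait = start - a
        free_at = start + S
        results.append((wait, wait + S))
    return results

S = 1.0
print(response_times([0.0, S, 2 * S], S))   # scenario 1: W=0, R=S for all
print(response_times([0.0, 0.0, 0.0], S))   # scenario 2: W=0,S,2S; R=S,2S,3S
```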


Waiting for the SAPS

Looking at a SAPS benchmark report, can you tell which contribution to the response time is bigger: the wait time or the service time?

Remember this first: SAPS is a performance metric measuring throughput: 1 SAPS is 1 customer service per minute, or 1 dialog step per minute (dialog step is an SAP term). The benchmark is essentially performed by progressively increasing the customer population and monitoring the response time. When the response time reaches 1 second (R = 1 s), the measured throughput, expressed in dialog steps per minute, is the SAPS value.

An SAP server is, in essence, a service center for SAP dialog steps. According to the certification report, these are its measured numbers:
  • There are 128 workers, corresponding to 128 HW threads executing dialog steps, so m=128.
  • The average response time is measured to be 0.98 s. R=0.98 s.
  • 5113000 dialog steps are executed (serviced) in an hour.

And from the above the following values can be easily derived:
  • The service time is S = 0.090 s: every worker processes (5113000/128) ds per hour, so (1 h / (5113000/128) ds) · (3600 s/h) = 0.0901 s.
  • The wait time is W = 0.89 s (W = R - S).

The dialog step response time is decomposed into 0.89 s of wait time and 0.09 s of service time:



Wow! W >> S; in fact W ~ 10·S. So there must be a queue of dialog steps waiting to be processed. If you scale this to, for example, a supermarket checkout counter that spends 5 minutes per customer, W = 10·S means 50 minutes of waiting in line!
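The decomposition above can be reproduced with a short script (figures taken from the certification report quoted above; variable names are mine):

```python
# Decomposing the measured response time of the SAPS benchmark above
# into wait and service contributions (m=128, R=0.98 s, 5,113,000 ds/h).
m = 128                        # HW threads executing dialog steps
R = 0.98                       # measured average response time, s
ds_per_hour = 5113000          # dialog steps serviced per hour

S = 3600 / (ds_per_hour / m)   # service time per dialog step, s
W = R - S                      # wait time via R = W + S

print(f"S = {S:.3f} s, W = {W:.2f} s, W/S = {W / S:.1f}")
```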

Coming Soon

This is the first blog entry of the Demystifying the Response Time series, an exploration into the response time jungle. There are very interesting facts and analysis ahead. Stay tuned.





New blog URL:   http://www.ibm.com/blogs/performance
Mirror blog:  http://demystperf.blogspot.com

Monday, 11 January 2016

Don’t put Xeon and POWER virtual CPUs in the same bag

Just a reminder: this blog is a mirror of the "main" site https://www-304.ibm.com/connections/blogs/performance (http://ibm.biz/demystperf)



In the pervasive virtual world, the standard unit of performance capacity happens to be the virtual CPU (vCPU). "This virtual machine (VM) has 6 vCPUs", "you will have to provide 8 vCPUs for that VM", and the like are common sentences. It would be a reasonable metric of performance if the underlying physical CPUs and the hypervisor layer were all the same. But this is seldom the case: Intel Xeon processors combined with VMware ESX virtualization and IBM POWER processors combined with POWERVM virtualization are very different beasts.


If you need to size or convert capacity between these dissimilar systems, need a solid comparison base, or would like to unmask the tricks and pitfalls that plague virtual-world sizing, continue reading.

IBM POWERVM



The POWERVM term for vCPU is Virtual Processor (VP). The VM has, or sees, VPs. These VPs are scheduled, in a time-shared manner, on POWER cores. Yes, read it again: one VP is scheduled on one core. I stress this because in the ESX / Intel world this is different, as you will see later.


Given this VP-to-core mapping, the VP capacity ranges between two values:
  • In the best case the capacity of one VP is the capacity one core can deliver, that is, 1 VP is 1 core.
  • In the worst case 1 VP is (1/10)th of a core.


The actual VP capacity depends on the following factors (revisit  “Why is the Virtual Capacity so Important?” and “The Playground of the Virtual Capacity”  in this blog for a detailed explanation):
  • configuration parameters of the VM the VP belongs to (entitlement, capped/uncapped attribute, uncapped weight)
  • configuration parameters of all the other VMs sharing the same physical machine (PM)
  • actual usage of capacity from all the other VMs sharing the same physical machine


Given this highly variable value, the opposite of what a unit of measure must be, how have VPs been promoted to a "standard" measure of capacity? Amazing, don't you think?


VMWARE ESX



The VM has, or sees, vCPUs. Those vCPUs are scheduled on Intel processor threads, in a time-shared manner. The mapping is vCPU-to-thread, different from the POWER / POWERVM case (VP-to-core).


Given this vCPU-to-thread mapping, the vCPU capacity ranges between two values:
  • In the best case the capacity of one vCPU is the capacity one thread can deliver, that is, 1 vCPU is 1 thread.
  • In the worst case one vCPU is very small (I'm not aware of a lower limit).


The actual vCPU capacity depends on the same factors described in the POWERVM case:
  • configuration parameters of the VM the vCPU belongs to,
  • configuration parameters of all the other VMs sharing the same PM,
  • actual usage of capacity by all the other VMs sharing the same PM.

Benchmarking vCPUs



The reputation of the vCPU as a stable unit of capacity has been destroyed. A vCPU's capacity can range from a full core (or a full thread) down to a small fraction, and it even depends on alien factors (from other VMs)!


Is there a way to put some sense into this nihilism?


Yes, there is: take a practical approach and use the best-case values. You know the actual performance will always be equal to or worse than that, but we have to live with this.


To evaluate the best case, let's consider the two systems we analyzed in the SAPS Olympics: single thread performance post in this blog.


Physical System | Dell PowerEdge R730 2s/36c/72t Intel Xeon E5-2699 v3 @2.30 GHz | IBM POWER S824 2s/24c/192t POWER8 @3.52GHz
Cores           | 36                                                             | 24
Threads         | 72                                                             | 192


The best-performance VM setup on these systems is a single VM with all the processors assigned, that is:
  • IBM POWER S824 2s/24c/192t POWER8 @3.52GHz with 24 VPs (= 1 VP/core x 24 cores).
  • Dell PowerEdge R730 2s/36c/72t Intel Xeon E5-2699 v3 @2.30 GHz with 72 vCPUs (= 1 vCPU/thread x 2 threads/core x 36 cores).


And the final results, without taking into account the reduction of capacity due to virtualization, would be:


Physical System | Dell PowerEdge R730 2s/36c/72t Intel Xeon E5-2699 v3 @2.30 GHz | IBM POWER S824 2s/24c/192t POWER8 @3.52GHz
Cores           | 36                                                             | 24
Threads         | 72                                                             | 192
VM              | 1                                                              | 1
vCPU / VP       | 72                                                             | 24
SAPS            | 90120                                                          | 115870
SAPS/vCPU       | 1250                                                           | 4828

The dramatic difference, 4828 SAPS/VP vs 1250 SAPS/vCPU, would be even greater taking virtualization effects into account, as it is widely known that POWERVM is more efficient than ESX. We may consider a 3-5% reduction for POWERVM and a 10-15% reduction for ESX.


If Intel Xeon Hyperthreading is switched off (HT=Off), which seldom happens in a virtualized environment, the capacity numbers for Intel vCPUs would improve to 2019 SAPS per vCPU (= 2019 SAPS/core x 1 core/thread x 1 thread/vCPU), again without taking into account virtualization overheads.
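As a quick sketch, the per-vCPU figures above can be recomputed, optionally applying the virtualization overheads suggested in the text (the overhead percentages are the assumed ranges above, not measured values):

```python
# Best-case SAPS per vCPU/VP from the table above, optionally reduced by an
# assumed virtualization overhead (3-5% POWERVM, 10-15% ESX per the text).
def saps_per_vcpu(saps, vcpus, overhead=0.0):
    """SAPS per vCPU/VP, reduced by a fractional virtualization overhead."""
    return saps / vcpus * (1 - overhead)

power_vp = saps_per_vcpu(115870, 24)       # POWER8, best case
xeon_vcpu = saps_per_vcpu(90120, 72)       # Xeon HT=On, best case

print(f"POWER8 VP: {power_vp:.0f} SAPS, Xeon vCPU: {xeon_vcpu:.0f} SAPS")
print(f"With 5% / 15% overheads: {saps_per_vcpu(115870, 24, 0.05):.0f} "
      f"vs {saps_per_vcpu(90120, 72, 0.15):.0f}")
```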

Summarizing



Which capacity should be assigned to vCPUs? I would take the estimated values above, representing best cases under benchmark conditions, reduced by 3-5% for POWERVM and 10-15% for ESX. This results in this approximate and simple relationship:

1 POWER8 VP ≈ 4 Xeon Haswell-EP vCPUs (HT=On)