Monday, March 27, 2017

The upgrade sizing pitfall




The art of sizing is not exempt from pitfalls, and you must be aware of them if you want your sizing to be accurate and adequate. Let us talk about a typical scenario: a server upgrade.

All the metrics in sizing are measures of throughput, and this has an implication you must take into account: the service center (the server, here) with the higher throughput capacity is not always the better one. What are you saying, man?

Let's consider two servers, the base one and its intended upgrade:
  1. Server A (base): a single-core server with a capacity (maximum throughput = bandwidth) of 1 tps (transaction per second). Therefore the transaction service time is 1 second.
  2. Server B (upgrade): a four-core server with a capacity of 2 tps. Spread over four cores that is 0.5 tps per core, so the transaction service time is 2 seconds.

If you look exclusively at the throughput, B (2 tps) is better than A (1 tps). Period.

But from the response time perspective such superiority must be reconsidered. Let us graph the response time versus the number of users:

Figure: Best response time (average) versus the number of users, with a transaction think time of 30 seconds. Server A (base) in blue. Server B (upgrade) in red.

In the light load zone, that is, when there are no or few queued transactions, A is better than B. This is a consequence of A's better (lower) service time. In the high load zone B is better than A, a consequence of B's better (higher) capacity (throughput). If the workload stays in the light load zone, such an upgrade would be a bad idea.

So when you perform a sizing you must know which point of view is relevant to your sizing exercise: the throughput or the response time. Don't fall into the trap. A higher-capacity (throughput) server is not unconditionally better. For an upgraded server to be unconditionally better, its capacity (throughput) must be higher and its service time lower.
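
To see where the crossover sits for these numbers, here is a minimal sketch in Python. It uses the asymptotic bounds of the interactive response time law, R(N) >= max(S, N/C - Z), with S the service time, C the capacity (maximum throughput) and Z the think time; the curves in the figure come from the full model, but the bounds are enough to locate the crossover.

def best_response_time(n_users, service_time, capacity, think_time=30.0):
    """Lower bound on the average response time for n_users concurrent users."""
    return max(service_time, n_users / capacity - think_time)

for n in (1, 10, 31, 32, 40, 64, 100):
    r_a = best_response_time(n, service_time=1.0, capacity=1.0)  # Server A (base)
    r_b = best_response_time(n, service_time=2.0, capacity=2.0)  # Server B (upgrade)
    better = "A" if r_a < r_b else ("B" if r_b < r_a else "tie")
    print(f"N={n:4d}  R_A={r_a:6.1f} s  R_B={r_b:6.1f} s  better: {better}")

With these numbers A wins in the light load zone and B only takes over beyond roughly 32 users, which is exactly the pitfall: whether the upgrade pays off depends on where your workload actually lives.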

Monday, March 20, 2017

What does Usage mean?




My objective today is to clarify the meaning of one of the most used metrics in sizing and performance analysis: the usage.

First things first

The usage reflects the instantaneous state of the service center. It is a binary quantity:
  • usage=0 when idle (not working)
  • usage=1 when busy (working)
Any service center at any point in time is idle (usage=0) or busy (usage=1).

Usage Percentage (Usage%)

This is the "usage" metric we are familiarized with. It is the average of the usage over a certain time interval, called the averaging time. Typically is expressed as percentage.

For example, usage% = 20% (for a certain time interval) means that during that time interval the service center has been:
  • 20% of the time busy
  • 80% of the time idle
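
To make the definition concrete, here is a tiny Python sketch (purely illustrative numbers) that computes the usage% from per-second samples of the binary usage:

# Hypothetical 1-second samples of the binary usage (0 = idle, 1 = busy).
busy = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]

averaging_time = len(busy)                    # 10 s in this toy trace
usage_pct = 100.0 * sum(busy) / averaging_time
print(f"usage% over {averaging_time} s = {usage_pct:.0f}%")   # -> 20%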

Averaging Time

The averaging time used to calculate the usage% is of capital importance. Saying 20% is not enough; saying 20% over an hour is right.

The theory is very simple, but in practice the averaging time is frequently dropped. Always ask for it (but don't expect a crystal-clear answer).

To stress its importance, look at these graphs representing the time evolution of the usage% metric for the same workload and different averaging times.

Averaging Time = 10 s


Averaging Time = 10 min

Averaging Time = 500 min


I insist: in the three graphs the workload is the same (and the total amount of work done, the area under the usage% curve, remains the same). The short averaging time is best suited for performance analysis, the medium one for sizing, and the long one for trend analysis.
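
To make the point tangible without the graphs, here is a small Python + NumPy sketch (the trace is synthetic, an assumption just for illustration) that averages the very same binary usage signal with three different windows; the mean, that is, the total work done, is the same in all three, and only the smoothing changes.

import numpy as np

rng = np.random.default_rng(0)
seconds = 48 * 3600                                   # 48 hours of 1-second usage samples
t = np.arange(seconds)
p_busy = 0.3 + 0.25 * np.sin(2 * np.pi * t / 7200)    # slowly varying load, ~30% busy on average
busy = (rng.random(seconds) < p_busy).astype(float)   # the binary usage signal (0 or 1)

def usage_pct(samples, window_s):
    """Average the binary usage over non-overlapping windows of window_s seconds."""
    n = len(samples) // window_s * window_s
    return samples[:n].reshape(-1, window_s).mean(axis=1) * 100.0

for window in (10, 600, 30000):                       # 10 s, 10 min, 500 min
    series = usage_pct(busy, window)
    print(f"averaging time {window:6d} s: min {series.min():5.1f}%  "
          f"max {series.max():5.1f}%  mean {series.mean():5.1f}%")

The short window shows every burst (useful for performance analysis), the long one only shows the trend, and the mean stays put.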

Monday, March 13, 2017

Phases of the response time with variability




Let's return to the response time versus number of users signature (graph below) of the simple system described in "Phases of the Response Time". We made two simplifying assumptions: the service time is constant, and the interarrival time is constant (that is, the think time is constant). These assumptions allowed us to distill the essence of the response time dependencies, providing very useful insights into the behaviour of the response time, the primary parameters it depends on, how it depends on them, and what its trends and limits are.

Figure 1: The response time versus the number of users for the simple model.


In the real world... variability!

Now let's go a step further and introduce variability. Customers seldom arrive at a service center at uniform intervals (arrival-side variability). Customers seldom demand the same service time (service-side variability). Both magnitudes are inherently variable.

The analysis of the response time with variability is usually done using a powerful mathematical tool: probability theory. Every time you open a queueing theory textbook or article, you'll find probabilities. Our constant values are transformed into statements like these: "a service time of 2 s is twice as probable as one of 1 s", "there is an 80% probability that the service time lies in the interval (1 s, 2 s)", "the average response time is 2 s", and so on. We enter the probabilistic world, where the raw material is random variables, about which we can only state probabilistic facts.
But, for now, I'm interested in highlighting the main consequences of variability, not in performing an in-depth analytical treatment.

Variability Effects

Look at Figures 2 and 3.

Figure 2: No variability case. Uniform arrivals + Uniform service --> No waits --> Best and uniform response time.

This "no variability" figure corresponds to our simple all-constant case (Figure 1). Uniform arrivals and uniform service time result in no waits and the best and uniform response time.
Figure 3: With variability case (arrivals side). Non-uniform arrivals + Uniform service --> Waits --> Worse and variable response time

The "with variability" diagram illustrates the arrivals side variability, in particular batch arrival of users. Non-uniform arrivals result in waits, and the response time seen by users is worse and variable (or volatile).

Ideas to take away --> The effects of variability are:

  • The response time is variable: the response time varies from user to user and between successive visits of the same customer.
  • The average response time is worse: waits show up due to the lack of uniformity, increasing the response time.
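
A toy computation makes the effect of Figure 3 tangible. Same constant service time, same number of arrivals, but pairing the arrivals up creates waits, so the response time becomes both variable and worse on average. A minimal Python sketch with illustrative numbers:

def response_times(arrivals, service_time=1.0):
    """FIFO single server: response time (wait + service) for each arrival."""
    times, free_at = [], 0.0
    for t in sorted(arrivals):
        start = max(t, free_at)              # wait if the server is still busy
        free_at = start + service_time
        times.append(free_at - t)
    return times

uniform = [0, 2, 4, 6, 8, 10]                # one arrival every 2 s
batched = [0, 0, 4, 4, 8, 8]                 # the same 6 arrivals, but in pairs

for name, arrivals in (("uniform", uniform), ("batched", batched)):
    r = response_times(arrivals)
    print(f"{name:8s}: response times {r}, average {sum(r)/len(r):.2f} s")

With uniform arrivals every response time is 1 s; with batched arrivals they alternate between 1 s and 2 s and the average climbs to 1.5 s, even though the server does exactly the same amount of work.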

An analysis of the same simple model, but allowing random (exponentially distributed) variation of the service time and the think time, results in the graph in Figure 4.


Figure 4: Average response time for the all-constant (blue) and the random (red) cases.

We can see, for example, that when the user population reaches 80% of the saturation value, the average response time of the random model is 4 times that of the all-constant model, and at 100% of the saturation population the ratio increases to 8 times!
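
As a rough cross-check, here is a minimal Python sketch of the two curves. The all-constant case uses the piecewise formula R = max(S, N*S - Z); the exponential case is solved with exact Mean Value Analysis (MVA) for a closed system with one queueing station and a think-time station. The values of S and Z below are assumptions (the parameters behind Figure 4 are not stated), so the exact ratios will differ from the 4x and 8x quoted above, but the widening gap near saturation is the same phenomenon.

S = 1.0    # service time (s), assumed
Z = 30.0   # think time (s), assumed

def r_constant(n):
    """Average response time with no variability at all."""
    return max(S, n * S - Z)

def r_exponential(n_users):
    """Average response time with exponential service and think times (exact MVA)."""
    q = 0.0                       # mean queue length at the server
    for n in range(1, n_users + 1):
        r = S * (1.0 + q)         # response time with n users
        x = n / (r + Z)           # throughput (interactive response time law)
        q = x * r                 # Little's law applied to the server
    return r

n_star = int((S + Z) / S)         # saturation population of the constant model (31 users here)
for frac in (0.5, 0.8, 1.0, 1.5):
    n = round(frac * n_star)
    rc, re = r_constant(n), r_exponential(n)
    print(f"N={n:3d} ({frac:.0%} of N*): constant {rc:6.2f} s, exponential {re:6.2f} s, ratio {re/rc:.1f}x")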

By the way, don't underestimate the effects of variability, as it is one of the causes of unacceptable performance from the customer's point of view: in general, a customer is more willing to accept higher but uniform response times than lower but highly variable ones.