Monday, 28 July 2014

Reporting the Peak



Objective

Let us practice a little bit with confidence intervals (CI) and uncertainty estimates.
We are going to deal with the system usage case, a critical concept for sizers. For example, it is the main magnitude used to establish the baseline and propose the right upgrade for the current server. There are many ways to define and measure system usage, but for the purpose of this blog entry the particular choice doesn’t matter. I’ll refer to the chosen magnitude generically as the peak usage (PU).

ABC of the Confidence Interval (CI)

You should review “The importance of what is not said” in this blog for detailed explanations, but here are the basic facts:
The CI expresses the uncertainty, or probable error, you commit when estimating the value of a certain magnitude (PU in our case). It is reported in the following format
CI Center ± CI Length / 2
(note the division by two: the interval extends half of its total length on each side of the center)
where

  • CI Center = AVERAGE(data), the arithmetic average of the data (the measured values)
  • CI Length = 2 * T.INV.2T(1-conf, size-1) * STDEV.S(data) / SQRT(size)

and

  • data is the sample, or set of measured data points,
  • size is the sample size, the number of data points,
  • conf is the desired confidence level (CL),
  • T.INV.2T(), STDEV.S() and SQRT() are worksheet (Excel) functions  (1).
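
For those who prefer code to worksheet formulas, here is a minimal Python sketch of the same computation; SciPy’s stats.t.ppf plays the role of T.INV.2T, and the function name and sample data are illustrative choices of mine:

    import math
    from statistics import mean, stdev
    from scipy import stats

    def confidence_interval(data, conf=0.90):
        """Return (center, length) of the CI for the mean of data."""
        size = len(data)
        center = mean(data)                             # AVERAGE(data)
        # two-tailed critical value, equivalent to T.INV.2T(1-conf, size-1)
        t = stats.t.ppf(1 - (1 - conf) / 2, size - 1)
        length = 2 * t * stdev(data) / math.sqrt(size)  # 2*t*STDEV.S(data)/SQRT(size)
        return center, length

    center, length = confidence_interval([68, 72, 62, 71, 74])
    print(f"{center:.1f} ± {length / 2:.1f}")           # center ± half the length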

The CI length (the uncertainty / imprecision) depends on, and varies with, the following factors:

  • Decreases with the sample size,
  • Increases with the sample variability/dispersion,
  • Increases with the confidence level.

 

Analysis of Samples

Scenario: a typical OLTP service, in which the peak hour is always from 11 a.m. to 12 noon. Due to the inherent randomness of the system (in customer arrivals, in service times) the height of the peak, the PU, varies from one day to another. One day the maximum is 68%, another day 72%, another 62%, and so on. You have compiled a series of 20 days of daily PU, as shown in figure 1. Of course, you have carefully reviewed the points to be sure, beyond any reasonable doubt, of their homogeneity and representativeness (i.e. they are values for normal workdays, and holidays have been removed).

Fig. 1: Daily peak usage for the low variability (LoVar) case. Sample size=20, sample average=70.5%, sample standard deviation=3.7%.

The CI for the average peak usage is 70.5 ±1.5% for a confidence level of 90%. This essentially means that there is a scarce 5% probability that the all-the-days peak usage average (2) lies above 72%, the CI upper limit (UL).
Of course we are speaking statistically: it is possible that the values in figure 1 happen to be abnormally low; there is a 5% probability for this to happen. If you want to minimize this probability, increase the confidence level, and the CI length increases as well. With a 98% CL the probability goes down to 1% (on average, one of every 100 20-day samples!).
More often than not we don’t have the luxury of large samples. For example, you’ve been asked to size some system and the sysadmin does not record the PU or, at least, not in the way you want. After giving directions on what you need and how to measure it, you get 3 days of PU. Suppose that these three days are the latest in the series (days 18, 19 and 20). For these three days the CI for the average peak usage is 73.0 ±9.0% with CL=90%. Hence there is a 5% probability that the all-the-days peak usage average lies above 82.0%. Clearly the CI length increases and its upper limit is higher. Reason: with only three samples our knowledge of the PU behavior is very poor, so the safety margin grows.
Summarizing:

  • LoVar system PU. 20 days. CL=90%. CI: 70.5 ±1.5%. UL: 72%.
  • LoVar system PU. 3 days. CL=90%. CI: 73.0 ±9.0%. UL: 82%.


Suppose you have the PU daily measurements for another, more erratic system, called HiVar. See figure 2.

Fig. 2: Peak usage for the high variability (HiVar) case. Sample size=20, sample average=68.0%, sample standard deviation=10.5%.

The CI for the average peak usage is 68.0 ±4.0% for a CL of 90%. And if we take only the last three days, as in the previous case, the CI for the average peak usage is 68.0 ±18.0% for a CL of 90%.
Summarizing:

  • HiVar system PU. 20 days. CL=90%. CI: 68.0 ±4.0%. UL: 72%.
  • HiVar system PU. 3 days. CL=90%. CI: 68.0 ±18.0%. UL: 86%.
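
As a sanity check, the ±4.0% term for the 20-day HiVar case can be recomputed in Python from the summary statistics of figure 2 alone (n=20, s=10.5):

    import math
    from scipy import stats

    # half-width = t * s / sqrt(n), with CL=90%
    half = stats.t.ppf(0.95, 19) * 10.5 / math.sqrt(20)
    print(round(half, 1))   # 4.1, i.e. the reported ±4.0 up to rounding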



Recommended Sample Size

Based on the CI framework we can recommend what the sample size (the number of measured days) should be if you want to keep the uncertainty / imprecision below some threshold; a short computational sketch follows the list below. For the particular and reasonable case of a CI length under 10% for CL=90% we find

  • If the data values show low variability (sample standard deviation around 5%) you must measure at least 5 days.
  • If the data values show medium variability (sample standard deviation around 10%) you must measure at least 13 days.
  • If the data values show high variability (sample standard deviation around 20%) you must measure at least 46 days!! In such a case we should relax the precision objective and allow for a wider CI.
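
A minimal sketch of that computation (the function name is my own): it simply increases the sample size until the CI length drops below the threshold, and it reproduces the 5 / 13 / 46 figures above:

    import math
    from scipy import stats

    def min_sample_size(sd, max_length, conf=0.90):
        """Smallest n whose CI length 2*t*sd/sqrt(n) stays below max_length."""
        n = 2
        while 2 * stats.t.ppf(1 - (1 - conf) / 2, n - 1) * sd / math.sqrt(n) > max_length:
            n += 1
        return n

    for sd in (5, 10, 20):
        print(sd, min_sample_size(sd, max_length=10))   # prints 5, 13, 46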

We can extend and summarize the above results in the following table:

                 Variability
CI Length     Low    Medium    High
    5          13      46       176
   10           5      13        46
   20           3       5        13
 Table 1: Minimum sample size as a function of the CI length and the sample variability (Low is SD=5%, Medium is SD=10% and High is SD=20%), for CL=90%. SD stands for standard deviation.

Or, expressing them the other way around:

                 Variability
Sample size   Low    Medium    High
     5         10      19       38
    10          6      12       23
    20          4       8       15

 Table 2: CI length as a function of the sample size and the sample variability (Low is SD=5%, Medium is SD=10% and High is SD=20%), for CL=90%.
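
Table 2 is just the CI length formula evaluated directly; the following sketch (again, the helper name is mine) reproduces its rows:

    import math
    from scipy import stats

    def ci_length(sd, size, conf=0.90):
        """Full CI width: 2 * t * sd / sqrt(size)."""
        return 2 * stats.t.ppf(1 - (1 - conf) / 2, size - 1) * sd / math.sqrt(size)

    for n in (5, 10, 20):
        print(n, [round(ci_length(sd, n)) for sd in (5, 10, 20)])
    # prints: 5 [10, 19, 38], 10 [6, 12, 23], 20 [4, 8, 15]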


Final Words


  • You must have a sufficient number of data points.  In the sizing world 5 data points should be the absolute minimum.
  • If you don’t have the luxury of large samples, refrain from making bold predictions and use the CI framework to evaluate plausible expectations.


Notes

(1) T.INV.2T() is the two-tailed inverse of the Student’s t distribution, STDEV.S() the sample standard deviation, and SQRT() the square root.
(2)  The value we would have obtained if we had used all the days, instead of a sample (subset) of them.

Monday, 14 July 2014

The importance of what is not said



Scenario #1: the magic coin

Someone flips a coin four times and obtains the following values:
{Head, Tail, Head, Head}
Based on the outcome of the above experiment he reports:
My coin is magic: it has a 3/4 probability of Heads, and 1/4 of Tails.

 

Scenario #2: the voting visionary

Next weekend an election will be held in your country. There are two candidates, A and B. Someone interviews ten people and obtains the following voting intentions:
{A, B, B, B, A, A, A, A, A, B}
Based on the above he reports:
The result of the election will be A: 60%, B: 40%.

Scenario #3: the peak usage

You measure the system usage at the peak hour during three days. The measured values are:
{70%, 70%, 85%}
Based on the above you report:
The average system usage at the peak hour is 75%.


Think

Surely you consider the man in the coin and voting scenarios silly, or at least utterly ignorant, because he is making bold predictions based on very little empirical evidence. However, you probably accept without objection what the man in the system usage scenario (by the way, you!) is telling, but…
…THINK!
There is nothing significantly different among the three scenarios. So if there is something wrong with the coin and the voting cases, the same applies to the peak usage case.

 

The confidence interval

When you try to estimate a certain quantity based on a limited number of measurements, the reported value is affected by what is called the statistical error. This is inherent to any product of sampling. This statistical error is not an “error” in the common sense of the word; instead, it expresses the precision or reliability of the reported figure. Instead of having a single value, you have an interval, that is, you MUST say:
The value is likely to be between this and that.
The length of this interval corresponds to the imprecision / uncertainty. If the length is small, the precision is high. If the length is large, the precision is low. Clearly it is not the same to say
The head probability is likely to be between 0.2 and 0.8.
a very low precision determination, than to say
The head probability is likely to be between 0.499 and 0.501.
a much higher precision one.

The confidence interval is the official name, in maths or statistics, for such an interval.
The degree of “likelihood” (…is likely to be…) is quantified by what is known as the confidence level. Typical values are 90% or 95%. The loose meaning of a 90% confidence level is that there is a 90% probability that the true value (the one we are trying to determine) lies within the interval (and, consequently, a 10% probability that it falls outside).
The confidence interval is centered around the sample mean, the arithmetic average of the data (the measured values):
Center of the confidence interval =  AVERAGE(data)
where data is the sample, or set of measured data points.

The confidence interval length is typically estimated with the following formula:
Length of the confidence interval = 2 * T.INV.2T(1-conf, size-1) * STDEV.S(data) / SQRT(size)
where size is the sample size (number of data points), conf is the desired confidence level, and T.INV.2T(), STDEV.S() and SQRT() are worksheet (Excel) functions. (I’m not focusing here on the details; I’m paying more attention to the formula’s consequences and dependencies than to the formula itself.)

This length depends on the following factors: 

  • The sample size, or number of data points: the length decreases, and the precision increases, when the number of data points increases.
  • The sample variability/dispersion: the length increases, and the precision decreases, when the data is noisy, erratic, highly variable.
  • The confidence level: the length increases when the confidence level increases. Reason? If you want increased certainty in your report, more “safety” margin is needed.

To get a better idea of this, look at the following table showing the approximate confidence interval length versus the sample size, for a variable that can go from 0 to 100 and for a confidence level of 90%. To achieve a precision of 3.5% you need around 1000 measurements!

Size      Length
   5        64
  10        38
  20        25
  30        20
 100        11
1000         3.5
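
The post does not state the standard deviation behind this table; a value around a third of the 0-100 range fits it. Under that assumption (sd ≈ 33.5 is my reverse-engineered guess, not a stated figure), a short sketch reproduces the column to within a unit of rounding:

    import math
    from scipy import stats

    for n in (5, 10, 20, 30, 100, 1000):
        length = 2 * stats.t.ppf(0.95, n - 1) * 33.5 / math.sqrt(n)
        print(n, round(length, 1))   # 63.9, 38.8, 25.9, 20.8, 11.1, 3.5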

Let’s go back and revisit our scenarios, but now equipped with the above ideas and guidelines.

Scenario #1: the magic coin (REVISITED)

Someone flips a coin four times and obtains the following values:
{Head, Tail, Head, Head}
Based on the outcome of the above experiment, he MUST report:
My coin has a head probability between 0.3 and 1 with a confidence level of 90%

If you want to increase the precision, that is, reduce the confidence interval length, you must increase the sample size, that is, the number of flips. With 100 flips you are going to obtain something like:
My coin has a probability of heads between 0.45 and 0.55 with a confidence level of 90%
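
A quick simulation makes this concrete; note that the t-based recipe applied to 0/1 data is only a rough approximation for proportions, and the exact bounds vary from run to run:

    import math, random
    from statistics import mean, stdev
    from scipy import stats

    x = [1.0 if random.random() < 0.5 else 0.0 for _ in range(100)]  # 100 fair flips
    half = stats.t.ppf(0.95, len(x) - 1) * stdev(x) / math.sqrt(len(x))
    print(f"head probability in [{mean(x) - half:.2f}, {mean(x) + half:.2f}]")
    # typically something like [0.42, 0.58] at CL=90%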

Scenario #2: the voting visionary (REVISITED)

Next weekend an election will be held in your country. There are two candidates, A and B. Someone interviews ten people and obtains the following voting intentions:
{A, B, B, B, A, A, A, A, A, B}
Based on the above, he MUST report:
The result of the election with a confidence level of 95% will be:
 A between 30% and 90% and B between 10% and 70%.

This has changed a lot from the initial bold prediction; it is much blurrier now. Typical opinion polls that estimate the true percentage of the vote with reasonable precision and a confidence level of 95% require a sample size of around 1000 people. Have a look at the small print next to the results when you see such a study in your newspaper.

 

Scenario #3: the usage peak (REVISITED)

You measure the system usage at the peak hour during three days. The measured values are:
{70%, 70%, 85%}
Based on the above you MUST report:
The average system usage at the peak hour is between 60% and 90% with a confidence level of 90%.

If you want to increase the precision, that is, reduce the confidence interval length, you must increase the sample size, that is, the number of daily measurements. With 20 days you are going to obtain something like:
The average system usage at the peak hour is between 74% and 78% with a confidence level of 90%.


Things to remember


  • There is an unavoidable uncertainty in your measurements and calculations.
  • Any estimation based on sampled data or a limited number of measurements must be accompanied by its precision/uncertainty.
  • Avoid too-small samples. The sample size should be large enough to obtain a reasonable precision.
  • What is not said, the error, is usually as important as the reported value itself.
  • There are many marketing tricks that do not tell the whole story and deliberately hide the estimation error. Do not act like the malicious or ignorant people who create those marketing messages.


In the next contribution I’ll take a closer look at the system usage case. Stay tuned.