Monday, October 20, 2014

SAPS Olympics

As clearly stated on the official site (http://sap.com/benchmark), SAPS (SAP Application Performance Standard) is a hardware-independent unit of measurement that describes the performance of a system configuration in the SAP environment. Almost everyone in the computer system performance arena is aware of SAPS, and indeed it has become a de facto standard measure of performance.
Since I browse the SAPS list from time to time, I’ve decided to put on the hat of an objective journalist here and predigest the list for you. I don’t intend to teach or comment on anything else, at least for now. After reading this you may draw your own conclusions.
There are 774 official SAPS results (SD 2-tier) posted at the public official site http://global.sap.com/solutions/benchmark/sd2tier.epx. The first one dates back to 1996, a Siemens machine. The most recent benchmark corresponds to an HP ProLiant DL380 Gen9, with certification number 2014037, published last October 17th.
For this analysis to be more relevant to you, I have exclusively considered systems measured with the latest benchmark specification: SAP Enhancement Package 5 for ERP 6.0 (SAP has not updated the benchmark environment to any newer version, EHP6 or EHP7). There are 75 such systems (as of Oct 20th 2014), and the first one is an HP ProLiant BL465c Gen8, with certification number 2012026, published Jun 4th 2012. So we are analyzing a little more than the last two years.
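For the curious, this kind of predigestion is easy to reproduce. Here is a minimal sketch, assuming the SD 2-tier list has been exported to a hypothetical CSV file (sd2tier.csv); the file name and column names are my own invention, not SAP’s:

    # Minimal sketch: keep only the EHP5-for-ERP-6.0 results and tally them
    # by vendor, CPU family, OS and RDBMS. The CSV export and its column
    # names (sap_release, partner, cpu_family, os, rdbms) are assumptions.
    import csv
    from collections import Counter

    with open("sd2tier.csv", newline="") as f:
        rows = list(csv.DictReader(f))

    ehp5 = [r for r in rows if "enhancement package 5" in r["sap_release"].lower()]
    print(len(ehp5), "EHP5 results")

    for column in ("partner", "cpu_family", "os", "rdbms"):
        print(column, Counter(r[column] for r in ehp5).most_common())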
 

SAP Technology Partners

Which SAP Technology Partners (an SAP concept) have been actively benchmarking SAPS? Here are the results: Dell: 14, HP: 14, IBM: 14, Cisco Systems: 13, Fujitsu: 8, NEC: 4, Hitachi: 3, Oracle: 3, Stratus Technologies: 1, and Unisys: 1.
[Chart: benchmarks by SAP Technology Partner]
 

By CPU family

Count by CPU family: Intel Xeon: 58, IBM POWER: 9, AMD Opteron: 3, Oracle SPARC: 3, and Fujitsu SPARC64: 2.
[Chart: benchmarks by CPU family]
 

By Operating System

Count by Operating System: Windows Server: 41, Red Hat Enterprise Linux: 17, IBM AIX: 7, SUSE Linux Enterprise Server: 5, and Solaris 11: 5.
[Chart: benchmarks by Operating System]
 

By Relational Database Management System

Count by Relational Database Management System: SQL Server: 36, SAP/Sybase ASE: 21, IBM DB2: 13, and Oracle: 5.
[Chart: benchmarks by RDBMS]
 

Absolute Number Of SAPS (SAPS per System)

Gold: Fujitsu SPARC M10-4S with 40 chips SPARC64 X @3000 MHz (640 cores, 1280 threads): 844420 SAPS.
Silver: Oracle SPARC M6-32 with 32 chips SPARC M6 @3600 MHz (384 cores, 3072 threads): 793930 SAPS.
Bronze: Oracle SPARC Server M5-32 with 32 chips SPARC M5 @3600 MHz (192 cores, 1536 threads): 472600 SAPS.
The first four positions are held by Oracle and Fujitsu SPARC servers.
The last one (what kind of medal corresponds to this?): Cisco UCS C240 M3 with 2 Intel Xeon Processor E5-2640 (12 cores, 24 threads): 24600 SAPS.
 
[Chart: SAPS per system]
 

SAPS per core

Gold: IBM Power Enterprise System E870 with 8 chips POWER8 @4190 MHz (80 cores, 640 threads): 5451 SAPS/core.
Silver: IBM Power System S824 with 4 chips POWER8 @3520 MHz (24 cores, 192 threads): 4828 SAPS/core.
Bronze: IBM Flex System p260 Compute Node with 2 chips IBM POWER7+ @4100 MHz (16 cores, 64 threads): 3419 SAPS/core.
The first nine positions are held by IBM POWER servers. They are the absolute winners in this category!
Lowest: HP ProLiant BL465c Gen8 with 2 chips AMD Opteron 6278 @2400 MHz (32 cores, 32 threads): 864 SAPS/core.
[Chart: SAPS per core]
 

SAPS per thread

Gold: Dell PowerEdge R730 with 2 Intel Xeon Processor E5-2699 v3 @2300 MHz (36 cores, 72 threads): 1252 SAPS/thread.
Silver: HP ProLiant DL380 Gen9 with 2 Intel Xeon Processor E5-2699 v3 @2300 MHz (36 cores, 72 threads): 1221 SAPS/thread.
Bronze: Dell PowerEdge R730 with 2 Intel Xeon Processor E5-2699 v3 @2300 MHz (36 cores, 72 threads): 1217 SAPS/thread.
The first seven positions are held by servers with the Intel Xeon Processor E5-2699 v3.
Lowest: Oracle SPARC Server T5-8 with 8 chips SPARC T5 @3600 MHz (128 cores, 1024 threads): 278 SAPS/thread.
[Chart: SAPS per thread]
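By the way, the per-core and per-thread figures are nothing more than the certified SAPS divided by the number of cores or hardware threads of the measured configuration. A minimal sketch, using the gold medalist of the absolute category as input (values taken from the list above):

    # SAPS/core and SAPS/thread are plain divisions of the certified total.
    def per_core_and_thread(saps, cores, threads):
        return saps / cores, saps / threads

    # Fujitsu SPARC M10-4S: 844420 SAPS on 640 cores / 1280 threads
    per_core, per_thread = per_core_and_thread(844420, 640, 1280)
    print(round(per_core), "SAPS/core,", round(per_thread), "SAPS/thread")
    # -> about 1319 SAPS/core and 660 SAPS/thread: a big system can lead the
    #    absolute ranking while sitting far from the per-core or per-thread podium.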

Wednesday, October 15, 2014

The Effect of IO Improvement: Bandwidth



Recently I’ve had the opportunity to deal with database systems that claim to have a very smart storage subsystem that makes them superior every time, everywhere. I’m going to analyze this from the performance point of view: what is the actual effect of this smartness?

A visit to the doctor


Typically a visit to the doctor consists of two differentiated stages: the interaction with the nurse (checking on arrival that you’re expected, some admin stuff...), and the interaction with the doctor himself. Let’s say you spend 2 minutes with the nurse and 10 minutes with the doctor. That means the nurse is capable of attending 30 patients per hour, and the doctor 6 patients per hour. Technically it can be said that the nurse bandwidth is 30 pph (patients per hour) and the doctor bandwidth is 6 pph.

Those are the performance numbers when the stages are considered individually. But what is the overall bandwidth of the medical center (nurse + doctor) system? The nurse’s? The doctor’s? Somewhere in between? If you think about it a little, the conclusion must be crystal clear: the doctor’s, simply because his bandwidth, his capability to service patients, is the lesser of the two. You may try to inject more than 6 pph into the medical center, but patients will proceed through the system at a rate of at most 6 pph, the medical center bandwidth. The doctor is the bottleneck stage of the medical center.

If you were the managing director and decided it was necessary to increase the “performance” of the medical center, what would you do? Employ smarter (or more) nurses, or smarter (or more) doctors?

If you replace the current nurse with a new one that is twice as efficient, that is, one who reduces the time spent with each patient from 2 min to 1 min, you will have increased the bandwidth of the nurse stage from 30 pph to 60 pph. But the doctor will continue being the bottleneck. The system bandwidth will remain the same! Conclusion: the nurse stage improvement is not effective.

If you replace the current doctor with a new one that is twice as efficient, that is, one who reduces the time spent with each patient from 10 min to 5 min, you will have increased the bandwidth of the doctor stage from 6 pph to 12 pph. The doctor will continue being the bottleneck, but at an increased rate of 12 pph. Perfect! This is the right way to increase the medical center bandwidth, which has gone up from the previous 6 pph to the new 12 pph. The doctor (bottleneck) improvement is fully effective.

A very important “rule” you should keep written in gold letters: the only effective way to improve the bandwidth of a composite (multi-stage) system is to improve the bandwidth of its bottleneck stage.
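A minimal sketch of that rule, using the nurse/doctor numbers from above (bandwidths in patients per hour):

    # The bandwidth of a multi-stage system is the bandwidth of its bottleneck,
    # that is, the minimum of the stage bandwidths.
    def system_bandwidth(stage_bandwidths):
        return min(stage_bandwidths)

    nurse, doctor = 30, 6                     # pph, as in the example
    print(system_bandwidth([nurse, doctor]))  # 6  -> the doctor is the bottleneck
    print(system_bandwidth([60, doctor]))     # 6  -> doubling the nurse changes nothing
    print(system_bandwidth([nurse, 12]))      # 12 -> doubling the doctor doubles the system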

It is worth pointing out that which stage is the bottleneck also depends on the workload: the doctor is the bottleneck for the above workload, a normal patient visit. But imagine a new workload consisting of heavy administrative work that must be processed by the nurse, while the doctor only has to stamp his signature. What do you think the bottleneck would be then?

Note: why doctors and medical centers? No particular reason, other than that I’ve been to one today and it occurred to me that they are a very simple scenario everyone has experienced.

A visit to a database machine

For the purpose of this simple analysis the database machine is also a two-stage (or two-subsystem) system: the database node and the storage node. One user request, or visit, to the system places demands for service on the database and on the storage.

For a particular workload the database and storage nodes have a certain processing capacity. The (maximum) processing capacity is called bandwidth, and is measured in user requests per unit time, typically requests per second (req/s) or per minute (req/min).

The stage with the lesser of the two bandwidths (for this particular workload) is the bottleneck, and the system bandwidth will be equal to the bottleneck bandwidth, no more, no less.
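In symbols, if B_DB and B_ST denote the bandwidths of the database and storage nodes, the system bandwidth is B = min(B_DB, B_ST).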

Consider a certain heavy IO workload, with a high degree of IO content, for which the storage is the bottleneck. Any improvement in the IO demand, whether achieved by a reduction of IOs thanks to the storage smartness or by using more or faster storage nodes, will increase the bandwidth of the storage stage. And, consequently, the bandwidth of the overall system / database machine.

Let’s consider these particular values for the heavy IO workload: the bandwidth of the database node (B_DB) is B_DB = 10 req/s (or 600 req/min), and the bandwidth of the storage node (B_ST) is B_ST = 5 req/s (or 300 req/min). The system bandwidth (B) is equal to that of the storage (the bottleneck), that is, B = B_ST = 5 req/s. If the IO reduction technique halves the IO demand, the improved storage bandwidth will be B'_ST = 10 req/s (or 600 req/min). The database machine bandwidth will increase from 5 req/s to 10 req/s. That is, B' = B'_ST = 10 req/s.

Let’s explore the other extreme: a light IO workload, for which the database is the bottleneck (and the storage is not). Any reduction of the demand placed on the storage subsystem will have a null, or very limited, effect, as it is not improving the bottleneck. In this case having a smarter storage or not doesn’t matter at all. You’d better not pay for it.

Let’s use these particular values for the light IO workload: B_DB = 5 req/s (or 300 req/min) and B_ST = 10 req/s (or 600 req/min). The system bandwidth (B) is equal to that of the database (the bottleneck), that is, B = B_DB = 5 req/s. If the IO reduction technique halves the IO demand, the improved storage bandwidth will be B'_ST = 20 req/s (or 1200 req/min). But the overall database machine bandwidth will remain the same: B' = B_DB = 5 req/s.
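A minimal sketch putting both scenarios side by side (values as above; halving the IO demand is modeled as doubling the storage bandwidth):

    # Halving the IO demand doubles the storage bandwidth, but it only moves
    # the system bandwidth when the storage is the bottleneck.
    def system_bandwidth(b_db, b_st):
        return min(b_db, b_st)

    # Heavy IO workload: the storage is the bottleneck
    b_db, b_st = 10, 5                        # req/s
    print(system_bandwidth(b_db, b_st))       # 5
    print(system_bandwidth(b_db, b_st * 2))   # 10 -> the improvement pays off

    # Light IO workload: the database node is the bottleneck
    b_db, b_st = 5, 10                        # req/s
    print(system_bandwidth(b_db, b_st))       # 5
    print(system_bandwidth(b_db, b_st * 2))   # 5  -> the improvement is wasted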

This light IO scenario is exactly what happens in benchmarks with very low IO content, like the SAP SD benchmark that measures the SAPS of the systems. It doesn’t stress the IO subsystem, and B is limited by B_DB, no matter how extraordinarily capable or smart your storage is.

Let’s point out here that the bandwidth of the overall system will never be greater than the bandwidth of the database node. In particular, this means that the SAPS of the database machine will always be equal to or less than the SAPS of the database node.


In the real world

Back in the real world there are more tones than merely black and white. But knowing the extremes should be of help to us.

Any IO improvement, for example with smarter storage, has its maximum effect when the storage is the bottleneck. It has no effect at all when the storage is not the bottleneck (and the database node is). Consequently, from the bandwidth point of view, there is no such thing as an every-time, everywhere absolute positive improvement from having a smarter storage. It will depend on where the bottleneck is.

But what seems clear is that this smartness will cost you a positive absolute number.