INTEL WON BACK most of the lead that AMD had earned in the server market with the Conroe and Penryn generation of CPUs, and erased the rest with the Nehalem generation. Today marks the release of Westmere-EP, the 6 core 32nm update to Nehalem, and the Intel lead just gets wider.
Billions of transistors are pretty
With the K8/Opteron, AMD signaled that it was serious about the server market, and proceeded to take large chunks of the market from a rather helpless Intel. The responses, P4 based chips that soon hit speed and power walls, were quite insufficient. When AMD launched dual cores, Intel responded with some very lackluster dual P4s.
AMD was on a roll – integrated memory controllers, 64-bit instructions, and HyperTransport, all in one package. It was unstoppable until Intel released the Conroe based chips, both dual and market changing quad MCM parts. With these, Intel shed several bad habits, and returned to making efficient CPUs rather than chasing nebulous numbers. With Penryn, the 45nm successor to Conroe, Intel had the lead on almost every server test.
Part of this was due to AMD’s utter failure to execute on Barcelona, its first native quad core. To say it was late and underwhelming is giving it more than it is due. AMD learned a lot: internal changes in engineering and personnel led to a vastly improved Shanghai quad, and it quickly followed up with an Istanbul 6-core. Those kept up fairly well with Intel’s Penryn generation, but were released just when Intel was moving on.
A 6 core Westmere-EP die
The next generation Nehalem was a completely new architecture, and it added integrated memory controllers, new and improved threading support, and just about everything that Penryn lacked. AMD had almost no workloads that it could claim a win on at that point. Nehalem was a huge step up in just about every workload imaginable, and several Intel people claimed it was the largest single leap the company had ever made. Most tests back this claim up.
Westmere-EP is an optimization and shrink to 32nm of the Nehalem cores. It first came out in two core guise in Arrandale and Clarkdale forms, chips aimed at the notebook and low end PC market. The server versions, officially called the Xeon 5600 series, are all either four or six cores, and all support two sockets.
The launch today is the highest end part, but as is normal with a shrink, it is slightly smaller than its predecessor. Nehalem-EP chips had 731 million transistors on a 263mm^2 die. Westmere-EP bumps the count to 1.17 billion transistors, adds two cores and some additional features, but still manages to shrink the die to 248mm^2.
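To put those two die figures side by side, here is a quick back-of-the-envelope calculation. It uses only the transistor counts and die areas quoted above; the dictionary layout and function name are just for illustration.

```python
# Transistor counts and die areas as quoted in the article.
chips = {
    "Nehalem-EP": {"transistors": 731e6, "area_mm2": 263},
    "Westmere-EP": {"transistors": 1.17e9, "area_mm2": 248},
}

def density_m_per_mm2(chip):
    """Millions of transistors per square millimetre of die."""
    return chip["transistors"] / 1e6 / chip["area_mm2"]

for name, chip in chips.items():
    print(f"{name}: {density_m_per_mm2(chip):.2f}M transistors/mm^2")

# How much denser the 32nm part is than the 45nm part.
ratio = density_m_per_mm2(chips["Westmere-EP"]) / density_m_per_mm2(chips["Nehalem-EP"])
print(f"Density ratio: {ratio:.2f}x")
```

By these numbers, Westmere-EP packs roughly 1.7 times as many transistors into each square millimetre as Nehalem-EP did.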
The family tree
The new Westmere-EP chips take over from the Nehalem-EP based Xeons on the top end of the market, the so called ‘Advanced’ and ‘Standard’ market segments. The bottom of the market, the ‘Basic’ segment, does not get Westmeres yet, but it does get a 133MHz speed bump. There are three prefixes for Westmere CPUs, X, E and L, each carrying a power band with it. Es are all four core only, and occupy the mainstream 80W power band. Ls are 40W for 4 cores, 60W for 6, and the X line is for the 95W and 130W power bands.
The most interesting parts are the X5667 and X5677, both 4 cores but running at notably higher speeds than their 6 core counterparts. Clock speed for cores, the new performance tradeoff. If your app is not well threaded, these are the parts for you. On top of these extra MHz, turbo adds a per-core bump when applicable, perfect for single threaded apps. Most servers run lots of threads, so the niche for these 56x7 parts is small, but the few who need it really need it.
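The clock-for-cores tradeoff can be sketched with a deliberately crude model. Everything here is an assumption for illustration: the clock speeds are made-up round numbers rather than real X56xx specifications, and the model ignores turbo, Hyper-Threading, cache, and memory effects.

```python
def relative_throughput(cores, clock_ghz, runnable_threads):
    """Crude model: each busy core contributes work in proportion to its clock.

    Threads beyond the core count add nothing in this sketch.
    """
    return clock_ghz * min(cores, runnable_threads)

# Hypothetical parts, NOT real SKU specifications.
fast_quad = {"cores": 4, "clock_ghz": 3.5}
slow_hex = {"cores": 6, "clock_ghz": 3.0}

for threads in (1, 4, 12):
    quad = relative_throughput(runnable_threads=threads, **fast_quad)
    hexa = relative_throughput(runnable_threads=threads, **slow_hex)
    print(f"{threads:2d} threads: quad={quad:.1f} hex={hexa:.1f}")
```

With one or four runnable threads the higher clocked quad comes out ahead in this model. Once there is enough work to fill six cores, the extra cores win.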
The net takeaway on the new Westmere line is that they are socket compatible with the older Nehalem-EP chips and are priced a few percent higher than an equivalent older model. Calls to distributors say that the new parts cost the same as older ones from their end, so this is likely a play to make Xeons more attractive to the channel.
Pricing and features pic
At the minimum, Westmeres have a slight speed bump, and many bring two more cores than the parts they replace. Intel is claiming 30 percent lower power or 40 percent more performance for the same power depending on which models you pick. With socket compatibility, the new chips are a clear win.
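Those two claims are close to being the same claim when expressed as performance per Watt. A small sketch of the arithmetic, assuming that "30 percent lower power" means at unchanged performance and "40 percent more performance" means at unchanged power:

```python
def perf_per_watt_gain(perf_ratio, power_ratio):
    """Performance-per-Watt improvement relative to the old part (1.0 = no change)."""
    return perf_ratio / power_ratio

# Assumption: same performance at 70 percent of the old power.
same_perf_less_power = perf_per_watt_gain(perf_ratio=1.0, power_ratio=0.70)

# Assumption: 140 percent of the old performance at the same power.
more_perf_same_power = perf_per_watt_gain(perf_ratio=1.40, power_ratio=1.0)

print(f"Lower power option:  {same_perf_less_power:.2f}x perf/W")
print(f"Higher speed option: {more_perf_same_power:.2f}x perf/W")
```

Under those assumptions, either route works out to roughly 1.4 times the work per Watt.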
In addition to the power and performance, the Westmere architecture brings several new functions to the table in a somewhat spotty manner. Features are fused off rather randomly across the line, so if you are interested in a particular line item, it is best to dig deeply before you buy. Even things as basic as instruction sets are removed on some models. Intel is really adding pain to the buying process for no good reason.
If you want ‘Feature X’ that is supported on a particular generation of AMD Opterons, it is there, period. On an Intel Xeon, you are sent scurrying across various web pages to hunt down information on even the most basic functions. Making Microsoft’s SKUs look rational is not a way to endear yourself to customers.
The most welcome addition is platform based, support for low power DIMMs. Samsung announced its support for the spec a few weeks ago and Kingston jumped on board today. While it may not seem like a big deal, saving a Watt here and there, multiplied by the DIMM count on a modern server, adds up quickly. If you are running a data center, power equals cost, and large customers drool over things like this. It will save users lots of money.
A related change is that more DIMMs are supported per channel. Nehalem-EP would only support 1 DDR3/1333 DIMM per channel, two at 1066, and three at 800. Westmere-EP bumps this up by one to 2 per channel at 1333, and likely three at 1066. Sadly, Intel’s brilliant market segmentation schemes artificially lock the Standard and Basic chips, i.e. anything with an E prefix, out of DDR3/1333. Worse yet, the Basic parts can only use DDR3/800. It seems like a petty way to artificially force upgrades.
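The DIMM population rules above boil down to a lookup plus a cap. A minimal sketch follows, with two assumptions flagged loudly: the three-DIMM Westmere-EP entry is only "likely" per the text, and the 1066 cap for Standard parts is inferred from them being locked out of 1333 rather than stated outright.

```python
# Fastest DDR3 speed by DIMMs per channel, from the figures in the text.
MAX_SPEED_BY_DIMMS = {
    "Nehalem-EP": {1: 1333, 2: 1066, 3: 800},
    "Westmere-EP": {1: 1333, 2: 1333, 3: 1066},  # 3 DIMMs at 1066 is "likely"
}

# Segment caps. The Standard value is an inference, not a quoted figure.
SEGMENT_CAP = {"Advanced": 1333, "Standard": 1066, "Basic": 800}

def memory_speed(generation, dimms_per_channel, segment="Advanced"):
    """Effective DDR3 speed: the population limit, clipped by the segment cap."""
    population_limit = MAX_SPEED_BY_DIMMS[generation][dimms_per_channel]
    return min(population_limit, SEGMENT_CAP[segment])

print(memory_speed("Nehalem-EP", 2))               # two DIMMs per channel
print(memory_speed("Westmere-EP", 2))              # same population, new chip
print(memory_speed("Westmere-EP", 2, "Standard"))  # segment cap applies
```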
Moving on to the core itself, there are some big changes in the instruction set (ISA) and supporting mechanisms. The biggest one is the addition of the AES-NI ISA, six instructions meant to boost encryption and decryption speed. This will be a huge gain for anyone using full disk encryption, something that is quickly becoming a legal mandate for large corporations.
Web servers, net based financial transactions, and almost anything that needs protection can be made to work better and faster with AES-NI, but right now, the apps with support are a bit limited. Better still, the things that use AES-NI not only run faster, but usually use less power too, a boon for laptop anti-theft related crypto tools.
The sad thing is that on some Westmere models, Intel is fusing off the AES-NI instruction set to artificially segment the market. Once again, this fragments the user base, slows adoption, and in general, makes a mess of things for no real gain. Nothing excludes users like fusing off instructions. Does anyone think SSE would have become widespread if it needed a $100 more expensive CPU?
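Because the instructions may or may not be present on a given model, software has to check before using them. On Linux, the kernel lists AES-NI as the `aes` flag in `/proc/cpuinfo`. The sketch below parses that text; the function names are illustrative, and it assumes an x86 Linux box.

```python
def cpu_flags(cpuinfo_text):
    """Return the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, _, value = line.partition(":")
        if key.strip() == "flags":
            return set(value.split())
    return set()

def has_aes_ni(cpuinfo_text):
    """True if the kernel reports the AES-NI feature flag."""
    return "aes" in cpu_flags(cpuinfo_text)

if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not Linux, or the file is unreadable
    print("AES-NI available:", has_aes_ni(text))
```

An application would fall back to a software AES implementation when the check comes back false, which is exactly the extra code path that fused-off instructions force on developers.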
One other instruction added to the mix almost slides off the tongue, it is called PCLMULQDQ or pickle-mickle-duck, a carryless multiply. If you care about things like optimizing CRC algorithms, you can read up on PCLMULQDQ here. For the technical white paper averse, the take home message is that CRCs are used a lot in data protection, and they will take less time to compute on Westmere CPUs.
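For the curious, a carryless multiply is ordinary long multiplication with the additions replaced by XOR, so nothing ever carries into the next bit. Here is a plain software sketch of the operation. It illustrates the math only; the real instruction does this in hardware on 64-bit operands and produces a 128-bit result.

```python
def clmul(a, b):
    """Carry-less (XOR) multiplication of two non-negative integers."""
    result = 0
    while b:
        if b & 1:
            result ^= a  # "add" the shifted copy of a without carries
        a <<= 1
        b >>= 1
    return result

# 0b11 * 0b11: normal multiplication gives 9, carry-less gives 0b101.
print(bin(clmul(0b11, 0b11)))
print(bin(clmul(0b101, 0b11)))
```

This is the same as multiplying polynomials whose coefficients are single bits, which is the arithmetic CRCs are built on.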
Another small change is that the APIC (Advanced Programmable Interrupt Controller) timers continue to run in deep sleep modes. This allows the CPU to stay in sleep modes longer, wake up faster, and in general save power while still waking up when requested. Even though Intel’s gains here are almost universally erased by Microsoft’s ‘advances’ in BIOS programming that seem to only negatively affect Linux, this is a good thing.
Westmere-EP has two new memory related functions, 1GB pages and PCIDs. Large pages are probably not going to be used much in the near term, but their support paves the way for the future. PCIDs (Process-Context Identifiers) are used a lot in virtualization, mostly at the OS level or the VMM. They tag entries in the TLB with a context identifier, and those are now preserved across CR3 writes. This is the technical way of saying that when you change VMs, the tagged entries in the TLB don’t get wiped out, and virtualization is more efficient.
Between the PCIDs and several other low level architectural changes, Intel is claiming that Westmere now has a notably lower ‘round-trip latency’ over Nehalem. This means the overhead caused by the VMM goes way down, and you can flip between VMs much quicker than before. Given how virtualization is taking over the world, this is a very important advance.
When it comes to security, AES-NI is not the only addition; Intel also added Trusted Execution Technology (TXT) to the mix. TXT is Intel’s code name for a Trusted Platform Module (TPM), basically a hardware crypto chip on the motherboard or chipset. TPMs are common on enterprise oriented motherboards, and even some enthusiast boards, but are rarely used outside of large enterprises. TXT allows a PC to cryptographically verify the code it is running, from the BIOS to the bootloader, and even the OS itself.
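A toy model shows why tagged TLB entries matter. This is a sketch of the concept only, not of how the hardware is built: the class, the two "VMs", and their page tables are all hypothetical, and the TLB is treated as infinitely large.

```python
class ToyTLB:
    """Toy translation cache; entries are keyed by (context id, virtual page)."""

    def __init__(self, tagged):
        self.tagged = tagged
        self.entries = {}
        self.current = None
        self.misses = 0

    def switch_to(self, context_id):
        # Without tags, every context switch has to flush the whole TLB.
        if not self.tagged:
            self.entries.clear()
        self.current = context_id

    def translate(self, virtual_page, page_table):
        key = (self.current, virtual_page)
        if key not in self.entries:
            self.misses += 1  # would be a slow page table walk
            self.entries[key] = page_table[virtual_page]
        return self.entries[key]

def run(tagged):
    """Bounce between two hypothetical VMs three times and count TLB misses."""
    tables = {"vm_a": {0: 100, 1: 101}, "vm_b": {0: 200, 1: 201}}
    tlb = ToyTLB(tagged)
    for _ in range(3):
        for context_id, table in tables.items():
            tlb.switch_to(context_id)
            for page in table:
                tlb.translate(page, table)
    return tlb.misses

print("misses without tags:", run(tagged=False))
print("misses with tags:   ", run(tagged=True))
```

In the untagged case every switch starts from a cold TLB, so each page is re-walked on every visit. With tags, each page misses once and then stays cached.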
If a virus, trojan, or even MS’s inbuilt malware gets ‘under’ the intended OS or VMM, it can effectively block the OS from detecting that anything is amiss. From there, the bad guys own the system, and there is almost nothing you can do to clean the computer without using another known good system.
TXT can verify that there is nothing in memory that should not be there, and that the OS has not been tampered with, on disk or in memory. It answers the most fundamental question in security, am I what I think I am? Without a definitive answer there, you can not have any real security, anything above that layer could be built on a compromised foundation.
Once an OS knows that it is secure, and there is nothing amiss below it, the OS can go about its merry way without fear of prying eyes or tampering. This does not secure the OS, just validates that nothing bad has snuck in before the OS runs. None of the experts that I asked could come up with a good reason why the TPM could not be emulated by malware, so the security arms race goes ever onward.
In the end, Intel delivered the best two-socket server CPU out there. Initial testing done by SemiAccurate shows that the 30 percent power reduction or 40 percent speed increases are very much achievable. On chips with similar core counts, Nehalem beat the AMD competition silly. But AMD’s Istanbul CPUs had more cores, at times giving them an advantage. Now with Westmere-EP, Intel takes the lead in just about everything. S|A
Note: Due to time constraints and a busy travel schedule, testing of the new Westmere CPUs was not finished in time for this article. We hope to bring them to you tomorrow.
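The general idea behind this kind of verification is a chain of hashes: each stage is measured before it runs, and the measurement is folded into a register that can only be extended, never set directly. The sketch below mimics the SHA-1 "extend" operation of a TPM 1.2 style register. It is a toy under stated assumptions, not Intel's actual TXT launch sequence, and the byte strings are stand-ins for real firmware and OS images.

```python
import hashlib

def extend(register, component):
    """Fold a component's SHA-1 measurement into the running register value."""
    measurement = hashlib.sha1(component).digest()
    return hashlib.sha1(register + measurement).digest()

def measure_chain(components):
    """Measure each boot stage in order, starting from an all-zero register."""
    register = bytes(20)
    for component in components:
        register = extend(register, component)
    return register

# Hypothetical boot stages; real ones would be the actual code images.
good = measure_chain([b"bios", b"bootloader", b"kernel"])
tampered = measure_chain([b"bios", b"evil bootloader", b"kernel"])

print("known good:", good.hex())
print("tampered:  ", tampered.hex())
print("match:", good == tampered)
```

Changing any stage, or even the order of stages, changes the final value, so a verifier that knows the expected result can tell that something underneath the OS is different.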