Blog: From the Editor

Revisiting the Memory Wall

As the first Intel Nehalem EP server chips get ready for their debut, HPC users in particular are anxious to get a taste of Intel's new high-performance design. The new architecture incorporates the QuickPath Interconnect (QPI) and integrated memory controllers, a setup that should be especially kind to memory-intensive applications.

As I've discussed before, the "memory wall" has become one of the most worrisome issues in HPC. Over the last two decades, memory performance has been steadily losing ground to CPU performance. From 1986 to 2000, CPU speed improved at an annual rate of 55 percent, while memory speed improved at only 10 percent. As clock speeds stalled, chipmakers resorted to multiple cores, but if anything, that only widened the CPU-memory performance gap. A recent study by Sandia National Laboratories pointed to the futility of simply throwing more cores at the problem.
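
To see how quickly those two growth rates diverge, the short Python sketch below compounds them year over year. It simply applies the 55 percent and 10 percent annual figures quoted above across the 14 years from 1986 to 2000. The function name is illustrative, and real CPU and DRAM improvements did not follow perfectly smooth exponential curves.

```python
def compounded_gap(cpu_rate: float, mem_rate: float, years: int) -> float:
    """Ratio of CPU speedup to memory speedup when each grows at a fixed annual rate."""
    cpu_growth = (1.0 + cpu_rate) ** years
    mem_growth = (1.0 + mem_rate) ** years
    return cpu_growth / mem_growth


# Annual improvement rates quoted in the text for 1986-2000.
CPU_RATE = 0.55
MEM_RATE = 0.10
YEARS = 2000 - 1986

print(f"CPU speed grew about {(1 + CPU_RATE) ** YEARS:.0f}x")
print(f"Memory speed grew about {(1 + MEM_RATE) ** YEARS:.1f}x")
print(f"The gap widened about {compounded_gap(CPU_RATE, MEM_RATE, YEARS):.0f}x")
```

With these inputs, memory improves less than fourfold while CPU speed improves several hundredfold, leaving a gap of roughly 120x.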

The Nehalem processors, though, should provide some relief, if only temporarily. The soon-to-be-released quad-core EP chips for two-socket servers will have integrated DDR3 memory controllers, which Intel claims will boost memory bandwidth by 300-400 percent compared to the current "Penryn"-class Xeon processors. Actual performance has yet to be independently verified, but the new DDR3 controllers should deliver memory bandwidth in the range of 32-35 GB/second per socket. That should be a big lift for many memory-bound applications.
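
A figure in that range falls out of a back-of-the-envelope peak calculation: channels times transfer rate times bytes per transfer. The sketch below assumes three 64-bit DDR3 channels per socket running at DDR3-1066 or DDR3-1333 speeds; that configuration is our assumption about the EP parts, not something stated above. Sustained bandwidth, as measured by benchmarks such as STREAM, will come in below this theoretical peak.

```python
def ddr_peak_bandwidth_gbs(channels: int, mega_transfers_per_s: float,
                           bus_bits: int = 64) -> float:
    """Theoretical peak bandwidth in GB/s (10**9 bytes/s) for a set of DDR channels."""
    bytes_per_transfer = bus_bits / 8
    return channels * mega_transfers_per_s * 1e6 * bytes_per_transfer / 1e9


# Assumed Nehalem EP socket: three 64-bit DDR3 channels (an assumption, not from the text).
dimm_speeds = {"DDR3-1066": 1066.67, "DDR3-1333": 1333.33}
for name, mt_per_s in dimm_speeds.items():
    print(f"{name}: {ddr_peak_bandwidth_gbs(3, mt_per_s):.1f} GB/s per socket")
```

Under these assumptions, DDR3-1333 works out to about 32 GB/s per socket, and slower DIMMs lower the ceiling proportionally.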

Unfortunately, after Nehalem, Intel probably won't be able to repeat a memory performance jump of similar magnitude for some time. DDR4 will offer perhaps twice the raw performance of DDR3, but it is not expected to show up until 2012. More exotic memory architectures are on the drawing board, but no manufacturer has committed to a roadmap.

GPUs are a different story, though. These chips are all about data parallelism, so their memory architecture was designed for parallel throughput from the get-go. For GPGPU computing platforms like NVIDIA Tesla and AMD FireStream, the hardware comes with a hefty amount of very fast memory so that large chunks of computation can take place locally, without having to continually tap into system memory.

Today, you can get an NVIDIA Tesla GPU with 4 GB of GDDR3 memory at 102 GB/second of bandwidth. Granted, this is graphics memory, so you have to live without error correction, but at roughly three times the memory bandwidth available to a Nehalem processor, GPUs can offer some respite from the memory wall. This more favorable balance between compute and memory performance is one reason users have been able to speed up their data-parallel apps by one to two orders of magnitude.
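
The same peak-bandwidth arithmetic shows where the 102 GB/second figure and the "roughly three times" comparison come from. This sketch assumes the 4 GB Tesla board has a 512-bit GDDR3 interface running at 1,600 mega-transfers per second (an 800 MHz double-data-rate clock); those interface details are our assumption, not figures from the text. For Nehalem it uses the 32-35 GB/second per-socket range quoted earlier.

```python
def interface_bandwidth_gbs(bus_bits: int, mega_transfers_per_s: float) -> float:
    """Theoretical peak bandwidth in GB/s for a memory interface of a given width."""
    return (bus_bits / 8) * mega_transfers_per_s * 1e6 / 1e9


# Assumed Tesla memory interface: 512 bits wide, GDDR3 at 1600 MT/s.
gpu_bw = interface_bandwidth_gbs(512, 1600)

# Per-socket Nehalem EP bandwidth range quoted in the text.
for nehalem_bw in (32.0, 35.0):
    print(f"GPU {gpu_bw:.1f} GB/s vs. Nehalem {nehalem_bw:.0f} GB/s: "
          f"{gpu_bw / nehalem_bw:.1f}x")
```

The ratio lands between about 2.9x and 3.2x, consistent with the "roughly three times" estimate.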

And yet the entry of Nehalem into the HPC server market is bound to be the big story this year. Despite the meteoric rise of GPUs in general-purpose computing over the last couple of years, most HPC users are still running x86-based clusters. According to IDC, fewer than 10 percent of the HPC user sites it surveyed were using alternative processors (most of which, I assume, were GPUs and Cell processors), and the firm didn't see those numbers changing dramatically in the near term.

But the memory wall will be unrelenting. The eight-core Nehalem EX chip is in the works and is expected to show up in the second half of 2009. At eight cores, memory-intensive apps might be a poor fit for this platform: it was at the eight-core mark that the Sandia study saw an actual decrease in performance. There's also plenty of anecdotal evidence that a variety of HPC applications are seeing performance decline as they migrate from just two to four cores.
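
A simple roofline-style bound shows why extra cores stop paying off for memory-bound codes: attainable throughput is the lesser of the cores' combined peak compute rate and the shared memory bandwidth multiplied by the code's arithmetic intensity (flops per byte moved). All the numbers below are hypothetical placeholders, not Nehalem or Sandia measurements. This idealized model only predicts a plateau; an outright slowdown like the one Sandia reported at eight cores presumably reflects effects the model leaves out, such as contention for the shared memory system.

```python
def attainable_gflops(cores: int, gflops_per_core: float,
                      bandwidth_gbs: float, flops_per_byte: float) -> float:
    """Roofline-style bound: the lesser of total compute rate and memory-fed rate."""
    compute_bound = cores * gflops_per_core
    memory_bound = bandwidth_gbs * flops_per_byte  # bandwidth is shared by all cores
    return min(compute_bound, memory_bound)


# Hypothetical socket and kernel: 10 GFLOPS per core, 32 GB/s of shared
# bandwidth, and 1 flop performed per byte moved to or from memory.
for cores in (1, 2, 4, 8):
    total = attainable_gflops(cores, 10.0, 32.0, 1.0)
    print(f"{cores} cores: {total:.0f} GFLOPS total, {total / cores:.0f} per core")
```

In this toy setup, total throughput stops growing somewhere between two and four cores, and per-core throughput halves going from four to eight cores.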

On top of that, given the onerous licensing models many ISVs apply to multicore processors and the well-known difficulties of multithreaded software development, multicore CPUs may not be the path to HPC nirvana after all. Thinking optimistically, though, it's quite possible we'll find a path around the memory wall and the other parallel computing roadblocks. But the solution is likely to come from looking at the problem in an unconventional way.

As IT publisher Tim O'Reilly recently wrote: "The future isn't going to be like the past. What's more, it isn't going to be like any future we imagine. How wonderful that is, if only we are prepared to accept it."

Posted by Michael Feldman - February 19, 2009 @ 5:43 PM, Pacific Standard Time

Michael Feldman

Michael Feldman is the editor of HPCwire.
