April 6, 2007
Newport is right down the road from Jamestown. Founded in 1639, it soon grew to become the most important port in colonial Rhode Island. Ben Franklin's brother, James, was a printer here in 1727.
During the 17th and 18th centuries, so many pirates used Newport as their base of operations that the London Board of Trade made an official complaint to the English government. In the 1720s the colonials arrested many pirates who were subsequently hanged in Newport and buried on Goat Island.
Which is where I am right now, attending the 21st annual High Performance Computing and Communications conference (HPCC) at the Hyatt Regency (on Goat Island). It's been misty here for the past two days, and it's not quite spring up this way yet.
The conference is small, especially by SC standards, but that size makes the presentations seem more like conversations, and a lot of ideas are traded in the hallways among colleagues who have been coming to this conference for many years.
Today's sessions -- six presenters, plus a five-person panel -- covered a wide range of topics, but several themes kept reappearing across the talks.
The overriding theme was scale. There was lots of discussion about things big -- big machines, big memory, big storage. Petaflops systems were the order of the day, with mentions of exaflops systems, and even one mention of zettaflops.
Several speakers highlighted the problems inherent in writing software for systems composed of thousands of manycore processors. Power was never far from the discussion. Other speakers talked about the reliability issues of systems this large. Failures that are individually very unlikely start to occur with real frequency in systems performing on the order of 10^20 or more operations per year. And don't forget that these large systems are themselves composed of many other cooperating systems of components. It's a potential mess.
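To make that concrete, here's a back-of-the-envelope sketch in Python. The per-operation fault probability is a number I've picked purely for illustration; nothing that specific was quoted today.

```python
# Back-of-the-envelope illustration (the fault probability is assumed, not
# quoted): even fantastically rare per-operation faults become routine at scale.
per_op_fault_probability = 1e-18   # assumption: one fault per quintillion ops
ops_per_year = 1e20                # the scale mentioned above

expected_faults_per_year = per_op_fault_probability * ops_per_year
print(f"Expected faults per year: {expected_faults_per_year:.0f}")         # ~100
print(f"That's roughly one every {8760 / expected_faults_per_year:.1f} hours")
```

A one-in-a-quintillion event sounds safely ignorable, but at this scale it shows up about twice a week.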
A secondary but still pervasive theme today was application software. Some speakers talked about the growing trend for domain requirements to drive the design of the largest new systems coming on line (like those in the NSF Cyberinfrastructure program and in the DOE). Others talked about the enormous difficulties we face in bridging the gap between the "bare metal" performance monsters the HPC community is going to build for us and the users who have to solve real problems using someone's software. The software discussions covered topics ranging from productivity to the need for a real discipline to assess (and deliver) software quality and readiness for scale.
Hybrid computing also surfaced, with friends and foes alike. Supporters talked a lot about the potential performance improvements and the ability of hybrid systems (which aren't new: anyone remember FPS, among many others?) to create a performance "spanning set" in which one type of processor complements another's weaknesses. The hybrid computing skeptics talked about the difficulty of programming these systems and the lack of expressive programming models, the need to measure and report "real" performance (performance that includes the time to get the problem onto the supporting processors and then to get the answer back out), and the very real power problems some of the solutions present.
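The skeptics' "real" performance metric is essentially Amdahl's law with data movement added in. Here's a minimal sketch of the arithmetic (my own illustration, with made-up numbers, not anything presented today):

```python
def effective_speedup(offload_fraction, accel_speedup, transfer_overhead):
    """Wall-clock speedup of a hybrid run, normalized so host-only time = 1.0.

    offload_fraction:  share of the work moved to the accelerator
    accel_speedup:     raw speedup of that work on the accelerator
    transfer_overhead: time spent shipping the problem in and the answer out,
                       as a fraction of the original host-only runtime
    """
    hybrid_time = ((1 - offload_fraction)
                   + offload_fraction / accel_speedup
                   + transfer_overhead)
    return 1 / hybrid_time

# A headline "10x" kernel speedup on 90 percent of the work...
print(effective_speedup(0.9, 10.0, 0.0))   # ~5.3x with free data movement
print(effective_speedup(0.9, 10.0, 0.2))   # ~2.6x once transfers are counted
```

The headline kernel number survives; the system-level number is what the skeptics want reported.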
All of today's speakers did an excellent job, and this is definitely a conference I'll be coming back to. In the interest of brevity, I'll focus on three talks that I found particularly interesting (as always, selected totally subjectively) with no disrespect meant for anyone left off.
Stephen Wheat, Intel
Stephen Wheat from Intel was up first. This was his fifth year speaking at the conference, and his talk focused on what Intel is doing in HPC, and why.
First, the why. As Dr. Wheat laid it out, Intel has 90 percent of the enterprise chip market, but that 90 percent generates only 60 percent of its revenue. That leaves the other 40 percent of the revenue in the tiny but lucrative remaining 10 percent of the market. HPC is the fastest growing segment of that 10 percent, so it's not hard to make the leap to see Intel's interest in HPC.
What struck me about Intel's strategy for capturing that last 10 percent is their "ecosystem" approach to HPC. They aren't just looking at FLOPS, as you might expect. They are investing significant resources in software (10 percent of Intel is devoted to software), power, the memory wall, communications, and reliability as they look at creating balanced, affordable systems at the petascale level and beyond.
Stephen also talked about the roadmap and the two-year process technology cadence, with 45nm appearing this year, 32nm in 2009, 22nm in 2011, and all the way down to 8nm in 2017. Each new process node opens a two-year couplet: in the first year the preceding microarchitecture is shrunk onto the new process, and in the second year a new microarchitecture is introduced on it. It's Penryn first, then Nehalem, followed by Westmere and Gesher in the 2009 couplet. God help me, it's starting to make sense.
Stephen also talked about the research that Intel is doing on the tension between a few complex cores, many simple cores, and a hybrid between the two. In this context he talked about getting performance out of these systems by stacking improvements such as adding cores, using hardware thread scheduling, improving cache, and then adapting the instruction set. Along the way Stephen touched on one of Intel's design targets -- deliver all this improvement without breaking the current programming models. A tough challenge.
It was a complete picture, and this was the first time I'd seen it all presented at once.
Andrew White, Los Alamos
Los Alamos was also up today, talking about their new Cell-based system and their hopes for hybrid computing. Of interest to me were the changes they've made to the design of the system they are building, particularly the decision to couple the Cell cadre more tightly to the base Opteron processors by going with PCI-E instead of InfiniBand.
Andy also talked about performance. He cited speedups of between 5x and 10x for various computational problems on the existing Cell blades (14-16 gigaflops), limited by the relatively low-performance double-precision implementation in the current crop of Cell processors. The next generation of enhanced double-precision Cell processors is expected to increase that to 100 gigaflops. But I'm still worried about the load/unload time for shuttling work and results to and from the processor.
The resulting system will still be quite complicated to program, with developers having to worry about managing workloads across the Opterons, the PowerPC on the Cell, and the Cell's processing engines. IBM is evidently doing quite a bit of work in creating libraries to manage work scheduling and communication among the layers, but this work is still in progress and these things usually take quite a while to stabilize.
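IBM's actual libraries weren't shown, but the layering problem itself is easy to sketch. Here's a toy illustration of the three scheduling tiers a developer has to reason about; every name in it is hypothetical, and it is emphatically not IBM's API:

```python
# Toy sketch of the three tiers described above -- Opteron host, the PowerPC
# core on each Cell, and the Cell's SPE processing engines. All names are
# invented for illustration.
from queue import Queue

NUM_CELL_BLADES = 2
SPES_PER_CELL = 8

def spe_kernel(chunk):
    """Tier 3: stand-in for the numeric kernel an SPE would run."""
    return sum(chunk)

def cell_ppc_schedule(work):
    """Tier 2: the Cell's PowerPC core spreads its share across the SPEs."""
    chunks = [work[i::SPES_PER_CELL] for i in range(SPES_PER_CELL)]
    return [spe_kernel(c) for c in chunks]

def opteron_dispatch(problem):
    """Tier 1: the Opteron host carves the problem up across the Cell blades."""
    blade_queue = Queue()
    share = len(problem) // NUM_CELL_BLADES
    for b in range(NUM_CELL_BLADES):
        blade_queue.put(problem[b * share:(b + 1) * share])
    results = []
    while not blade_queue.empty():
        results.extend(cell_ppc_schedule(blade_queue.get()))
    return results

print(sum(opteron_dispatch(list(range(1000)))))   # 499500
```

Even in this toy form there are three places for load imbalance to hide, which is exactly why stabilizing the real libraries takes a while.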
An interesting side note related to reliability: Andy made the observation that modern transistors have a gate oxide layer that's only about six atoms thick. When a single-atom defect can change the layer's thickness by almost 20 percent (one atom in six), there isn't much room for error in the production process.
Doug Kothe, Oak Ridge National Lab
Doug talked about ORNL's Leadership Computing initiatives and the technology roadmap for the lab's HPC component.
One of the things I found particularly interesting about his talk was the degree to which Oak Ridge has formalized the process by which application requirements shape (at least in theory) the direction of the lab's technology acquisitions. For example, knowing that some percentage of your user base works with unstructured meshes tells you those problems are heavily dependent on the bandwidth of the processor interconnect. This is a process I'd like to learn more about.
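The details of Oak Ridge's process weren't spelled out, but the basic idea is easy to illustrate. Here's a hypothetical sketch (the workload categories, shares, and sensitivities are all invented for this example) of turning a workload mix into hardware priorities:

```python
# Hypothetical illustration of requirements-driven acquisition: weight each
# hardware attribute by how much of the user base depends on it.
workload_mix = {"unstructured_mesh": 0.30,       # invented shares of machine time
                "dense_linear_algebra": 0.45,
                "data_analysis": 0.25}

# How strongly each workload leans on each attribute (0..1, also invented).
sensitivity = {
    "unstructured_mesh":    {"interconnect_bw": 0.9, "flops": 0.4, "io_bw": 0.3},
    "dense_linear_algebra": {"interconnect_bw": 0.3, "flops": 0.9, "io_bw": 0.2},
    "data_analysis":        {"interconnect_bw": 0.4, "flops": 0.2, "io_bw": 0.9},
}

priority = {}
for workload, share in workload_mix.items():
    for attribute, weight in sensitivity[workload].items():
        priority[attribute] = priority.get(attribute, 0.0) + share * weight

for attribute, score in sorted(priority.items(), key=lambda kv: -kv[1]):
    print(f"{attribute}: {score:.2f}")
```

The point isn't the numbers; it's that the unstructured-mesh users pull interconnect bandwidth up the shopping list in a way a raw FLOPS target never would.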
He also talked about the need to assess application readiness for those codes contending for allocations on leadership computing systems. Here readiness is assessed not only by a code's ability to run, but also by its ability to run effectively on the platform at a scale appropriate to the machine. In this same vein he also talked about the need for application designers to step into the gap left by the vendor community and design in defensive checkpointing and fault tolerance to deal with petascale reliability issues.
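Defensive checkpointing is the simplest example of what Doug means: the application periodically saves enough of its own state to restart, rather than trusting the machine to stay up. A minimal sketch (my own illustration, not ORNL code):

```python
# Minimal sketch of application-level defensive checkpointing. Illustrative only.
import os
import pickle

CHECKPOINT = "state.ckpt"
CHECKPOINT_EVERY = 100   # steps between saves; tune to the machine's failure rate

def load_checkpoint():
    """Resume from the last checkpoint if one exists, else start fresh."""
    if os.path.exists(CHECKPOINT):
        with open(CHECKPOINT, "rb") as f:
            return pickle.load(f)
    return {"step": 0, "total": 0.0}

def save_checkpoint(state):
    """Write to a temp file, then swap atomically: never a torn checkpoint."""
    tmp = CHECKPOINT + ".tmp"
    with open(tmp, "wb") as f:
        pickle.dump(state, f)
    os.replace(tmp, CHECKPOINT)

state = load_checkpoint()
for step in range(state["step"], 1000):
    state["total"] += step * 0.001      # stand-in for one timestep of real work
    state["step"] = step + 1
    if state["step"] % CHECKPOINT_EVERY == 0:
        save_checkpoint(state)          # assume a crash can come at any moment

print(state["total"])
```

At petascale the interesting engineering is in the tuning: checkpoint too often and you burn I/O bandwidth; too rarely and you lose too much work per failure.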
I spoke briefly with Jose Munoz (Deputy Director, Office of Cyberinfrastructure at the NSF) in the break between sessions and asked him what he thought was most important about the presentations he had seen. His take on the morning was that he was quite glad to see the degree to which the presenters had focused on applications and science driving HPC requirements, rather than on raw machine performance. This was a theme echoed in his own talk about the ways in which the Cyberinfrastructure program focuses its resources on the NSF's science mission.
The day wrapped up with a panel called "Supercomputing: Over the Horizon," chaired by James Kasdorf of PSC. The panelists were Robert Graybill (Council on Competitiveness and USC), Sangtae Kim (Purdue), Douglass Post (DoD HPC Modernization Program), Karl Schulz (TACC, U Texas), and William Thigpen (NASA).
Each of the panelists shared their view of the issues, programs, and initiatives (or barriers) they see as most important to supercomputing as we move forward.
Bob Graybill articulated the importance of HPC as a cornerstone of national competitiveness (a theme of the Council) and stressed the need for HPC resources to become more usable and more accessible such that they could become a part of the workaday routine of most of America's knowledge workers. This is, in fact, the goal of the National Innovation Collaboration Ecosystem (NICE) effort that Bob is heading up.
Sangtae Kim talked about the forces that drove big pharma out of HPC, chief among them that Moore's law eventually delivered enough desktop power for their fixed-size problems. He then asked whether this set of users should return to HPC. Dr. Kim concluded that they will and should, noting that HPC can play a large role both in "molecule wriggling" and in data mining to understand things like why small portions of the population have negative side effects to certain drugs. This has obvious benefits to patients (fewer deaths) and to the companies (fewer lawsuits).
Doug Post talked about CREATE, the $360 million software development effort the DoD is pursuing to integrate HPC tools into the workflow of the department's large systems acquisitions in ships, antennas, and aircraft. He noted that the combination of complex science, complex organizations, insufficient development tools and methodologies, and complex hardware makes big software very hard to design. He noted that the problem will get worse as machines go to manycore and heterogeneous processing. CREATE is expected to be a 10-year effort, and will include a hierarchy of models that span the range from the workstation applications to HPC-class codes. The effort has its work cut out for it. Post cited a study that followed the development of six large DOE codes in which two failed, two were late, and two were on time. All were from an organization with 50 years of software development expertise.
Karl Schulz talked about TACC's enormous NSF Cyberinfrastructure Track-2 system, the 500-and-some-odd teraflops Sun system set to enter early access in December of this year. In particular he talked about the challenges the user community will face in moving to a single system that, by itself, represents twice the computing power of the entire installed TeraGrid base.
Bill Thigpen brought an interesting perspective from his position in the management chain of NASA's Columbia system. He talked about the benefits that HPC brings the agency. He cited the example of the ability to analyze images taken during a shuttle launch and assess damage and mitigation opportunities in a day rather than months. Bill also talked about some of the challenges HPC users face. Among these is the lack of effective development tools and techniques that allow users to "future proof" their applications. The goal is to preserve portability without forsaking the machine-dependent performance features that prompted the purchase in the first place.