The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing / June 28, 2006
Horst Simon, associate laboratory director for computing sciences at Berkeley Lab, is also the director of NERSC, the winner of a Gordon Bell prize for parallel processing research, a co-developer of the NAS Parallel Benchmarks, and one of four editors of the "TOP500" list. In conjunction with today's IDC analyst briefing at ISC2006 in Dresden, we asked Dr. Simon about the challenges of petaflop computing and got some surprising answers.
HPCwire: There's more than one definition and milestone for petaflop computing. There is a peak petaflop, a Linpack petaflop, an "embarrassingly parallel" petaflop, and finally a petaflop on a "normal" application. What can you say about this?
Simon: The peak and Linpack petaflop milestones will happen quite soon, possibly as early as 2008. Then come the sustained petaflop applications, as you said. When I talk about petaflop computing, what I have in mind is the longer-term perspective, the time when the HPC community enters the age of petascale computing. What I mean is the time when you must achieve petaflop Rmax performance just to make the TOP500 list. An intriguing question is, when will this happen? If you do a straight-line extrapolation from today's TOP500 list, you come up with the year 2016. In any case, it's eight to ten years from now, and we will have to master several challenges to reach the age of petascale computing.
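As an aside for readers, here is a back-of-envelope version of that straight-line extrapolation. The starting point (roughly 2 teraflops Rmax for the #500 entry in mid-2006) and the growth factor (about 1.9x per year) are illustrative assumptions, not figures from the interview:

```python
import math

# Extrapolate when the #500 entry on the TOP500 list reaches 1 petaflop Rmax.
# The starting point and growth rate below are assumed, illustrative values.
rmax_2006_tf = 2.0        # assumed Rmax of today's #500 system, in teraflops
growth_per_year = 1.9     # assumed year-over-year growth of list entries
target_tf = 1000.0        # one petaflop, in teraflops

years = math.log(target_tf / rmax_2006_tf) / math.log(growth_per_year)
print(f"~{years:.1f} years after 2006, i.e. around {2006 + years:.0f}")
```

With those assumptions the crossover lands about ten years out, around 2016, the same ballpark Simon describes.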
HPCwire: What are the main challenges?
Simon: There are several that I see. Near-term, I'm most concerned about achieving "a petaflop before its time." Today, we're in an exciting time where everyone is positive about HPC. The American Competitiveness Initiative (ACI), DOE-Science, the NSF and others are getting nicely funded. This is based on the work the HPC community has put in since 2002, culminating in HECRTF, NAS and other reports. There's also a lot of technology excitement. Blue Gene has run an application at over 200 teraflops. There's funding for the ORNL petaflop system. NNSA procurements may lead to a petaflop system. The DARPA HPCS program is about to enter Phase III. NSF is aiming for a sustained petascale system in 2010. Not to mention the petaflops initiatives in Europe, Japan and China. All around the globe, there's excitement.
But if in all this enthusiasm we settle for just the easy goals, such as the first peak or Linpack petaflop performance, we may have a "petaflop before its time." Once the peak and Linpack milestones are achieved in 2008 or so, the real hard work begins, the work of achieving petaflop performance in production computing environments. The peak and Linpack petaflop goals are like climbing to the top of a mountain and saying, "Okay, we're ready to let the trucks roll over the summit." It's not that easy. You need to develop a lot of infrastructure first.
HPCwire: What about other major challenges?
Simon: Another big one is power consumption. For all the architectures on the horizon, if they were scaled to a sustained petaflop and beyond, they would require an enormous amount of power. Power consumption is really pushing the environment most of the computing centers have. A peak-petaflop Cray XT3 or cluster would need 8-9 megawatts for the computer alone. The 2011 HPCS sustained petaflop systems would require about 20 megawatts. A few national labs can afford this, because they're set up for big accelerators and so on; but in HPC, if there were 10 national labs wanting systems of this size, there would be hundreds of universities wanting versions that were smaller but still relatively large. How many universities can accommodate 2-megawatt systems? This may be a barrier to further adoption.
We need much more efficient power solutions. Blue Gene is better, but it still requires megawatts of power. It's not just high-end computing that faces this power issue; it's the whole computer industry. The problem is both the installation cost and the operating cost. Assume an electricity price today of 10 cents per kilowatt-hour. Even if the price of energy doesn't increase, a 20-megawatt system would cost $12 million or more a year just for electricity. That is an order-of-magnitude jump in cost compared to today. Liquid cooling addresses the space problem by allowing systems to be more compact, but it doesn't solve the power problem.
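To put the electricity figure in perspective, here is a rough sketch of the annual power bill, assuming the system draws its full 20 megawatts around the clock at the 10 cents per kilowatt-hour price quoted above; real duty cycles and cooling overheads will move the number:

```python
# Rough annual electricity cost for a 20 MW system at $0.10/kWh,
# assuming continuous operation at full power (an assumption; actual
# utilization and cooling overheads vary).
power_kw = 20_000
price_per_kwh = 0.10
hours_per_year = 24 * 365

annual_cost = power_kw * hours_per_year * price_per_kwh
print(f"${annual_cost / 1e6:.1f} million per year")   # about $17.5 million
```

Even at partial utilization this lands comfortably in the "$12 million or more" range, an order of magnitude above what most centers were budgeting for power in 2006.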
HPCwire: You mentioned that the power issue affects the entire computer industry. That reminded me of Google building its huge new computer complex.
Simon: Right. Big consumers of computer cycles, like Google and Microsoft, are building huge systems in locations where power is cheap. They're replacing aging industrial facilities with new IT facilities. The real question is, can we as a society afford for large-scale IT technology to consume as much electricity as large-scale manufacturing plants? I think not, with energy prices going through the roof. We might need to exploit different processor cost and power curves, such as those of the low-cost processors used in embedded technology. The Cell processor, for example, fits into this category because it comes from low-end, embedded game technology. Cell has potential, but it may not be the solution: while our research at Berkeley Lab has shown that it holds great promise, there is a huge step from that initial assessment to a production solution.
HPCwire: How about challenges on the software side?
Simon: Well, related to that is the large challenge of parallelism itself. As a community, we've done 15 years of research to get comfortable with parallelism in the MPI model. This works reasonably well for up to thousands of processors, or 100,000 in the case of Blue Gene; but if you look back to the early 1980s, microprocessors had fewer transistors than we now have processors in the largest HPC systems. One must ask, isn't it crazy that we program Blue Gene with a completely explicit programming model? This is a very unsophisticated approach, worse than in the 1980s, when we at least had assembly language. It's like programming every bit on a 1980s-era microprocessor by hand. Who would have done that?
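To make concrete what a "completely explicit programming model" looks like, here is a minimal halo-exchange sketch in the MPI style, written with the mpi4py binding purely for illustration (the interview does not name any particular implementation). Every boundary value moves only because the programmer writes a matching send and receive:

```python
# Minimal, illustrative sketch of explicit MPI-style message passing:
# each rank owns a chunk of a 1D domain and hand-codes the exchange of
# its boundary cells with both neighbours.
# Run with, e.g.: mpiexec -n 4 python halo.py
import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank = comm.Get_rank()
size = comm.Get_size()

local = np.full(4, float(rank))   # this rank's chunk of the domain
left_halo = np.zeros(1)           # ghost cell received from the left neighbour
right_halo = np.zeros(1)          # ghost cell received from the right neighbour

left = rank - 1 if rank > 0 else MPI.PROC_NULL
right = rank + 1 if rank < size - 1 else MPI.PROC_NULL

# Nothing is implicit: the programmer spells out every message, buffer and partner.
comm.Sendrecv(local[-1:], dest=right, recvbuf=left_halo, source=left)
comm.Sendrecv(local[:1], dest=left, recvbuf=right_halo, source=right)
```

Multiply this by every data structure and every communication pattern in a large application, and the analogy to bit-level programming of a 1980s microprocessor becomes clearer.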
We need better metrics to measure how parallel systems perform, and we need at least the equivalent of assembly language for programming. The DARPA HPCS program is trying to address this with new languages. The issue there is that if new languages cause a lot of pain, people won't change. This is another issue that affects more than just HPC, because parallelism is stepping out from HPC to the general world of computing. The solution to issues like power consumption and parallelism may come from somewhere outside of the HPC community, but the HPC community needs to be in the forefront of trying to solve these problems.
HPCwire: How can the HPC community begin this task?
Simon: We must take an ecosystem point of view. We must change the hardware, the system software and the applications all at once, not one thing at a time. DARPA HPCS is trying to do this, but it's too soon to tell whether it will be given enough investment to make it happen. There is no compelling economic force yet to motivate this shift; clusters and MPI are fine for most things. I'm concerned that the success of clusters will starve the high end, and that government high-end requirements will become increasingly disconnected from the mainstream HPC market.
HPCwire: As you mentioned earlier, more and more sites are joining the "petaflop club," not just in the U.S. but also across the world. This is costing $100-200 million or more per system. How much of this is driven by scientific requirements, and how much by the need for nations and supercomputing sites not to be left behind in the race?
Simon: There are definitely some strong scientific drivers, although I wish these initiatives would be even more science-driven. We as a community are focusing first on the hardware, then a little on system software. We can't leave the applications behind as we design innovative hardware. The danger is that funding real production systems based on HPCS may be seen as too expensive and too power-consuming.
HPCwire: Which scientific disciplines and applications are likely to benefit first from petascale systems?
Simon: One application in the general scientific world that really needs to be pushed forward is climate modeling. This is very crucial not just for the scientific community, but for the world community. Based on a recent workshop on petascale computing for the geo-sciences, it is clear we haven't come close to the grid resolution that's needed. Because of the need to understand climate change, we need to invest in this as a societal issue, not just because it is an important scientific problem. It would have an incredible impact.
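To give a feel for why grid resolution is the bottleneck, here is a back-of-envelope sketch. It assumes the common rule of thumb that refining the horizontal grid spacing raises cost roughly with the cube of the refinement factor (two horizontal dimensions plus a proportionally shorter time step); the baseline numbers are illustrative, not figures from the workshop Simon mentions:

```python
# Rough scaling of climate-model cost with horizontal grid resolution.
# Assumes cost ~ (old_dx / new_dx)**3: two horizontal dimensions plus a
# time step that shrinks in proportion. Baseline figures are illustrative.
baseline_dx_km = 100.0        # assumed circa-2006 grid spacing
baseline_sustained_tf = 1.0   # assumed sustained teraflops for that baseline run

for target_dx_km in (50.0, 25.0, 10.0):
    factor = (baseline_dx_km / target_dx_km) ** 3
    print(f"{target_dx_km:>4.0f} km grid: ~{factor:>5.0f}x the compute "
          f"(~{baseline_sustained_tf * factor:.0f} TF sustained)")
```

On these assumptions, moving from a ~100 km grid toward a ~10 km grid is roughly a factor of a thousand in compute, which is why climate modeling is so often cited as a driver for petascale systems.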
I'm also very concerned about energy production. I believe the ITER fusion project is coming just in time, and it will easily require petascale computing. There are also other challenging areas of energy research, such as converting solar to chemical energy. Nanoscience could also provide feasible paths to energy solutions, and computer simulation will help accelerate the path to innovation.
HPCwire: The Earth Simulator showed that the models can sometimes be broken at large scale. How serious is the problem with scaling today's algorithms?
Simon: This is quite a serious problem. Current systems provide an interconnect that gives the applications developer the illusion of complete connectivity. In a fat tree topology, you don't really have to be concerned about latency too much. When you scale to 100,000 processors and beyond, however, the interconnects will have to be low-degree or grid interconnects; you can't afford fat trees at this scale. Latency will become a very big issue, and many algorithms weren't designed with that in mind. Current applications software has been driven by MPI for 15 years. We need to revisit the whole issue of mapping applications to machines.
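A small sketch shows why the topology change matters. It compares the worst-case hop count of a 3D torus (a low-degree, grid-style network) against a shallow fat tree for roughly 100,000 nodes; the torus shape and the switch radix are assumptions chosen only to illustrate the contrast:

```python
import math

# Worst-case hop counts for ~100,000 nodes on two interconnect styles.
# Torus dimensions and fat-tree switch radix are illustrative assumptions.
nodes = 100_000

side = round(nodes ** (1 / 3))        # ~46 nodes along each torus dimension
torus_diameter = 3 * (side // 2)      # wrap-around links halve each axis

switch_radix = 36                                        # assumed switch radix
levels = math.ceil(math.log(nodes, switch_radix // 2))   # tree levels needed
fat_tree_hops = 2 * levels            # up to a common ancestor, then back down

print(f"3D torus: up to ~{torus_diameter} hops")   # roughly 69 hops
print(f"fat tree: up to ~{fat_tree_hops} hops")    # roughly 8 hops
```

On the low-degree network, where a piece of data lives relative to where it is used can cost tens of hops, so the mapping of applications onto the machine stops being something the programmer can ignore.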