The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing / September 22, 2006
At the HPC User Forum meeting in Denver this week, Horst Simon was one of two industry experts asked to provide a larger perspective as representatives of organizations in China, Japan and the U.S. discussed their petascale initiatives. In this Q&A, Simon offers his views about the challenges of achieving usable petascale systems within the next five years.
HPCwire: The U.S. and several other countries have petascale initiatives in play. How realistic is the dual goal of achieving sustained petaflops speed and substantially boosting productivity by 2010?
Simon: There have been multiple announcements of plans for petascale machines in that timeframe, so I'm fairly confident that the sustained petaflop goal will be attained by 2010, meaning that by then there will be a Gordon Bell Prize for a sustained petaflop on a real application on a real platform. I'd be even more comfortable predicting that this will happen by 2011.
As for part two of your question, it depends on your definition of productivity. If productivity means the economic output of a country, then I think 2010 is too early for petascale computing to affect that. If productivity means making efficient use of petascale systems in industry, for example routinely creating better products at lower cost through simulation, that implies the petascale systems would have good scalability, reliable application and system software, and so on, and I think 2010 will be too soon for this as well. There will be many technical challenges, because we're entering a completely new arena of scalability. Getting to productive petaflop performance will be just as difficult as getting to productive teraflop performance was.
But if productivity means running codes faster and at higher resolution, which can enable scientific breakthroughs, then I'm confident this will happen because of the continued dramatic advances in computational technology based on commodity clusters. Ten years ago, few people thought PC clusters would have the big impact they have had. The same thing will happen with petascale systems in the future. They will become common.
HPCwire: Can anyone really afford a general-purpose sustained petaflop system in the 2009-11 timeframe that several countries are targeting? By general purpose, I mean a system that can sustain petaflop performance on a reasonably broad spectrum of codes.
Simon: Yes, to the part about affording petaflop systems in that timeframe. Petaflop systems are expensive, but not super-expensive if you look at them in relation to other large-scale scientific projects, like particle accelerators or the next-generation space telescope. It costs about $200 million to fund a petascale system today, which is not outrageously high. The bigger question is, what is the optimal time to make that investment? The Earth Simulator was a huge investment of about $400 million and made a big, immediate impact when it went live in 2002. Now, four years later, a 40-teraflop machine is much less expensive. It's important to produce significant results in the first one to two years, so Moore's Law doesn't catch up with the machine. It must be productive quickly.
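To make the "Moore's Law catches up" point concrete, here is a minimal sketch of the catch-up arithmetic; the machine sizes, budgets, price/performance figures, and doubling time below are illustrative assumptions, not numbers from the interview.

```python
import math

# Minimal sketch: how quickly a fixed commodity budget "catches up" with a
# one-off flagship machine, assuming commodity price/performance doubles
# every `doubling_years` years. Every number below is an assumption.

def years_until_match(flagship_tflops, budget_musd,
                      tflops_per_musd_today, doubling_years=1.5):
    """Years until budget_musd buys flagship_tflops of commodity capacity."""
    tflops_today = budget_musd * tflops_per_musd_today
    if tflops_today >= flagship_tflops:
        return 0.0
    doublings_needed = math.log2(flagship_tflops / tflops_today)
    return doublings_needed * doubling_years

# Example: a 40 TF flagship vs. a $10M commodity budget buying ~2.5 TF today.
print(years_until_match(flagship_tflops=40, budget_musd=10,
                        tflops_per_musd_today=0.25))  # -> 6.0 (four doublings)
```

This is why producing significant results in the first year or two matters so much: each doubling of commodity price/performance erodes the flagship machine's head start.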
As for general-purpose, if we look at the ratios of memory and disk that would be needed, and the I/O rates, then a general-purpose petascale system could become much more expensive. But we're approaching an era when the whole notion of general-purpose HPC systems may no longer apply. Instead, there will be commodity clusters for most things, along with opportunities to leverage special-purpose technologies like Blue Gene, which can run specific applications very successfully, or MDGRAPE-3 from RIKEN, which is arguably the first petascale system and is highly specialized. In 2010-11, I would expect an increasing trend toward more highly specialized systems.
HPCwire: Can anything be done to alleviate the costs of petascale systems while maintaining their usefulness?
Simon: One big issue for the future is that operational costs have been increasing significantly. We're approaching the point where a computer can cost more to operate over its lifetime than to acquire. One big potential area for cost reduction is components that consume less power, which lowers the overall cost of ownership. Another is facility construction. Construction costs have also gone up substantially, and you can save a lot of money if you don't have to build a new facility or heavily modify an existing one.
HPCwire: You mentioned power consumption, and you've stressed the importance of this before in designing petascale systems. Can processors have both leading performance and low power consumption? Are there ways to conserve power that don't involve the processors?
Simon: There are different approaches to this, and some vendors are exploring ways to sharply reduce power consumption, even for entry-level systems. But we as a community have not really looked at power consumption enough. Flops-per-watt is still a relatively new term. There's room for improvement in many areas of design. For example, everyone seems to agree that HPC is moving more toward liquid cooling again.
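Since flops-per-watt is flagged here as a still-unfamiliar metric, a small worked example may help; all of the numbers below (peak rate, power draw, electricity price) are assumptions chosen for illustration, not figures from the interview.

```python
# Illustrative flops-per-watt and power-cost arithmetic. Every number here
# is an assumption chosen for the example, not a figure from the interview.

peak_flops = 1.0e15      # assumed 1 petaflop/s peak performance
power_watts = 5.0e6      # assumed 5 MW total system power draw
price_per_kwh = 0.08     # assumed electricity price in $/kWh

flops_per_watt = peak_flops / power_watts
annual_kwh = power_watts / 1000 * 24 * 365
annual_power_cost = annual_kwh * price_per_kwh

print(f"{flops_per_watt:.2e} flops/W")        # 2.00e+08 flops/W
print(f"${annual_power_cost:,.0f} per year")  # roughly $3.5M per year at 5 MW
```

At that scale, even modest gains in flops-per-watt translate into millions of dollars of operating cost over a machine's lifetime, which is the total-cost-of-ownership point made above.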
HPCwire: Partly to address the power consumption issue, there's a growing trend to scale up by using a larger number of slower processors. How does this affect the system's breadth of applicability?
Simon: This is exactly the challenge we face. We had a decade of stagnation in looking at parallelism, because high-end systems stayed at the same size -- not more than 10,000 processors. ASCI Red was the first system with about 10,000 processors, and this didn't change until the arrival of Blue Gene. Many applications scale to at best 100 or 1,000 processors, so there's a challenge for applications and system software to make the big leap to tens of thousands, and eventually hundreds of thousands, of processors.
The good news is that this scaling issue is solvable, because there are no physical limitations. The limitations here have to do with creativity, our ability to scale in our thinking. We need to solve this in the next few years, but the HPC community is capable of this. Scaling up also requires significant investments in system software. Just as applications have been stagnating, so has the scalability of system software. But if we think correctly, it doesn't matter whether we use 50 or 50,000 processors. We can conquer this realm of parallelism.
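One standard way to see why the leap to tens of thousands of processors is hard is Amdahl's law, a textbook formula rather than anything specific to this interview; the serial fractions in the sketch below are assumed values chosen only to illustrate the scaling cliff.

```python
# Amdahl's law: speedup(p) = 1 / (s + (1 - s) / p), where s is the serial
# fraction of the work and p is the number of processors. The serial
# fractions below are assumptions chosen purely for illustration.

def amdahl_speedup(serial_fraction, processors):
    return 1.0 / (serial_fraction + (1.0 - serial_fraction) / processors)

for s in (0.01, 0.001, 0.0001):
    for p in (1_000, 10_000, 100_000):
        print(f"s={s:<7} p={p:>7,}  speedup={amdahl_speedup(s, p):>9,.0f}")

# Even with only 0.1% serial work, 100,000 processors deliver under 1,000x,
# not 100,000x -- which is why both applications and system software need rework.
```

The same arithmetic explains the optimism in the answer above: the barrier is not physics but the serial and poorly scaling parts of codes and system software, which can in principle be engineered away.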
HPCwire: Some engineers have complained about the trend toward slower processors. They say that because their applications don't scale beyond a handful of processors, this trend is actually setting them back instead of moving them forward. Comment?
Simon: This is probably legitimate. There's a strong sense that we're repeating the cycle that started in the late 1980s, when the first massively parallel systems arrived: many applications didn't lend themselves to those systems and had to continue running on vector machines or SMPs. In my community, climate modelers were reluctant to go parallel. Now, 15 years later, nearly all of them are massively parallel. Climate modelers have found no inherent limitations to scalability. Another example is the SciDAC program. In that program, five years of DOE investment in application development made the DOE community much more ready for higher scaling.
HPCwire: What's your take on the importance of heterogeneous processing?
Simon: This comes back to what I said initially. Heterogeneous processing falls under the general heading of customized architectures, where you customize an architecture for the set of applications you want to run. Heterogeneous processing will be very important and useful, because it lets you build systems that are more optimal, less power-consuming, and less expensive for solving a given set of problems at high efficiency. Roadrunner is a good example, and ClearSpeed is having success selling their accelerators. We will see more heterogeneous architectures in the future.
HPCwire: What are your summary thoughts about the current "petascale movement"?
Simon: I'll repeat what I've said elsewhere. I'm concerned that the term "petascale" covers such a wide territory. There are multiple stages. There will probably be a peak petaflop system in the next 18 months, then a Linpack petaflop, a sustained petaflop, and later on a system that sustains petaflop performance on a wide variety of applications. As HPC insiders, we all understand this, but many politicians, government funders, and members of the general public do not. If we as a community hang our hats too much on the first system with peak or Linpack petaflop performance, people outside the HPC community may conclude that we've conquered the petaflop challenge and it's time to move on to something else. The truth is, it will probably take six to eight more years to sustain a petaflop on a large number of applications, and in the meantime funding might fade away. It's good to have this petascale movement and initiatives in multiple countries to generate enthusiasm and to set a goal to move toward, but we need to keep moving after the first milestone.
Horst Simon is the founding director of Berkeley Lab's Computational Research Division, which conducts applied research and development in computer science, computational science, and applied mathematics. In 1988, he was awarded the Gordon Bell Prize for his parallel processing research. He was also a co-developer of the NAS Parallel Benchmarks, a standard for evaluating the performance of massively parallel systems. Currently, Simon is the associate laboratory director for Computing Sciences at Berkeley Lab and the director of NERSC.