The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing / November 17, 2006
The SC06 conference has ended. As I write this, they're sweeping the dropped bits from the show floor and pushing the remaining supercomputing groupies out of the Tampa Convention Center.
It's hard to encapsulate a conference of this size into a few paragraphs. With over 9,000 attendees, more than 250 exhibits and hundreds of presentations, I was only able to experience a fraction of the event. I'll try to capture some of the highlights I encountered.
Stream Computing: More Than a Trickle
In my pursuit of this year's developing GPGPU (general purpose computing with GPUs) story, I talked with AMD and NVIDIA about their recent forays into stream computing. Both companies made announcements within the last week about GPU platforms that target high performance applications.
AMD introduced its Stream Processor and CTM ("Close To Metal") open hardware interface to offer GPGPU technology to non-graphics programmers. The processor itself is a souped-up graphics device, built around an ATI GPU with a full gigabyte of fast GDDR3 memory. GraphStream Inc., PANTA Systems, and Rackable Systems announced new stream servers that will use AMD's new device. CTM provides the open interface that lets commercial software developers program the devices; essentially, it exposes the ISA of the stream processor.
"That's typically something that graphics vendors absolutely refuse to do," said Dinesh Sharma, director of enterprise stream computing at AMD. "They're not going to give users low-level access to the device -- it's DirectX or OpenGL, or nothing else. So we had a spirited discussion inside the company and it came down on the right side. The opportunities were there to start to think about computations above the pixel. We think that CTM is the mechanism that people really do need to make this work."
AMD's hope is that this will enable the development of a third party ecosystem for stream processing software. Sharma said close to 100 people have already signed up for the CTM beta.
NVIDIA also recently announced its first-generation GPGPU technology, called CUDA. It consists of a new GeForce 8800 graphics card and future Quadro Professional Graphics solutions. NVIDIA claims computing with CUDA overcomes some limitations of traditional GPU stream computing by enabling GPU processor cores to communicate, synchronize, and share data. A CUDA-enabled GPU operates either as a thread processor, where thousands of threads work together to solve complex problems, or as a streaming processor in specific applications, such as imaging, where threads do not communicate. CUDA-enabled applications use the GPU for fine-grained, data-intensive processing, and the multi-core CPUs for coarse-grained tasks such as control and data management.
NVIDIA's software approach is different from AMD's. Instead of opening the hardware interface, NVIDIA will provide its own GPU-enabled C compiler that will do the heavy lifting required to turn C code into GPU instructions. Andy Keane, the general manager of the GPU computing group, said this approach insulates software developers from the underlying hardware. He went on to say that AMD's CTM approach is problematic because the underlying GPU architecture is rather volatile from generation to generation.
"GPUs evolve really quickly; there's a new instruction set every six months," said Keane. "Number one, the ISA changes, and number two, the optimization paths change on a per-chip basis. I can track that because as I'm designing the chip, I'm designing the software. But exposing that [hardware interface] to the world doesn't solve anybody's problem... CTM is a marketing effort."
So begins the AMD-NVIDIA rivalry (and I was just getting used to the AMD-Intel one).
Computing in 2020
On Thursday, I enjoyed the "HPC Computational Systems of 2020" Exotic Technologies session. Erik DeBenedictis (Sandia National Laboratories), Fernand Bedard (NSA) and Thomas Sterling (LSU) made their predictions about the nature of computational technology fourteen years from now. DeBenedictis proposed an 800 teraflop system built from 50,000 3-D CMOS chips, each containing 40 processing cores (4 CPUs and 36 accelerators). Bedard suggested a system based on Rapid Single Flux Quantum (RSFQ) superconductor devices, which allow for processor clock rates of up to 100 GHz. Sterling offered his Gilgamesh MIND processor-in-memory architecture. The session attendees voted Sterling's proposal the most "exciting" of the three. He also had the best one-liner in the group. While displaying an ITRS (International Technology Roadmap for Semiconductors) graph in his slide deck, he quipped, "I just replace the copyright with my name -- that's how open source works, right?"
It's interesting to note that Sterling alone predicted exaflop performance in 2020, which, by the way, would follow historical trends. All three submissions were placed inside a "time capsule" that will be opened at SC20. I'm already making my hotel reservations.
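As a rough back-of-the-envelope check on that historical trend (my own illustrative arithmetic, not from the session), suppose the fastest machine continues the TOP500's historical pace of roughly a thousandfold improvement per decade, starting from the approximately 280 teraflops of 2006's top system:

```python
# Illustrative extrapolation; the starting figure and growth rate
# are assumptions, not numbers from the SC06 session.
start_flops = 280e12       # assumed ~2006 top Linpack performance
growth_per_decade = 1000   # assumed historical TOP500 trend
years = 2020 - 2006

projected = start_flops * growth_per_decade ** (years / 10)
print(f"Projected 2020 peak: {projected / 1e18:.1f} exaflops")
```

Under those assumptions, the trend line lands comfortably past an exaflop by 2020, which is why Sterling's prediction is arguably the conservative one.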
AMD Quad-Core and Beyond?
Richard Oehler, Corporate Fellow at AMD, presented a session on Thursday to give everyone a taste of the company's multi-core roadmap. He talked about the new L3 cache and increased DRAM bandwidth on the upcoming (2007) quad-core chips. Oehler also discussed the effort going towards dynamically managing both power and performance, on a core-by-core basis, in order to increase energy efficiency. HyperTransport links will go from three to four, improving chip-to-chip bandwidth.
But for the time being, the core-count roadmap seems to stop at eight. Oehler said AMD is following the current industry trend of scaling out rather than scaling up, and the company doesn't see demand for many-core processors in the bulk of the market, that is, desktop and enterprise systems. This seems to support the notion that AMD is not going to do any special favors for the supercomputing crowd -- at least on the Opteron front. On the other hand, the aforementioned Stream Processor is certainly targeted at high performance applications, and the future Fusion (CPU-GPU) architecture is also geared towards HPC workloads.
The 800-Pound Multi-Core Gorilla
The last panel of SC06, "Multi-Core for HPC: Breakthrough or Breakdown?", was chock-full of industry luminaries including Thomas Sterling (LSU), Peter Kogge (University of Notre Dame), Ken Kennedy (Rice University), Steve Scott (Cray Inc.), Don Becker (Penguin Computing) and William Gropp (Argonne National Laboratory). Each gave his perspective on the various issues of this now-mainstream architecture. The issues discussed by the panel are too complex to summarize in a few words (although I intend to cover this in more detail in a future issue), but there was quite a bit of consensus on the main themes.
Most of the participants believed that the number of processor cores will continue to increase -- an inevitable result of the power dissipation limitations on semiconductor technology. The group also agreed that the current multi-core architectures put the CPUs on the wrong side of the memory wall. A hierarchical memory model, exploitation of locality, and other hardware/software technologies will be needed to solve the CPU-memory bandwidth disparity. And most of all, everyone acknowledged that the software models will have to evolve to take advantage of the increased parallelism. All of this promises to cause a great deal of pain for software developers, which is why Ken Kennedy summed up the multi-core problem as follows: "Be afraid. Be very afraid."
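To make the panel's locality point concrete, here is a small sketch of loop tiling (cache blocking), a classic technique for exploiting locality: the computation works on small blocks that can stay resident in cache, rather than streaming entire matrices through memory for every output element. The example and block size are my own illustration, not something presented by the panel.

```python
# Loop tiling (cache blocking) sketch: both functions compute the same
# matrix product, but the tiled version reuses t x t blocks while they
# are "hot" in cache. Pure Python won't show the speedup, but the loop
# structure is the same one used in optimized C/Fortran kernels.

def matmul_naive(a, b, n):
    c = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            for k in range(n):
                c[i][j] += a[i][k] * b[k][j]
    return c

def matmul_tiled(a, b, n, t=4):
    c = [[0.0] * n for _ in range(n)]
    # Iterate over t x t blocks; min() handles edge blocks when t
    # does not divide n evenly.
    for ii in range(0, n, t):
        for kk in range(0, n, t):
            for jj in range(0, n, t):
                for i in range(ii, min(ii + t, n)):
                    for k in range(kk, min(kk + t, n)):
                        aik = a[i][k]  # reused across the inner loop
                        for j in range(jj, min(jj + t, n)):
                            c[i][j] += aik * b[k][j]
    return c

n = 8
a = [[float(i * n + j) for j in range(n)] for i in range(n)]
b = [[float((i + j) % n) for j in range(n)] for i in range(n)]
assert matmul_naive(a, b, n) == matmul_tiled(a, b, n)
```

The same blocking idea generalizes to the hierarchical memory models the panelists called for: each level of the hierarchy gets a tile size matched to its capacity.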
Back to the Future
Which brings me back to the beginning of SC06 -- Tuesday's keynote address by Ray Kurzweil. From his bird's-eye view of history, Kurzweil presented a thoroughly compelling vision of how information technology will evolve over the next several decades. In fact, Kurzweil sees technological evolution as just a natural extension of biological evolution, where a more powerful paradigm overtakes a less powerful one.
Kurzweil believes the scalability of information technology will enable it to conquer our biological limitations within the next few decades. By 2010, he believes, computers will become so integrated into our environment -- our clothing, our dwellings, our cars, etc. -- that the devices will seem to have disappeared. By 2013, a 10 petaflop machine will be able to functionally simulate the human brain, giving us access to the software of our minds. Incorporating this software model into even more powerful hardware, Kurzweil predicts we will create computers that will rival, and then surpass, human intelligence. We will integrate those machines with ourselves to extend our intellect beyond what biological evolution could have achieved. By 2045, Kurzweil extrapolates, our non-biological intelligence will be a billion times more powerful than all human intelligence.
Hmmm... I wonder what SC45 will be like.
As always, comments about HPCwire are welcomed and encouraged. Write to me, Michael Feldman, at email@example.com.