November 19, 2007
Greg Papadopoulos, Sun Microsystems CTO and executive vice president of research and development, directs a $2 billion budget to drive the company's technology research and direction. Here he talks about Sun's approach to massive-scale computing, his thoughts on standard instruction set architectures, the status of the Proximity Communications research, and the significance of the company's new Constellation System supercomputer.
HPCwire: Lately, you've talked a lot about the expansion of massive-scale computing, what Sun refers to as "Redshift," and the inability of Moore's Law to keep up with this demand. This sounds a lot like the situation faced by high performance computing. Besides HPC, where is this market demand coming from?
PAPADOPOULOS: In addition to HPC, we're seeing strong demand from two other application areas, what I call "sum-of-bandwidth" and "*-prise" computing. Sum-of-bandwidth is essentially data serving, but at Internet scale, and it's enabled by increasingly bigger pipes that go out to consumers. It's a kind of Kirchhoff's Law argument: discounting peer-to-peer for a moment, every bit requested by a consumer's laptop, TV or other network-enabled device is served by a piece of infrastructure. And if you are a particularly hot site -- think of social networking, games or video -- the impressed growth rates can be stunning.
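A back-of-envelope sketch of that Kirchhoff-style accounting: every bit a consumer pulls must be served by infrastructure somewhere, so the serving side scales with the subscriber base. The subscriber count, stream rate and peak concurrency below are purely illustrative assumptions, not figures from Sun:

```python
# Kirchhoff-style accounting: aggregate consumer demand equals the
# bandwidth the serving infrastructure must provision.
# All numbers are invented for illustration.
subscribers = 10_000_000       # assumed subscriber base of a "hot" site
avg_stream_mbps = 2.0          # assumed per-viewer video rate
peak_concurrency = 0.05        # assumed fraction of subscribers active at peak

peak_gbps = subscribers * peak_concurrency * avg_stream_mbps / 1000
print(peak_gbps)  # 1000.0 -- a terabit per second to serve at peak
```

Double the subscribers or the per-viewer rate and the serving side must double too, which is the "impressed growth" the answer describes.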
We also see rapid growth from companies offering enterprise services over the Internet. Here think of things like sales automation, ERP, email and even productivity applications. Certainly the average company's computing demands aren't growing close to Moore's Law (and so all of the interest in virtualization for consolidation!), but if you are one of the big *-Prise providers such as salesforce.com or SugarCRM, you are having to scale rapidly as your customer base grows.
HPCwire: What does Redshift say about the types of computing systems that need to be built in the future?
PAPADOPOULOS: It all boils down to efficiency at scale. And by this I mean real work performed versus capital expended, power consumed, space used, and the number of people needed to keep it all working. Historically, we all got target fixated on the capital costs (the "cheap revolution"), frequently at the expense of the other terms. But if you are, say, in Tokyo right now, it's quite likely that you will pay more over the lifetime of a system to power it than you did to purchase it in the first place. As applications get deployed at hyperscale, then these operating terms really begin to dominate.
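To make the power-versus-purchase point concrete, here is a hypothetical lifetime-cost calculation. The purchase price, power draw, electricity rate and lifetime are illustrative assumptions, not Sun's data:

```python
# Hypothetical lifetime power cost vs. purchase price for one server
# in a high-electricity-cost market like Tokyo. Figures are assumptions.
purchase_price = 3000.0   # USD, assumed server cost
power_draw_kw = 0.5       # assumed average draw, incl. cooling overhead
electricity_rate = 0.25   # assumed USD per kWh
lifetime_years = 4

lifetime_kwh = power_draw_kw * 24 * 365 * lifetime_years
power_cost = lifetime_kwh * electricity_rate

print(power_cost)                    # 4380.0
print(power_cost > purchase_price)   # True -- power exceeds capex
```

Under these assumptions the electricity bill alone exceeds the purchase price, which is why the operating terms dominate once applications are deployed at hyperscale.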
Ideally, you'd love to be in the state where doubling scale reduces the marginal cost of serving a new customer, or of performing some unit of computation. That is, as scale increases, so does efficiency. Mostly, the opposite is true unless you've been very thoughtful about how to design and architect the deployment of the underlying computing systems. This includes paying close attention to things that many non-supercomputer designers have historically ignored, especially the co-design of cooling and power conditioning and distribution.
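The scale-efficiency argument can be sketched with a toy cost model, assuming a large fixed facility cost (power, cooling, operations staff) amortized over the customer base. All numbers are invented for illustration:

```python
# Toy model: average cost per customer falls as a fixed facility cost
# is spread over more customers. Purely illustrative numbers.
def avg_cost(customers, fixed=1_000_000.0, per_customer=10.0):
    """Average cost per customer: amortized fixed cost plus variable cost."""
    return fixed / customers + per_customer

print(avg_cost(10_000))   # 110.0
print(avg_cost(20_000))   # 60.0 -- doubling scale cuts average cost
```

The model only delivers this curve if the fixed infrastructure actually serves the whole base efficiently; designed badly, the fixed term grows with scale and the advantage evaporates, which is the "mostly, the opposite is true" caveat.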
Solving these problems requires some serious engineering. One of the main reasons Sun has re-invested so heavily in HPC is that the needs HPC customers have today are the needs other Redshift customers will have tomorrow. Take our recent introduction of the Sun Datacenter Switch 3456, a 3,456-port InfiniBand switch that is amazingly efficient as you scale it up because it eliminates all of the inter-switch cabling, redundant packaging and latency of using 288-port switches. Today, it's a core part of Sun's new Constellation HPC cluster, but we see it being applicable to large corporate customers soon.
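For a sense of what a single 3,456-port switch replaces, here is a back-of-envelope count of the switches and inter-switch cables that a non-blocking two-tier Clos fabric built from 288-port switches would need for the same host count. This is an illustrative textbook topology, not necessarily Sun's actual packaging:

```python
import math

# Back-of-envelope: non-blocking two-tier Clos fabric for 3,456 hosts
# built from 288-port switches. Illustrative topology only.
hosts = 3456
radix = 288

down_ports = radix // 2                  # 144 host-facing ports per leaf
leaves = math.ceil(hosts / down_ports)   # 24 leaf switches
uplinks = leaves * (radix - down_ports)  # 3456 inter-switch cables
spines = uplinks // radix                # 12 fully-used spine switches

print(leaves, spines, uplinks)  # 24 12 3456
```

Under these assumptions, the multi-switch fabric needs 36 switch chassis and 3,456 inter-switch cables, every one of which adds cost, failure points and an extra switch hop of latency; a single monolithic switch internalizes all of that.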
HPCwire: The massively scaled, software-as-a-service, computing model seems like a logical evolution of where we're going, but there still appears to be a lot of resistance. A lot of our computing infrastructure is built around inefficient, fat client PCs. Even HPC users expect to be able to buy deskside personal clusters over the next few years. How is the computing culture going to change?
PAPADOPOULOS: There's much more pragmatism today -- it's not an "either-or" conversation anymore. Really big HPC facilities will be strong attractors in terms of raw computing scale, efficiency, storage and, probably most importantly, network connectivity. Deskside, or even laptop, systems will continue to serve as great places for development and exploration.
We're quite focused on how the two relate. Our ideal model is that the massive-scale networked infrastructure is a logical extension of -- maybe even a feature of -- a local system. It should just be there, and be very easy to "publish" computation to.
HPCwire: It seems like a Redshift computing model is dependent on big pipes to the end-user nodes. What about the challenges of the last mile bandwidth problem? This could be a bigger undertaking than just scaling up systems.
PAPADOPOULOS: Aside from the sum-of-bandwidth segment, it's really not all that dependent upon last-mile bandwidth. In fact, having better and better backbone or intra-cluster bandwidth can tip the equation even faster towards Redshift applications. Google's a simple example here. The network bandwidth to crawl and the internal bandwidths to index and rank are of course vastly bigger than any particular consumer's connection. And from that consumer's view, there is enormous compression taking place (Google indexes the web so that I don't have to expend all of that bandwidth too).
This being said, the growth in last mile bandwidth is pretty encouraging to me. Just look at the state today versus the mid-nineties. Back then, 56kbps was considered very fast, and seldom achieved. This year in the U.S., broadband (1.5Mbps -- 4 Mbps) penetration has passed 50 percent. If you are in a more advanced place, such as Korea, Japan or Singapore, you are likely experiencing many times this rate as those countries have made the substantial investments in getting fiber optics to consumers.
I'll repeat here a statement I made in the late nineties: "Never bet against bandwidth!"
HPCwire: On the same topic, what's the reaction been like for Sun's Network.com utility computing offering? Does this service act as a bellwether on where the industry is in terms of acceptance of utility computing?
PAPADOPOULOS: We've seen a whole range of reactions, of course, and we are certainly attracting a profile of folks who want simple on-demand access to some pretty deep computing resources along with a set of surrounding services. We are mostly focused on technical computing, but you can see the same sort of uptake with services like the Amazon.com EC2 and S3 offerings.
It goes back to our earlier discussion around making it as easy as possible for users to provision services from a massively-scaled system. It's a timesharing service that can be public, or not, and facilitates resource allocation between users.
HPCwire: In the past, you've discussed the possibility of a converged, open instruction set architecture (ISA). Could you talk about why this would be a good thing and how you think the industry would go about creating such an ISA?
PAPADOPOULOS: The fact that a proprietary instruction set is the dominant binary, and the fact that we still care about binaries at all, is an abject failure from a computer science point of view. We should have long since stopped caring! Especially now that the processing part of an ISA is so unrelated to the performance of systems -- they're all basically equivalent -- yet we still suffer through these messy and complicated binaries, of which x86 is particularly unwieldy. At the same time, it makes it really hard for us to collectively innovate in areas that are increasingly important, such as new memory models, threading, network integration, virtualization, security and special function unit integration.
Why isn't it something we can all agree upon, like TCP/IP? Obviously, there's no easy answer. What we have decided to do is to place our latest microprocessor *designs* out under an open source license, which is available at OpenSPARC.net. (The SPARC ISA has been openly available for years.) Is it the perfect ISA? No, but it's a place where people can start innovating. We really should, collectively, as an industry, have a completely free and open market with lots and lots of suppliers. How do we get to that?
HPCwire: Sun's Proximity Communications technology looks like one of the more exciting areas of research that the company is working on. What's the current status of this work? Are you expecting this technology to show up in Sun products in the near future?
PAPADOPOULOS: Proximity Communications is one of those technologies that could truly earn the label "game changing". For the uninitiated, Proximity Communications allows for very high bandwidth, extremely low latency and energy efficient chip-to-chip communication via capacitive coupling of adjacent chips. Think of it as wafer-scale integration using individual dice. We are in advanced development of the technology with a very clear eye on integrating it into our products. We're focused now on proving out some next-level packaging concepts, so we don't have a public timeline by which we're expecting to ship products. That said, I'm continuing healthy investment into its development and am very pleased with the progress.
HPCwire: Sun's recently announced Constellation System puts the company back into the high-end supercomputer business. What is Sun offering here that isn't being provided by a Cray XT or an IBM Blue Gene?
PAPADOPOULOS: Time and again we've seen that special purpose architectures, while great at claiming peak performance prizes, are often really hard to program and use. Our approach is fundamentally about general purpose computing: open software stacks on high-performance commodity processors, lots of memory bandwidth and capacity, and balanced low latency interconnect. We are intensely interested in ensuring that these systems are productive across a broad range of applications.
Our HPC customers are interested in leveraging economies of scale for obvious cost reasons, but also because it allows for a broader array of applications and, perhaps most interestingly, storage. As you may have read, Sun recently acquired much of the intellectual property behind the Lustre File System and, combined with our tape libraries, offers some interesting opportunities for design in the future.
It ties back to the larger point around doing serious engineering on general purpose computing platforms to get brutally efficient at various tasks and workloads. By leveraging Sun's new HPC investments and traditional enterprise expertise, we can target creating enterprise systems with the performance profile of an HPC system. And, of course, combine it with the reliability, availability and security of a traditional enterprise system.
Ultimately, we see the market for open, flexible and efficient systems growing rapidly. That's why we designed Project Blackbox. That's the design principle behind Constellation. That's why you'll see Sun continuing to drive innovation in the HPC market over the next few years.
This article originally appeared in HPCwire's special coverage of SC07, which can be found at www.hpcwire.com.