On Wednesday Microsoft announced something big in a little number. NCSA placed one of its systems, the 9,000+ core Abe cluster, at number 23 on the latest TOP500 list. The system achieved 68.5 teraflops on High Performance Linpack (HPL), the TOP500 metric of merit. According to Ryan Waite, group program manager for HPC at Microsoft, the software giant's HPC team is especially proud to put a Windows HPC Server system that high on the list, and to see it run at nearly 77 percent efficiency. This is the highest-ranking Windows HPC system to date.
Waite also pointed to the efficiency of another large Windows HPC Server system seated in the double digits on the TOP500 list. Umea University in northern Sweden placed its 5,376-core system at number 39 with 85.5 percent efficiency; according to Microsoft, those figures make it the most efficient x86-based cluster on the TOP500. The Umea system is the second largest Windows HPC system ever built and marks the first announced deployment of Windows HPC Server on IBM hardware (Umea's system is a Xeon-based cluster from IBM).
NCSA is running Beta 1 of Windows HPC Server 2008 on Abe. Robert Pennington, deputy director of NCSA, said in a prepared statement that "Our experience with Windows HPC Server 2008 has been impressive…. When we deployed Windows on our cluster, which has more than 1,000 nodes, we went from bare metal to running the LINPACK benchmark programs in just four hours. The performance of Windows HPC Server 2008 has yielded efficiencies that are among the highest we've seen for this class of machine."
Microsoft also announced today that the Release Candidate for HPC Server will be available for download from its Web site the last week of June. Waite pointed to the extensive development that has gone into HPC Server to ensure performance at scale. On the networking side, Microsoft has developed NetworkDirect, a new remote direct memory access (RDMA) networking interface. Microsoft has partnered with a variety of networking hardware vendors, including NetEffect, Mellanox and Myricom, to develop and prove NetworkDirect. NetEffect announced support for NetworkDirect in its 10 GbE line last week, and Mellanox is demonstrating 2 microsecond latency and 2 GB per second throughput on its new ConnectX InfiniBand cards this week at ISC.
Microsoft has also invested a lot of effort engineering the shared memory interface in MPI to be more effective, especially for multicore processors. “Many of the customers we are talking to have existing MPI codes that they don’t want to re-tool,” said Waite. “For these customers it is critical that MPI be as effective as possible.” Interestingly, Microsoft is contributing its changes back to the MPICH community.
Other key technologies in the latest version of HPC Server include deployment and management tools, batch and service-oriented job management, and reliability features such as head node failover.
So, who is using HPC Server anyway? Waite says he is seeing more customers gaining experience with the beta, including many users deploying on clusters with more than 200 nodes. Users include groups from national labs, research universities, engineering shops running commercial off-the-shelf (COTS) packages such as FLUENT, and the oil and gas industry. One interesting customer is a large medical organization that is integrating its HPC operations into its overall IT offering. "This customer in particular is a good example of how HPC is becoming part of the general fabric of enterprise computing," says Waite.
NCSA’s Merle Giles echoes this point in a video released with the announcement, “The significance of the TOP500 run using a Windows operating system is that it opens the possibilities for other industries to utilize HPC that may not have been thinking of it…. As companies want to migrate from what may be on a desktop to what may be in the HPC environment, Windows becomes very important.”
The final production release of HPC Server 2008 is expected later this year.