HPCwire
The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing / November 13, 2006
Features:
HPC Challenge Benchmark at SC06

The HPC Challenge (HPCC) benchmark suite has been funded by the DARPA High Productivity Computing Systems (HPCS) program to help define the performance boundaries of future Petascale computing systems. HPCC is a suite of tests that examine the performance of high-end architectures using kernels with memory access patterns more challenging than those of the High Performance LINPACK (HPL) benchmark used in the Top500 list. Thus, the suite is designed to augment the Top500 list, providing benchmarks that bound the performance of many real applications as a function of memory access characteristics (e.g., spatial and temporal locality) and providing a framework for including additional tests.

In particular, the suite is composed of seven well-known computational kernels that attempt to span the space of high and low spatial and temporal locality: HPL, DGEMM (matrix-matrix multiply), STREAM, PTRANS (parallel matrix transpose), RandomAccess, FFT, and b_eff (effective bandwidth and latency). By design, the HPCC tests are scalable, with the size of the data sets being a function of the largest HPL matrix for the tested system. The rules for the benchmark specify that a baseline run is required for each computer system entered in the list; an optimized run may also be submitted.
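
To make the locality contrast concrete, the sketch below sets a STREAM-style triad next to a RandomAccess-style table update. It is a simplified, single-process illustration and not the HPCC reference code: the function names, the default seed, and the use of plain Python lists are assumptions made for this example. The shift-register generator is modeled on the kind of pseudo-random stream RandomAccess uses.

```python
# Illustrative only: a simplified, single-process sketch, not the HPCC
# reference implementation. All names and the default seed are assumptions.

POLY = 0x7                  # feedback constant for the shift-register stream
MASK64 = (1 << 64) - 1      # keep values within 64 bits


def stream_triad(b, c, scalar):
    """Compute a[i] = b[i] + scalar * c[i] with a unit-stride sweep.

    Consecutive elements are touched in order: high spatial locality.
    """
    return [bi + scalar * ci for bi, ci in zip(b, c)]


def next_random(ran):
    """Advance a 64-bit shift-register pseudo-random sequence by one step."""
    high_bit = ran >> 63
    ran = (ran << 1) & MASK64
    return ran ^ POLY if high_bit else ran


def random_access_updates(table, num_updates, seed=1):
    """XOR pseudo-random values into pseudo-random slots of the table.

    Each update lands at an unpredictable index: low spatial and
    temporal locality.
    """
    size = len(table)
    if size == 0 or size & (size - 1):
        raise ValueError("table size must be a power of two")
    ran = seed
    for _ in range(num_updates):
        ran = next_random(ran)
        table[ran & (size - 1)] ^= ran
    return table
```

Because XOR is its own inverse, applying the same update stream to the table a second time restores the original contents, which gives a simple way to check a run of this sketch.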

The first reference implementation of the code was released to the public in 2003. The first optimized submission came in April 2004 from Cray, using the X1 installation at Oak Ridge National Laboratory. Cray then led the list of optimized submissions until it was finally superseded last November by IBM's BlueGene/L system. By the time of the first HPCC birds-of-a-feather session at the Supercomputing conference in 2004 in Pittsburgh, the public database of results already featured major supercomputer makers, a sign that vendors had taken notice of the benchmark. At the same time, somewhat behind the scenes, the code was also tried by government and private institutions for procurement and marketing purposes. The highlight of 2005 was the announcement of a contest: the HPCC Awards.

The two complementary categories of the competition emphasized performance and productivity, the very goals of the sponsoring HPCS program. The performance-oriented Class 1 award drew the attention of the biggest players in the supercomputing industry, which populated the HPCC database with most of the top-10 entries of the Top500. Some of those entries even exceeded, at that time, the performance reported on the Top500, a tribute to HPCC's policy of continuous result updates and flexible submission deadlines. The contestants competed to achieve the highest raw performance in one of four tests: HPL, STREAM-system, RandomAccess, and FFT. The Class 2 award focused on productivity (defined as a 50/50 mixture of performance and elegance) and introduced a subjective factor, both into the judging process and into the submitters' own criteria for what was appropriate for the contest. As a result, a wide range of solutions was submitted, spanning various programming languages (interpreted and compiled) and paradigms (with explicit and implicit parallelism). They featured openly available as well as proprietary technologies, some arguably confined to niche markets and some that were, and still are, widely used.

The financial incentives for entering the contest turned out to be hardly needed, as HPCC already seemed to enjoy enough recognition in the high-end community to justify large investments of hardware time and manpower from the largest supercomputer installations in the world. Nevertheless, HPCwire kindly provided both press coverage and cash incentives for the four winning contestants of Class 1 and the winner of Class 2. At HPCC's second birds-of-a-feather session, during the SC05 conference in Seattle, WA, the former class was dominated by IBM's BlueGene/L at Lawrence Livermore National Laboratory, while the latter was split between MTA pragma-decorated C and UPC codes from Cray and IBM, respectively.

In preparation for the SC06 BOF for the HPCC Awards, most of the results have been temporarily withdrawn from the public web page, which will return to its normal state after the results are announced at the BOF. This usually creates a two-week window during which the potential award-winning submissions are made, and this year seems to be no different. Apart from the interest spurred by the SC conference activities, the HPCC website draws relatively steady web traffic, amounting to well over 100,000 hits per month (sometimes approaching 300,000) from over 2,000 unique visitors every month. As of this writing, the site features over 100 base submissions and 20 optimized submissions. Simple data analysis tools are available directly from a web browser. They provide various sorting options, simple and complete result selections, and Kiviat chart generation. More extensive analysis is possible with the export feature, which delivers all of the submitted entries as an XML file or an Excel spreadsheet.

There are a number of activities related to HPCC at SC06.

o  Sunday, November 12
   - 1:30pm-5:00pm Tutorial: The HPC Challenge (HPCC) Benchmark Suite (Room 20)

o  Tuesday, November 14
   - 12:15pm-1:15pm HPC Challenge BOF: The 2006 HPC Challenge Awards (Ballroom B-C)

o  Wednesday, November 15
   - 1:30pm-3:00pm Panel: High Productivity Computing and Usable Petascale Systems (Room 24-25)

   - 2:30pm-3:00pm Best Paper Finalist: Rahul Garg, Yogish Sabharwal;
                   Software Routing and Aggregation of Messages to Optimize
                   the Performance of the HPCC RandomAccess Benchmark (Room 22-23)

No single test can accurately compare the performance of today's high-end systems, let alone those envisioned by the HPCS program for the future. The HPCC suite stresses not only the processors but also the memory system and the interconnect fabric, so it can be a better indicator of how a supercomputing system will perform across a spectrum of real-world applications. The real utility of the HPCC benchmarks is that architectures can be described with a wider range of metrics than just flop/s from HPL. The HPCC tests provide users with additional information to justify policy and perhaps purchasing decisions.