August 13, 2013
Despite phenomenal progress in HPC over a sustained period of decades, a few issues limiting its effectiveness and acceptance remain. Prominent among these are the repeatability, transportability, and openness of HPC applications. As we prepare to move HPC to the exascale level, we should take the time and effort to consolidate HPC’s gains and deal with these residual issues from the early days of computational science. Only then will we be ready to reap the benefits of more powerful HPC tools.
Nearly fifty years ago, in 1964, the first computer generally acknowledged as a supercomputer – the CDC 6600 – was introduced. At that time, there was no Linpack Benchmark or Top500 List, but by the measures in use then, it was able to sustain a performance level of about 500 Kiloflops.
In 1970, ARPAnet, the progenitor of the Internet, came along. A few years later, in 1973, Ethernet was invented. In 1985, NSFnet was created, and in the early 1990s it morphed into the Internet. In 1990, the World Wide Web was born, and in 1993 it was made visual by the release of the Mosaic web browser. Also in 1993, the Top500 List was introduced; its top computer was a Thinking Machines CM-5, clocked at just under 60 Gigaflops.
In summary, HPC has existed for at least half a century and, in terms of HPC tools, we’ve had fairly capable supercomputers and networking for about 20 years.
The concept of computational science came to public light no later than 1989, when our late friend and colleague, Ken Wilson, published his well-known Grand Challenges to Computational Science paper (unfortunately, it’s locked away behind a paywall). So, both the HPC tools and the computational science concept for HPC applications gelled into something pretty close to their contemporary form a couple of decades ago.
Originally, computational science was met with a fair amount of skepticism. It was seen by some as just a collection of stunts, producing little more than pretty pictures – not the real stuff of science. It was seen as lacking the rigor necessary to be on par with theory and experiment. Computational science results were often criticized as one-off demos of unproven concepts.
So, how effectively and convincingly have we been using HPC?
Repeatability, Transportability, Openness
Both theory and experiment share a few key attributes:
A result obtained once can be repeated arbitrarily many times, given the same assumptions (for a theory) or conditions (for an experiment).
Results are not dependent on any particular theorist, experimentalist or specific apparatus. They are transportable to other people and places – transcending any particular instance.
Results are open. Theorists publish their theories and the corresponding proofs (if possible) or conjectures. Experimentalists describe the conditions of their experiments and the details of their equipment and procedures. These steps are taken to ensure the credibility of results by enabling their repeatability and transportability.
HPC applications, as science, should also share these attributes - in order to rise above the early criticisms of computational science, and to be effective and convincing.
Twenty years into the “modern era” of HPC applications, how are we doing? Clearly, we’ve made our applications bigger and more complex. Through improvements in the speed of both algorithms and hardware, our applications execute faster. The concepts of Verification and Validation (V&V) and Uncertainty Quantification (UQ) for scientific codes have taken root – but perhaps not yet fully blossomed in general HPC practice.
However, despite the laudable efforts of many of our HPC colleagues to solidify the standing of our field, significant issues with repeatability, transportability, and openness remain. Here are a few recent developments:
Ian Gent, Professor of Computer Science at the University of St Andrews, has recently published something he calls The Recomputation Manifesto, described in a post of his at the Software Sustainability Institute. The Manifesto sets out six points, and it is based on Gent’s view that:
The current state of experimental reproducibility in computer science is lamentable. The result is inevitable: experimental results enter the literature which are just wrong. I don’t mean that the results don’t generalise. I mean that an algorithm which was claimed to do something just does not do that thing: for example, if the original implementation was bugged and was in fact a different algorithm. I suspect this problem is common, and I know for certain that it has happened. Here’s an example from my own research area, discovered by my friend and tenacious pursuer of replication Patrick Prosser.
The full text of the Manifesto is available on arXiv. Suffice it to say that Professor Gent’s concerns are well founded and extend beyond computer science to include HPC applications.
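To make the recomputation idea concrete, here is a minimal sketch (my own illustration, not anything taken from the Manifesto) of the least an author can do: record the exact software environment alongside a result, so that someone else can at least attempt to rebuild it. The file name and the toy computation are hypothetical.

```python
# Illustrative sketch: save a result together with a snapshot of the software
# environment that produced it. The file name and the toy computation are
# placeholders, not part of any real experiment.
import json
import platform
import subprocess
import sys


def environment_snapshot():
    """Collect basic provenance: OS, interpreter, and installed packages."""
    packages = subprocess.run(
        [sys.executable, "-m", "pip", "freeze"],
        capture_output=True, text=True, check=False,
    ).stdout.splitlines()
    return {
        "platform": platform.platform(),
        "python": sys.version,
        "packages": packages,
    }


def run_experiment():
    """Stand-in for the real computation; deterministic by construction."""
    return sum(1.0 / (i * i) for i in range(1, 100001))


if __name__ == "__main__":
    record = {"result": run_experiment(), "environment": environment_snapshot()}
    with open("experiment_record.json", "w") as out:
        json.dump(record, out, indent=2)
```

A record like this is a floor, not a ceiling: it tells a later reader what the environment was, but it does not by itself freeze that environment so the run can actually be repeated.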
A group of investigators from Korea and the US have recently published a paper entitled An Evaluation of the Software System Dependency of a Global Atmospheric Model. The abstract reads as follows (emphasis mine):
This study presents the dependency of the simulation results from a global atmospheric numerical model on machines with different hardware and software systems. The global model program (GMP) of the Global/Regional Integrated Model system (GRIMs) is tested on 10 different computer systems having different central processing unit (CPU) architectures or compilers. There exist differences in the results for different compilers, parallel libraries, and optimization levels, primarily due to the treatment of rounding errors by the different software systems. The system dependency, which is the standard deviation of the 500-hPa geopotential height averaged over the globe, increases with time. However, its fractional tendency, which is the change of the standard deviation relative to the value itself, remains nearly zero with time. In a seasonal prediction framework, the ensemble spread due to the differences in software system is comparable to the ensemble spread due to the differences in initial conditions that is used for the traditional ensemble forecasting.
The full paper is behind an American Meteorological Society paywall. Based on my interpretation of the abstract, transportability (or reuse) is a non-trivial issue for this HPC application. My guess is that this is not an isolated case.
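The mechanism behind this kind of system dependency is easy to demonstrate: floating-point addition is not associative, so anything that reorders operations (a different compiler, optimization level, or parallel reduction pattern) can perturb the last bits of a sum, and a chaotic model amplifies those perturbations over the integration. Here is a minimal Python sketch of the effect; it is my own illustration, not code from the GRIMs study.

```python
# Floating-point addition is not associative, so summation order matters.
# Different compilers, optimization levels, or parallel reduction orders
# effectively choose different orders and can give slightly different answers.
import random

random.seed(42)
values = [random.uniform(-1.0, 1.0) for _ in range(1_000_000)]

forward = sum(values)             # one summation order
backward = sum(reversed(values))  # the same numbers, summed in reverse

# A chunked sum, mimicking a parallel reduction over 8 "ranks".
chunk = len(values) // 8
partials = [sum(values[i * chunk:(i + 1) * chunk]) for i in range(8)]
reduced = sum(partials)

print(f"forward  = {forward:.17g}")
print(f"backward = {backward:.17g}")
print(f"reduced  = {reduced:.17g}")
print("largest difference:", max(abs(forward - backward), abs(forward - reduced)))
```

The differences start out tiny, but a long time integration of a chaotic system magnifies them, which is consistent with the growing ensemble spread the authors report.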
A group of nine astrophysicists recently published a paper on arXiv entitled Practices in source code sharing in astrophysics. In it, they write (emphasis mine):
While software and algorithms have become increasingly important in astronomy, the majority of authors who publish computational astronomy research do not share the source code they develop, making it difficult to replicate and reuse the work. In this paper we discuss the importance of sharing scientific source code with the entire astrophysics community, and propose that journals require authors to make their code publicly available when a paper is published. That is, we suggest that a paper that involves a computer program not be accepted for publication unless the source code becomes publicly available. The adoption of such a policy by editors, editorial boards, and reviewers will improve the ability to replicate scientific results, and will also make the computational astronomy methods more available to other researchers who wish to apply them to their data.
So, openness clearly also remains an issue for HPC applications.
Note further that it’s not just the codes and their related parameters that should be publicly available – but also the scientific publications reporting on them. If you’ve been keeping track, you’ve noted that two papers mentioned in this article are behind paywalls – Ken Wilson’s seminal paper on Grand Challenges to Computational Science (24 years later!) and the recent one on the Global Atmospheric Model (despite its obvious public policy implications). The good news is that places like arXiv exist and the other publications mentioned here are out in the open.
Consolidating HPC’s Gains
HPC has come a long way. Our tools have improved greatly. For example, today’s fastest machine, China’s Tianhe-2, has been clocked at just under 34 Petaflops. So roughly speaking, HPC performance has improved by a factor of about 600,000 in the past 20 years (and 68 billion in the past 50 years). Current plans are to have exascale computers in place by the beginning of the next decade.
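For anyone who wants to check that arithmetic, here is a quick back-of-the-envelope calculation using the round figures quoted above (a sketch with rounded inputs, not exact Top500 entries):

```python
# Rough improvement factors, using the round numbers quoted in the text.
cdc_6600 = 500e3   # ~500 kiloflops sustained, 1964
cm_5     = 60e9    # ~60 gigaflops, top of the first Top500 list, 1993
tianhe_2 = 34e15   # ~34 petaflops, top of the June 2013 Top500 list

print(f"1993 -> 2013: ~{tianhe_2 / cm_5:,.0f}x")     # roughly 570,000x
print(f"1964 -> 2013: ~{tianhe_2 / cdc_6600:.1e}x")  # about 6.8e10, i.e. 68 billion x
```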
The rapid pace of improvement in HPC tools, and their increasingly broad adoption by industry, puts a lot of pressure on HPC applications – and on the financial resources available to support the whole HPC enterprise. Certainly, HPC applications have grown in scale, become more complex, and come to encompass more physical phenomena. However, arguably, most petascale applications are still done in the old “hero mode” from the early days of computational science. Most practitioners compute at the terascale – not the petascale – and only limited resources have been made available to help them catch up before the bar is raised to exascale.
So, while we’re working toward exascale HPC tools, perhaps we should consolidate the HPC applications gains we’ve made thus far – so that we’ll be ready to embrace exascale and exploit it fully. Even if financial resources are scarce, this should be a high priority.
In addition to bringing more HPC applications – and people – up to the petascale level, we should address the lingering issues of repeatability, transportability, and openness discussed above. If forced to pick one of these three to focus on, openness is probably the key.
If we publish openly and release the related source codes, repeatability and transportability should be solvable problems. The venues for open publication already exist and are being used by some communities. To complete this part of openness, just don’t allow your publications to be placed behind paywalls. There is no good reason that scientific work (probably funded by public money) should be behind paywalls. Once that bullet has been bitten, source codes must inevitably follow.