Half-Time in the Uber-Cloud

By Wolfgang Gentzsch and Burak Yenier

September 20, 2012

Since its first announcement on June 28 here on HPCwire, and its official start on July 20, the Uber-Cloud Experiment has attracted over 160 industry and research organizations and individuals from 22 countries. They all have one goal: to jointly explore the end-to-end process of remotely accessing technical computing resources sitting in HPC centers and in the cloud. The focus of this experiment is on engineering simulations performed by small and medium enterprises that expect a quantum leap in innovation and competitiveness by using high performance computing.

The benefits of remote access to HPC are widely recognized. We have at our disposal most of the technology needed to access and run our engineering workloads on remote resources. But we still face other challenges that are more related to the human element: trusting the resource provider; giving away some control over our applications, data, and resources; security; provider lock-in; software licensing; an unfamiliar pay-per-use computing model; and a general lack of clarity in distinguishing between hype and reality.

To explore these hurdles in detail and to learn more about this end-to-end process, we were able to build 20 teams, each consisting of an end-user and their application, the software provider, the computational resource provider, and an HPC and/or CAE expert who manages the team process. Thanks to our participants, the following teams have been established:

| Team | Project Description |
|---|---|
| Anchor Bolt | Simulating steel-to-concrete fastening capacity for an anchor bolt |
| Resonance | Electromagnetic simulations of NMR probe heads |
| Radiofrequency | Radiofrequency field distribution inside a heterogeneous human body |
| Supersonic | Simulation of jet mixing in supersonic flow with shock |
| Liquid-Gas | Two-phase flow simulation of separation columns |
| Wing-Flow | Flow around an aerospace wing |
| Ship-Hull | Simulation of water flow around a ship hull |
| Cement-Flows | Burner simulation with different solid fuels in the mining industry |
| Sprinkler | Simulating water flow through an irrigation water sprinkler |
| Space Capsule | Aerothermodynamics and stability analysis of a space capsule |
| Car Acoustics | Low-frequency car acoustics |
| Dosimetry | Numerical EMC and dosimetry with high-res models |
| Weathermen | Large-scale and high-resolution weather and climate prediction |
| Wind Turbine | CFD simulations of vertical and horizontal wind turbines |
| Combustion | Simulating combustion in an IC engine |
| Blood Flow | Simulation of water/blood flow inside rotating micro channels |
| ChinaCFD | CFD using homegrown C/C++ application |
| Gas Bubbles | Simulation of gas bubbles in a liquid mixing vessel |
| Side impact | Optimization of the side-door intrusion bars under a crash |
| ColombiaBio | Analysis of the biological diversity in a geography using R scripts |

All 20 of these projects are underway today. Two are still defining their end-user project, 15 are in contact with their assigned computing resources and setting up the project environment, one is initiating and monitoring the end-user project execution, one is reviewing the results with the end user, and one is already documenting its findings on the HPC-as-a-Service process. To illustrate the team process, we present two of the projects and their current status in more detail.

Simulating new probe design for a medical device

Team Expert: Chris Dagdigian from BioTeam

The team’s end user faces a common problem: a periodic need for large compute capacity in order to simulate and refine potential product changes and improvements. Because the HPC requirements are periodic, it is not practical to maintain the desired amount of capacity internally; the company finds it difficult to justify capital expenditure for complex assets that may end up sitting idle for long periods of time.

To date, the company has invested in a modest amount of internal HPC capacity, sufficient to meet base requirements. Additional HPC resources would allow the end user to greatly expand the sensitivity of current simulations and may enable new product and design initiatives previously written off as “untestable.”

The HPC software being employed is CST Studio, a popular commercial application for electromagnetic simulations of many types. The application is currently operating in the Amazon cloud and the team has successfully completed a series of architecture refinements and scaling benchmarks. The hybrid cloud-bursting architecture allows local HPC resources residing at the end-user site to be utilized along with the Amazon cloud-based resources.

At this point in the project, the team is still exploring the scaling limits of the Amazon GPU-equipped EC2 instance types and is beginning new tests and scaling runs designed to test HPC task distribution via MPI. The use of MPI will enable them to leverage different EC2 instance type configurations and scale beyond some technical limits imposed by the amount of memory residing within the NVIDIA GPU cards.

They believe they are currently at (or close to) the point at which they are routinely running simulations that would not be technically possible using only the local resources of the end user. They also intend to begin testing the Amazon EC2 Spot Market, in which cloud-based assets can be obtained from an auction-like marketplace offering significant cost savings over traditional on-demand hourly prices.

Multiphase flows within the cement and mineral industry

Team Expert: Ingo Seipp from science + computing ag

In this project ANSYS CFX is used to simulate a flash dryer in which hot gas is used to evaporate water from a solid. The team consists of FLSmidth as the end user, Bull as the resource provider with its extreme factory (XF) HPC on demand service, ANSYS as the software provider, and science + computing ag as team experts.

FLSmidth is the leading supplier of complete plants, equipment and services to the global minerals and cement industries. The end user needs about four to five days to complete a simulation run on the local IT infrastructure. The goal is to reduce the total throughput time of the project and, in a second step, to increase the mesh size to refine the results, without investing in hardware that may not always be fully utilized. To achieve this, the simulation must be run on more cores with more memory, across more nodes connected by a high-speed network.

XF provides 150 teraflops of computing power with InfiniBand and GPUs, and currently offers about 30 installed applications; others are added on demand. Users can access XF through an easy-to-use web portal or direct login.

In this project, XF has given the end user access and integrated ANSYS CFX into a web interface for submitting jobs. ANSYS has granted licenses for the duration of the project, and the end user can manage these licenses easily through the portal. Preparations to run the jobs are almost complete, and the first test runs should start shortly.

Announcing Round Two of the Uber-Cloud Experiment

We consider Round One a proof of concept: yes, remote access to HPC resources works, and there is a real need for it! And yes, there are hurdles on the way, but we know how to overcome them.

During the half-time webinar, we asked the attendees if they would like to participate in the second round of the Uber-Cloud Experiment. Ninety-seven percent said they would. Therefore, we decided to start a new round of the experiment immediately after the first round completes. It will run from mid-November to mid-February.

Round Two of the experiment will be more professional. The end-to-end process of identifying, accessing and using remote resources (hardware, software, expertise) will become more structured, standardized, and tools-based. We will also handle more teams and more applications beyond CAE, and offer a list of additional professional services, for example, measuring the team effort. Finally, existing teams will be encouraged to use other resources, existing participants can work in new teams, and new participants can join and form new teams.

To learn more about the experiment or to register for Round Two, visit the Uber-Cloud Experiment website.

About the Authors

Wolfgang Gentzsch and Burak Yenier are the creators and facilitators of the Uber-Cloud Experiment. Wolfgang is an HPC veteran. Having worked in leading positions in research, academia and industry for some 30 years, Wolfgang is now an HPC consultant and the chairman of the ISC Cloud conference series for HPC and Big Data in the Cloud. Burak is the vice president of operations at CashEdge, a software-as-a-service company in Silicon Valley, which provides innovative payments and aggregation solutions to financial institutions.
