Amazon Web Services is the cloud provider most often cited in scientific articles about high performance applications in the cloud. Meanwhile, cloud competitor Rackspace has not ventured much into the high performance scientific computing arena.
That changed this week. On Monday, Rackspace announced a partnership with CERN that brings the cloud services provider into the cloud-based HPC game.
“We have never really played in research, higher education, and other technical spaces,” says Rackspace CTO John Engates, “but we are getting traction now, thanks to OpenStack.” With the CERN partnership and the ensuing work on OpenStack, Rackspace hopes to join the ranks of cloud providers serving scientific computing. It is certainly starting with one of the higher profile scientific computing use cases.
CERN’s partnership with Rackspace comes out of openlab, a division that CERN set up to experiment with IT storage and computational structures. Some of those experiments will run on Rackspace’s public cloud offerings.
According to CERN, the Large Hadron Collider alone generates 25 petabytes of data per year. Engates noted that CERN actually generates more data than that, but ends up discarding some of it simply for lack of storage.
Specifically, the organization tasked with storing the data generated by the Large Hadron Collider needed a partner that could provide space for the data that is currently destroyed because there is not enough room to keep it.
A significant share of the partnership’s resources will be dedicated to incorporating the open source OpenStack platform into CERN’s IT infrastructure.
Beyond expanding its storage and computational resources, the appeal here for CERN is the continued promotion of open source technologies for cloud-based projects. In the video below, which details CERN’s OpenStack efforts and what they mean for CERN’s overall goals, IT infrastructure manager Tim Bell says, “CERN’s always been a contributor to open source projects…Open source technology allows us to be able to contribute to improvements without having to do the work ourselves.”
Bell is planning to add 15,000 hypervisors to support CERN’s experimental workloads. The goal is to improve the virtualization process so that the performance penalty for moving workloads, especially the high performance workloads that CERN often handles, is not overly dramatic. Part of that, according to Engates, will involve using OpenStack to provide APIs.
“They are also looking at OpenStack as a way to put APIs on the front of their supercomputers,” says Engates.
CERN’s move of certain storage and computational needs to a cloud-based system has ramifications of its own, though the move was something of an inevitability given the organization’s desire to keep the rest of its data. Beyond that, Rackspace’s entry into the scientific computing field could mark a significant moment in porting high performance applications into a virtualized environment.
This is not CERN’s first dealing with Rackspace: three years ago, CERN used the ‘Swift’ object storage controller as part of the foundation of its OpenStack storage system. The work that CERN and now Rackspace are doing to improve OpenStack’s provisioning could encourage more high performance applications to move to the cloud as the performance cost of the virtualization layer between the servers and the applications dwindles.
Until that happens, CERN plans to add the aforementioned 15,000 hypervisors to virtualize current workloads. Another aspect of the partnership is how to allocate resources from both CERN’s private cloud and Rackspace’s public cloud, and how to get the two to mesh so that workloads are optimized.
“We’ve collaborated with them to make their workloads not only go to the public cloud,” said Jim Curry, SVP, General Manager, Rackspace Private Cloud, “but also to do work within their private cloud, and most importantly developing a platform that allows them to choose to consume public cloud resources or private cloud resources.” That choice between public and private cloud, as shown in the picture below, is important to CERN’s previously stated goal of giving researchers across the world access to its datasets.
HPC in the Cloud recently examined the issues facing CERN and what it hoped to accomplish as a global scientific initiative by making datasets accessible over Google servers. In that case, Google made 100,000 servers available to the researchers for what was essentially overflow computational support.
The Rackspace-CERN partnership has ramifications on multiple levels of the HPC cloud spectrum. It furthers CERN’s goals of keeping all of its data and providing access to datasets for researchers across the globe, and it potentially adds another player, Rackspace, to the field of experimental scientific computing applications.