December 09, 2009
Caltech-led high-energy physicists show how long-range networks can be used to support leading-edge science
PASADENA, Calif., Dec. 5 -- Building on eight years of record-breaking developments, and on the restart of the Large Hadron Collider (LHC), an international team of high-energy physicists, computer scientists, and network engineers led by the California Institute of Technology (Caltech) joined forces to capture the Bandwidth Challenge award for massive data transfers during the SuperComputing 2009 (SC09) conference held in Portland, Ore.
Caltech's partners in the project include scientists from the University of Michigan (UM), Fermilab, Brookhaven National Laboratory, CERN, the University of California, San Diego (UCSD), the University of Florida (UF), Florida International University (FIU), Brazil (Rio de Janeiro State University, UERJ, and São Paulo State University, UNESP), Korea (Kyungpook National University and KISTI), Estonia (NICPB), and Pakistan (NUST).
Caltech's exhibit at SC09 by the High Energy Physics (HEP) group and the Center for Advanced Computing Research (CACR) demonstrated applications for globally distributed data analysis for the LHC at CERN. It also demonstrated Caltech's worldwide collaboration system, EVO (Enabling Virtual Organizations), developed with UPJS in Slovakia; its global network and grid monitoring system MonALISA; and its Fast Data Transfer application, developed in collaboration with Politehnica University of Bucharest. The CACR team also showed near-real-time simulations of earthquakes in the Southern California region, experiences in time-domain astronomy with Google Sky, and recent results in multiphysics multiscale modeling.
The focus of the exhibit was the HEP team's record-breaking demonstration of storage-to-storage data transfer over wide area networks from two racks of servers and a network switch-router on the exhibit floor. The high-energy physics team's demonstration, "Moving Towards Terabit/Sec Transfers of Scientific Datasets: The LHC Challenge," achieved a bidirectional peak throughput of 119 gigabits per second (Gbps) and a data flow of more than 110 Gbps that could be sustained indefinitely among clusters of servers on the show floor and at Caltech, Michigan, San Diego, Florida, Fermilab, Brookhaven, CERN, Brazil, Korea, and Estonia.
Following the Bandwidth Challenge, the team continued its tests and established a world-record data transfer between the Northern and Southern hemispheres, sustaining 8.26 Gbps in each direction on a 10 Gbps link connecting São Paulo and Miami.
By setting new records for sustained data transfer among storage systems over continental and transoceanic distances using simulated LHC datasets, the HEP team demonstrated its readiness to enter a new era in the use of state-of-the-art cyberinfrastructure to enable physics discoveries at the high-energy frontier. The team also demonstrated some of the groundbreaking tools and systems it has developed to enable a global collaboration of thousands of scientists, located at 350 universities and laboratories in more than 100 countries, to make the next round of physics discoveries.
"By sharing our methods and tools with scientists in many fields, we hope that the research community will be well-positioned to further enable their discoveries, taking full advantage of current networks, as well as next-generation networks with much greater capacity as soon as they become available," says Harvey Newman, Caltech professor of physics, head of the HEP team, co-lead of US LHCNet, and chair of the U.S. LHC Users Organization. "In particular, we hope that these developments will afford physicists and young students throughout the world the opportunity to participate directly in the LHC program, and potentially to make important discoveries."
One of the features of next-generation networks supporting the largest science programs, notably the LHC experiments, is the use of dynamic circuits with bandwidth guarantees that cross multiple network domains. The Caltech team at SC09 used Internet2's recently announced ION service, developed together with ESnet and GEANT and in collaboration with US LHCNet, to create a dynamic circuit between Portland and CERN as part of the bandwidth-challenge demonstrations.
One of the key elements in this demonstration was Fast Data Transfer (FDT), an open-source Java application developed by Caltech in close collaboration with Politehnica University of Bucharest. FDT runs on all major platforms and uses Java's NIO libraries to achieve stable disk reads and writes coordinated with smooth data flow over TCP across long-range networks. FDT streams a large set of files across a single open TCP socket, so that a data set composed of thousands of files, as is typical in high-energy physics applications, can be sent or received at full speed without the network transfer restarting between files. FDT can work on its own, or together with Caltech's MonALISA system, to monitor the capability of the storage systems and the network path in real time, and to send data out to the network at a moderated rate that achieves smooth data flow across long-range networks.
Since it was first deployed at SC06, FDT has reached sustained throughputs among storage systems at 100 percent of network capacity where needed in production use, including between systems on different continents. At SC08 last year, FDT also achieved a smooth bidirectional throughput of 191 Gbps (199.90 Gbps peak) over an optical system, provided by CIENA, carrying an OTU-4 wavelength over 80 km.
Another new aspect of the HEP demonstration was large-scale data transfers among multiple file systems widely used in production by the LHC community, with several hundred terabytes stored per site. This included two recently installed instances of the open-source distributed file system Hadoop, from which more than 9.9 Gbps was read from Caltech on one 10 Gbps link, and up to 14 Gbps on shared ESnet and NLR links, a level compatible with the production traffic carried on the same links. The high throughput was achieved through a new FDT/Hadoop adaptor layer written by NUST in collaboration with Caltech.
The SC09 demonstration also achieved its goal of clearing the way to terabit-per-second (Tbps) data transfers. The 4-way Supermicro servers at the Caltech booth, each with four 10GE Myricom interfaces, provided 8.3 Gbps of stable throughput apiece, reading or writing on 12 disks, using FDT. A system capable of one Tbps to or from storage could therefore be built today in just six racks at relatively low cost, while also providing 3,840 processing cores and 3 petabytes of disk space, which is comparable to the larger LHC centers in terms of computing and storage capacity.
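The six-rack estimate follows from simple arithmetic. The breakdown below is an assumed one inferred from the quoted totals (20 servers per rack, 32 cores and roughly 25 TB of disk per 4-way server are not stated explicitly in the text):

```python
# Back-of-the-envelope check of the six-rack, 1 Tbps sizing.
# Assumptions (inferred, not from the text): 20 servers per rack,
# 32 cores and ~25 TB of disk per 4-way server.
racks = 6
servers_per_rack = 20
servers = racks * servers_per_rack       # 120 servers total

throughput_gbps = servers * 8.3          # 8.3 Gbps stable per server (from the text)
cores = servers * 32                     # total processing cores
disk_pb = servers * 25 / 1000            # total disk capacity in petabytes
```

With these assumptions the totals come out to about 996 Gbps (effectively 1 Tbps), 3,840 cores, and 3 PB of disk, matching the figures quoted above.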
An important ongoing theme of SC09 -- including at the Caltech booth, where the EVOGreen initiative was highlighted -- was the reduction of carbon footprint through the use of energy-efficient information technologies. A particular focus is the use of systems with a high ratio of computing and I/O performance to energy consumption. In the coming year, in preparation for SC10 in New Orleans, the HEP team will be looking into the design and construction of compact systems with lower power and cost that are capable of delivering data at several hundred Gbps, aiming to reach 1 Tbps by 2011 when multiple 100 Gbps links into SC11 may be available.
The two largest physics collaborations at the LHC, CMS and ATLAS, each encompassing more than 2,000 physicists, engineers, and technologists from 180 universities and laboratories, are about to embark on a new round of exploration at the frontier of high energies. When the LHC experiments begin to take collision data in a new energy range over the next few months, new ground will be broken in our understanding of the nature of matter and space-time, and in the search for new particles. In order to fully exploit the potential for scientific discoveries during the next year, more than 100 petabytes (10^17 bytes) of data will be processed, distributed, and analyzed using a global grid of 300 computing and storage facilities located at laboratories and universities around the world, rising to the exabyte range (10^18 bytes) during the following years.
The key to discovery is the analysis phase, where individual physicists and small groups located at sites around the world repeatedly access, and sometimes extract and transport, multi-terabyte data sets on demand from petabyte data stores in order to optimally select the rare "signals" of new physics from the potentially overwhelming "backgrounds" of already-understood particle interactions. The HEP team hopes that the demonstrations at SC09 will pave the way toward more effective distribution and use of the masses of LHC data for discoveries.
The demonstration and the developments leading up to the SC09 Bandwidth Challenge were made possible through the support of the partner network organizations mentioned, the National Science Foundation (NSF), the U.S. Department of Energy (DOE) Office of Science, and the funding agencies of the HEP team's international partners, as well as the U.S. LHC Research Program funded jointly by DOE and NSF.
Further information about the demonstration may be found at http://supercomputing.caltech.edu.