November 12, 2012
SALT LAKE CITY, UT and BOULDER, CO., Nov. 12 - At the SC12 conference, Rogue Wave Software, the largest independent provider of cross-platform software development tools and embedded components for the next generation of HPC applications, announced that TotalView has achieved a significant debugging milestone during testing conducted as part of its strategic scalability initiative. During the testing, TotalView demonstrated its capability to debug a parallel job running on 786,432 processor cores. The tests were conducted on Sequoia, the IBM Blue Gene/Q supercomputer at Lawrence Livermore National Laboratory (LLNL). These scalability tests are key to advancing Rogue Wave's strategic business goal of providing leading tools that scale with its customers' applications on today's petascale computers and ensuring that TotalView is well positioned for the industry's move towards exascale computing. Sequoia serves the National Nuclear Security Administration's Advanced Simulation and Computing (ASC) program, a cornerstone of the effort to ensure the safety, security, and reliability of the nation's nuclear deterrent without underground testing.
"We are actively working to increase the capabilities of our scientific codes to scale and take advantage of the phenomenal power of Sequoia. As part of this effort, we are looking for ways to get more on-node parallelism from existing codes and architecting our new codes to support the even more massive degrees of parallelism that we know will be needed in the future," stated Scott Futral, LLNL group leader for Development Environment. "Rogue Wave's dedication to pushing for ever-increasing scales with its TotalView debugger and the recent tests give us reason to be confident that TotalView will continue to be a critical development tool as we reach higher and higher scales with our own codes."
Rogue Wave's scalability initiative, a partnership with LLNL and LLNL's Tri-Lab partners (Los Alamos National Laboratory and Sandia National Laboratories), features a multi-architecture approach, targeting the Blue Gene/Q platform along with x86-based architectures such as the Cray XE. Extreme-scale testing allows TotalView engineers to identify bottlenecks and prioritize efforts in optimizing and tuning the debugging engine for scalability. During the most recent testing session, TotalView successfully scaled across 786,432 cores, with no indication of the debugger hitting any barriers.
Rogue Wave conducted this test using a hybrid MPI + OpenMP code that implements a method for solving a system of linear equations. This application, which uses both MPI for distributed-memory multi-process parallelism and OpenMP for shared-memory thread-based parallelism, was selected because it shares important characteristics with many applications used on extreme-scale systems such as Sequoia. This kind of attention to the workloads of large-scale systems is another key aspect of scalability requirements.
Since there was no indication of any barrier being hit at the 786,432-core mark, the testing suggests that TotalView could have leveraged more of Sequoia's 1.5 million cores if additional compute nodes had been available. To further push TotalView's scalability, additional tests oversubscribed the machine by spinning up more than one thread per core. Rogue Wave will announce the result of this second set of tests, which demonstrate successful debugging of an even higher number of threads, on Thursday, November 15th at 12:00 PM MST. Rogue Wave invites SC12 attendees to visit its booth, #3418, to participate in a competition to correctly guess the number of threads TotalView debugged.
TotalView is a highly scalable debugger that provides troubleshooting for a wide variety of applications, including serial, parallel, multi-threaded, multiprocess, and remote applications.
Designed for developer productivity, TotalView simplifies and shortens the process of developing, debugging, and optimizing complex code. It provides a unique combination of capabilities for pinpointing and fixing hard-to-reproduce bugs, memory leaks, and performance issues. TotalView raises the bar for debugging by providing several additional features at no extra cost, including debugging for CUDA and OpenACC, and deterministic reverse debugging, which allows users to pause, rewind, and play back sessions to accurately identify and correct errors.
About Rogue Wave Software
Rogue Wave Software, Inc. is the largest independent provider of cross-platform software development tools and embedded components for the next generation of HPC applications. Rogue Wave marries High Performance Computing with High Productivity Computing to enable developers to harness the power of parallel applications and multicore computing. Rogue Wave products reduce the complexity of prototyping, developing, debugging, and optimizing multi-processor and data-intensive applications. Rogue Wave customers are industry leaders in the Global 2000, ISVs, OEMs, government laboratories, and research institutions that leverage computationally complex and data-intensive applications to enable innovation and outperform competitors. Rogue Wave is a Battery Ventures portfolio company.
Source: Rogue Wave