HPCwire
The Leading Source for Global News and Information Covering the Ecosystem of High Productivity Computing / May 4, 2007
Hardware:
ClearSpeed Updates Advance Product Family

Updated Product Line Includes Performance and Functional Enhancements for CSXL Libraries, the Advance e620 PCIe Accelerator and the ClearSpeed Visual Profiler Toolset

BRISTOL, United Kingdom, May 1 -- ClearSpeed Technology, the world leader in acceleration technology for high performance computing (HPC), today announced new software and hardware enhancements to its Advance product family. The new offerings include performance and functionality enhancements to ClearSpeed CSXL software libraries, the Advance e620 PCI Express (PCIe) accelerator and the ClearSpeed Visual Profiler. Benchmarks using these enhanced CSXL libraries consolidate ClearSpeed's leadership in energy efficiency by delivering 20 times the performance per watt compared with industry standard servers when running the high performance LINPACK Benchmark(1).

The new 2.50 release of ClearSpeed's CSXL acceleration libraries introduces native support for Microsoft Windows and simplifies deployment with documentation updates and End User License Agreements. It provides a number of performance enhancements to the core linear algebra routines for matrix multiplication. Also included in the 2.50 release are the new ClearSpeed Vector Math Library and ClearSpeed Random Number Generators, which support additional functionality such as Monte Carlo simulation for option pricing in the financial services industry. Performance comparisons based on benchmark code for European option pricing provided by a major international bank showed a speedup of up to 20 times using a ClearSpeed Advance accelerator compared with an industry standard server(2). The use of multiple Advance accelerators in the system delivered a speedup of up to 100 times.

For scientific applications such as molecular modeling, recent results have demonstrated real-world application speedups of between 3.4 and 9.4 times with AMBER modules and 4.5 times with the Bristol University Docking Engine (BUDE) program(3).

On April 27 Cambridge Healthtech Institute's Bio-IT World announced that ClearSpeed Technology was one of three Best of Show finalists for the Information Technology Infrastructure category. Executive Editor of Bio-IT World John Russell will present the awards at the ceremony at 6:15 p.m. ET on May 1 at the Bio-IT World Conference & Expo in Boston.

"Large consumers of compute power are looking for ways to improve both their system performance and performance per watt," said Steve Conway, research vice president of technical computing systems at IDC. "There is strong and increasing interest in acceleration technologies that could deliver improved performance without exceeding power, cooling and facilities constraints. ClearSpeed's acceleration technology is making advances in this area."

Building on the success of ClearSpeed's current PCI-X-based Advance X620 accelerator, the introduction of the complementary and smaller form factor PCIe-based Advance e620 accelerator brings all the benefits of ClearSpeed's acceleration technology to the latest generation of multi-core industry standard servers that incorporate the PCIe standard. Together the existing Advance X620 and the Advance e620 significantly increase the number of server platforms that can take advantage of ClearSpeed acceleration.

For developers, the new ClearSpeed Visual Profiler toolset provides insight at every level of the system, including the interactions between multiple host processors and one or more ClearSpeed Advance accelerator boards. By delivering a consistent visual representation across the entire system, it provides the best possible environment in which to develop code that will perform optimally in today's multi-core and heterogeneous accelerated systems.

"The world's leading financial institutions and research organizations that depend upon the availability of compute power to maintain their competitive edge are struggling with the constraints of facilities space, power and cooling," said Stephen McKinnon, ClearSpeed's chief operating officer. "The enhancements to our product family are delivering three, five or even twenty times the application performance of unaccelerated systems, while adding less than five percent to the total energy bill. Acceleration technology is causing a radical rethink of datacenter design."

Performance Results

(1) LINPACK performance and performance per watt results

Comparative results

Accelerated cluster: 218.9 percent performance of standard system
Accelerated cluster: 53.6 percent less energy per job
Accelerated cluster: 5.3 percent more power (peak)
Accelerated cluster: 1.6 percent more power (average)

Standard node: 0.07 GFLOPS per watt
Accelerated node: 0.14 GFLOPS per watt, 2x energy efficiency of standard node
ClearSpeed X620: 1.37 GFLOPS per watt, 20x energy efficiency of standard node
ClearSpeed "Top Up": 4.95 GFLOPS per watt, 70x energy efficiency of standard node

ClearSpeed "Top Up" is defined as the additional performance delivered
for the additional average power consumption when compared with an
unaccelerated system.

Measured benchmark results

Standard Cluster: 114.8 GFLOPS, 40.8 minutes runtime
Power: 1900 W peak, 1722 W average, Energy: 0.29 kWh, 0.07 GFLOPS/W

ClearSpeed Accelerated Cluster: 251.3 GFLOPS, 18.7 minutes runtime
Power: 2000 W peak, 1750 W average, Energy: 0.14 kWh, 0.14 GFLOPS/W

Standard Node
Node: 28.7 GFLOPS, 431 W, 0.07 GFLOPS/W - base energy efficiency

ClearSpeed Accelerated Node
Node: 62.8 GFLOPS, 438 W, 0.14 GFLOPS/W - 2x base energy efficiency

ClearSpeed Advance X620 accelerator
X620: 34.1 GFLOPS, 25 W, 1.37 GFLOPS/W - 20x base energy efficiency

ClearSpeed "Top Up" additional performance for additional power
X620: 34.1 GFLOPS, 6.9 W, 4.95 GFLOPS/W - 70x base energy efficiency
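
The derived figures can be rechecked from the measured values. The following Python sketch is illustrative only: the function and variable names are invented for this example, and it assumes that the listed energy values are per node (cluster energy divided by the four nodes), which the release does not state. With the rounded node figures the additional power works out to 7 W rather than the quoted 6.9 W, so the computed "Top Up" efficiency lands slightly below 4.95 GFLOPS per watt.

```python
# Figures copied from the measured results in this release.
STANDARD_NODE = {"gflops": 28.7, "watts": 431.0}
ACCELERATED_NODE = {"gflops": 62.8, "watts": 438.0}
STANDARD_CLUSTER = {"avg_watts": 1722.0, "minutes": 40.8}
ACCELERATED_CLUSTER = {"avg_watts": 1750.0, "minutes": 18.7}
NODES = 4  # assumption: listed energy values are per node


def gflops_per_watt(gflops, watts):
    """Energy efficiency as sustained GFLOPS divided by power draw."""
    return gflops / watts


def top_up(standard, accelerated):
    """Extra GFLOPS, extra watts, and their ratio for an accelerated node."""
    extra_gflops = accelerated["gflops"] - standard["gflops"]
    extra_watts = accelerated["watts"] - standard["watts"]
    return extra_gflops, extra_watts, extra_gflops / extra_watts


def energy_kwh(avg_watts, minutes):
    """Energy used by a run, from average power and runtime."""
    return avg_watts * (minutes / 60.0) / 1000.0


if __name__ == "__main__":
    base = gflops_per_watt(**STANDARD_NODE)
    accel = gflops_per_watt(**ACCELERATED_NODE)
    extra_gflops, extra_watts, extra_eff = top_up(STANDARD_NODE, ACCELERATED_NODE)
    print(f"standard node:    {base:.3f} GFLOPS/W")
    print(f"accelerated node: {accel:.3f} GFLOPS/W")
    print(f"top up: {extra_gflops:.1f} GFLOPS for {extra_watts:.1f} W "
          f"= {extra_eff:.2f} GFLOPS/W")
    for name, run in (("standard", STANDARD_CLUSTER),
                      ("accelerated", ACCELERATED_CLUSTER)):
        per_node = energy_kwh(run["avg_watts"], run["minutes"]) / NODES
        print(f"{name} energy per node: {per_node:.2f} kWh")
```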

The LINPACK Benchmark was introduced by Jack Dongarra. It is used to solve a dense system of linear equations. For the Top500, a version of the benchmark is used that allows the user to scale the size of the problem and to optimize the software in order to achieve the best performance for a given machine. This performance does not reflect the overall performance of a given system, as no single number ever can. It does, however, reflect the performance of a dedicated system for solving a dense system of linear equations. Since the problem is very regular, the performance achieved is quite high, and the performance numbers give a good correction of peak performance. A parallel implementation of the LINPACK Benchmark and instructions on how to run it can be found at http://www.netlib.org/benchmark/hpl/.
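
As a rough illustration of what the benchmark measures, the Python sketch below solves a small dense system by Gaussian elimination with partial pivoting and converts a problem size and sustained rate into a runtime. It is not HPL or ClearSpeed code. The operation count 2/3 n^3 + 2 n^2 is the figure commonly quoted for LINPACK; HPL's own reporting may use a slightly different lower-order term. The function names are invented for this example. Under these assumptions, N = 75000 (the value listed in the system specifications) at the reported 114.8 and 251.3 GFLOPS gives runtimes close to the 40.8 and 18.7 minutes reported.

```python
def solve_dense(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Work on an augmented copy so the caller's data is left untouched.
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        # Pick the row with the largest entry in this column as the pivot.
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if m[pivot][col] == 0.0:
            raise ValueError("matrix is singular")
        m[col], m[pivot] = m[pivot], m[col]
        for row in range(col + 1, n):
            factor = m[row][col] / m[col][col]
            for k in range(col, n + 1):
                m[row][k] -= factor * m[col][k]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for row in range(n - 1, -1, -1):
        known = sum(m[row][k] * x[k] for k in range(row + 1, n))
        x[row] = (m[row][n] - known) / m[row][row]
    return x


def linpack_flops(n):
    """Operation count commonly credited for an n x n dense solve."""
    return (2.0 / 3.0) * n ** 3 + 2.0 * n ** 2


def runtime_minutes(n, gflops):
    """Runtime implied by a problem size and a sustained GFLOPS rate."""
    return linpack_flops(n) / (gflops * 1e9) / 60.0


if __name__ == "__main__":
    a = [[2.0, 1.0, -1.0], [-3.0, -1.0, 2.0], [-2.0, 1.0, 2.0]]
    b = [8.0, -11.0, -3.0]
    print("solution:", [round(v, 6) for v in solve_dense(a, b)])
    for rate in (114.8, 251.3):
        print(f"N=75000 at {rate} GFLOPS: {runtime_minutes(75000, rate):.1f} minutes")
```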

System specifications

Base system: HP DL380 G5, CPU: Intel Xeon 5160 (Woodcrest) x 2 @ 3GHz
Memory: 14GB, Operating System: RedHat EL4 64
ClearSpeed Acceleration: Advance X620, CSXL 2.24, BLAS: Intel MKL 8.1.1
LINPACK parameters: Host assist: 25 percent, HPL.dat: N: 75000, NB: 1152

Standard cluster: 4 nodes, 0 ClearSpeed Advance X620
ClearSpeed accelerated cluster: 4 nodes, 4 ClearSpeed Advance X620 accelerator boards

(2) Monte Carlo Simulation

Financial institutions use statistical methods such as Monte Carlo simulation to derive future prices of complex option models that cannot be easily modeled by analytical approaches such as the Black-Scholes model. ClearSpeed chose European options for its Monte Carlo demonstration so that both the acceleration and the accuracy of the result, compared with the Black-Scholes method, could be shown. The benchmark code was supplied by a well-known global banking organization.
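
To make the comparison concrete, the sketch below prices a European call option both with the Black-Scholes formula and with a plain Monte Carlo estimate. It is a minimal illustration, not the bank's benchmark code: it uses Python's standard random number generator rather than the ClearSpeed libraries, and the option parameters (spot 100, strike 100, 5 percent rate, 20 percent volatility, one year) and function names are assumptions chosen for the example.

```python
import math
import random


def normal_cdf(x):
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def black_scholes_call(spot, strike, rate, vol, maturity):
    """Closed-form Black-Scholes price of a European call."""
    sigma_root_t = vol * math.sqrt(maturity)
    d1 = (math.log(spot / strike) + (rate + 0.5 * vol * vol) * maturity) / sigma_root_t
    d2 = d1 - sigma_root_t
    return spot * normal_cdf(d1) - strike * math.exp(-rate * maturity) * normal_cdf(d2)


def monte_carlo_call(spot, strike, rate, vol, maturity, samples, seed=0):
    """Monte Carlo price and standard error of a European call."""
    rng = random.Random(seed)
    drift = (rate - 0.5 * vol * vol) * maturity
    diffusion = vol * math.sqrt(maturity)
    total = 0.0
    total_sq = 0.0
    for _ in range(samples):
        # Sample the terminal price under geometric Brownian motion.
        terminal = spot * math.exp(drift + diffusion * rng.gauss(0.0, 1.0))
        payoff = max(terminal - strike, 0.0)
        total += payoff
        total_sq += payoff * payoff
    discount = math.exp(-rate * maturity)
    mean = total / samples
    variance = max(total_sq / samples - mean * mean, 0.0)
    return discount * mean, discount * math.sqrt(variance / samples)


if __name__ == "__main__":
    params = dict(spot=100.0, strike=100.0, rate=0.05, vol=0.2, maturity=1.0)
    exact = black_scholes_call(**params)
    estimate, std_error = monte_carlo_call(samples=200_000, **params)
    print(f"Black-Scholes: {exact:.4f}")
    print(f"Monte Carlo:   {estimate:.4f} +/- {std_error:.4f}")
```

Each sample is independent of the others, which is why this kind of workload spreads well across many processing elements or several accelerator boards.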

Monte Carlo simulation for European option pricing.

1 CPU, no acceleration: 400M samples, 60 seconds, Speedup 1x
1 Advance board: 400M samples, 2.9 seconds, Speedup 20x
2 Advance boards: 400M samples, 1.5 seconds, Speedup 40x
4 Advance boards: 400M samples, 0.8 seconds, Speedup 79x

System specifications

Base System: Dell 2880, CPU: 2 x 3.0GHz Xeon, Memory: 3 GB
ClearSpeed Acceleration: 1 to 4 ClearSpeed Advance X620
Host Compiler: gcc, libraries: Randc, random number generator: CGaussian
ClearSpeed Advance X620 Libraries: CS VML & CS RNG

(3) AMBER and Bristol University Docking Engine (BUDE) Performance Results

AMBER

To demonstrate application-level performance of accelerated systems, we have modified a set of AMBER 9 methods to take advantage of ClearSpeed's Advance accelerator board. This includes the effective radius and force calculation of AMBER's Generalized Born (GB) models 1, 2, and 6. Supported options include constant pH7 and analytical linearized Poisson Boltzmann (ALPB) as well as options that do not directly change the force calculation, including NMR restraints.

While the genborn module of AMBER is a small part of the sander executable, it typically accounts for 95-97 percent of the CPU compute time for GB simulations. The CPU compute time is mainly spent in three loops: effective radii calculations, diagonal force calculations and off-diagonal force calculations.
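
The share of time spent in genborn bounds how much the whole run can gain from accelerating it. The Python sketch below applies Amdahl's law to the 95-97 percent figure. It assumes that only the genborn portion is accelerated, that the rest of the run takes the same time as before, and that host-to-board transfer overhead is ignored; none of these assumptions is stated in the release, and the function names are invented for this example.

```python
def amdahl_speedup(accelerated_fraction, kernel_speedup):
    """Overall speedup when only part of the runtime is accelerated."""
    serial = 1.0 - accelerated_fraction
    return 1.0 / (serial + accelerated_fraction / kernel_speedup)


def amdahl_limit(accelerated_fraction):
    """Upper bound on overall speedup for an infinitely fast kernel."""
    return 1.0 / (1.0 - accelerated_fraction)


def implied_kernel_speedup(accelerated_fraction, overall_speedup):
    """Kernel speedup needed to reach a given overall speedup."""
    remaining = 1.0 / overall_speedup - (1.0 - accelerated_fraction)
    if remaining <= 0.0:
        raise ValueError("overall speedup exceeds the Amdahl limit")
    return accelerated_fraction / remaining


if __name__ == "__main__":
    for fraction in (0.95, 0.97):
        print(f"{fraction:.0%} in genborn: overall speedup limit "
              f"{amdahl_limit(fraction):.1f}x")
```

Under these assumptions the whole-application speedup cannot exceed roughly 20x to 33x, however fast the accelerated loops become.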

The overall structure of the code was maintained. A thin layer written in C, using ClearSpeed's CSAPI library, was added to handle the communication between the host and board.

Generalized Born 1: 83.5 minutes (host), 24.6 minutes (Advance X620), Speedup 3.39x
Generalized Born 2: 84.6 minutes (host), 23.5 minutes (Advance X620), Speedup 3.60x
Generalized Born 6: 37.9 minutes (host), 4.0 minutes (Advance X620), Speedup 9.35x

Host: 2.8GHz Pentium 4 EM64T, OS: RHEL4-64, CSXL: version 2.50

Bristol University Docking Engine (BUDE)

1 host CPU, no acceleration: 48.2 seconds, Speedup 1.0x
1 Advance board: 10.6 seconds, Speedup 4.5x
2 Advance boards: 5.8 seconds, Speedup 8.3x
3 Advance boards: 4.4 seconds, Speedup 11.0x

Host: 2 x 2.8 GHz Xeon, OS: RHEL4-64, CSXL: version 2.24
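
The BUDE timings also show how well the workload scales as boards are added. The short Python sketch below recomputes the speedups from the listed runtimes and derives a scaling efficiency relative to a single board. The efficiency measure and the names used are choices made for this illustration and do not come from the release.

```python
# Runtimes in seconds, keyed by number of Advance boards (0 = host only).
BUDE_SECONDS = {0: 48.2, 1: 10.6, 2: 5.8, 3: 4.4}


def speedup(times, boards):
    """Speedup over the unaccelerated host run."""
    return times[0] / times[boards]


def scaling_efficiency(times, boards):
    """Speedup over one board divided by the number of boards."""
    return (times[1] / times[boards]) / boards


if __name__ == "__main__":
    for boards in (1, 2, 3):
        print(f"{boards} board(s): speedup {speedup(BUDE_SECONDS, boards):.1f}x, "
              f"scaling efficiency {scaling_efficiency(BUDE_SECONDS, boards):.0%}")
```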

About ClearSpeed

ClearSpeed Technology is a semiconductor company that develops massively parallel coprocessors, accelerator boards and software that deliver unmatched performance per watt for high performance computing applications in financial services, universities and national labs. ClearSpeed has offices in San Jose, California, and Bristol, UK and has 84 patents granted and pending. For more information, visit www.clearspeed.com.

-----

Source: ClearSpeed Technology